bug-gawk

Question on the behavior of length()


From: Christian Schmidt
Subject: Question on the behavior of length()
Date: Sun, 10 Dec 2023 15:18:30 +0100
User-agent: Mozilla Thunderbird

Hi all,

First of all I'd like to state that I do not consider the following necessarily a bug; however I'd like to discuss the current implementation, and have not found a better place to do so.

My issue is that I can't use

x[length(x)] = y

e.g. to emulate pushing onto a stack, without first explicitly converting x into an array, e.g. by "delete x".

x comes into existence untyped and should be usable as an array; however, calling length() on it converts it to an unassigned scalar.
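
For reference, a minimal sketch of the stack idiom with the "delete x" workaround, assuming gawk (length() on an array is an extension). The names stack and i are only illustrative:

```awk
BEGIN {
    delete stack                  # turns the untyped variable into an (empty) array
    stack[length(stack)] = "a"    # length(stack) is 0 here, so this fills index 0
    stack[length(stack)] = "b"    # length(stack) is now 1, so this fills index 1
    for (i = 0; i < length(stack); i++)
        print i, stack[i]         # walks the stack from bottom to top
}
```

Without the leading "delete stack", the first length(stack) call is the one that turns the variable into a scalar.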

There are two angles to this:
a) POSIX, which defines length() as
length[([s])]
Return the length, in characters, of its argument taken as a string, or of the whole record, $0, if there is no argument.
b) GNU awk, which extends length() to be used to get the number of elements in an array.

I wonder if it would be more reasonable to just return 0 on length() called with an untyped variable as argument, without modifying its input (by converting it into a scalar).

My rationale for this is:
1. Converting an untyped to a string will always create a zero-length string, and so the result will be correct.
2. Converting an untyped to an array will create an empty, thus zero-length, array (relevant for the GNU extension for length())
3. POSIX does not specify conversion of the argument, only "argument taken as a string"
4. Using the (still untyped) variable later will have no change of behavior for any scalar types, as the scalar type itself is not fixed, except you now can still use it as an array
5. Generally, functions should IMHO not leave changes to their arguments after leaving scope.
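
Points 1 and 2 can be illustrated with a small sketch, again assuming gawk for length() on an array. The variable names s and a are illustrative, and the two variables are given their types explicitly here rather than by any implicit conversion:

```awk
BEGIN {
    s = ""              # scalar holding the empty string
    delete a            # empty array
    print length(s)     # length of the empty string
    print length(a)     # number of elements in the empty array
}
```

Both calls report 0, which is why returning 0 for an untyped argument would be consistent with either later use.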

The fact this happens can be observed as

BEGIN {
        print typeof(x)
        print length(x)
        print typeof(x)
        exit
}

outputs

untyped
0
unassigned

After reading the source I am actually not sure how/why this happens. My guess is builtin.c line 609ff.:

        if (tmp->type == Node_var_array) {
                [...]
        } else if (tmp->type == Node_var_new || tmp->type == Node_elem_new) {
                // this can happen from an indirect call
                DEREF(tmp);
                tmp = dupnode(Nnull_string);
        }

However I am not sure why this leaves changes outside scope.

The assumption "this can happen from an indirect call" does not hold up: it can definitely happen on purpose ;). I am also not familiar enough with the codebase to understand what an indirect call is in this context, and am therefore wary of just changing the else {} branch to return 0, as internal callers (which I can't see, at least none calling do_length() directly) might rely on the existing behavior.

Any feedback?

Best regards,
Chris

PS: please CC: me on replies, I am not subscribed to the list.


