Question on the behavior of length()
From: Christian Schmidt
Subject: Question on the behavior of length()
Date: Sun, 10 Dec 2023 15:18:30 +0100
User-agent: Mozilla Thunderbird
Hi all,
First of all, I'd like to say that I don't necessarily consider the
following a bug; however, I'd like to discuss the current
implementation, and I have not found a better place to do so.
My issue is that I can't use

    x[length(x)] = y

e.g. to emulate pushing onto a stack, without explicitly converting x
into an array first, e.g. with "delete x".
x comes into existence untyped and should be usable as an array;
however, calling length() on it converts it to an unassigned scalar.
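For reference, two common workarounds look roughly like this. It is a sketch, and stack, s2, top, top2 and sp are only illustrative names: either make the variable an array up front with delete, or keep the depth in a separate counter so that length() is never applied to an untyped variable.

```awk
BEGIN {
    # Workaround 1: "delete" on a never-used variable makes it an empty
    # array, so length(stack) returns 0 instead of turning stack into a scalar.
    delete stack
    stack[length(stack)] = "a"        # push "a" at index 0
    stack[length(stack)] = "b"        # push "b" at index 1
    top = stack[length(stack) - 1]    # peek at the last element
    delete stack[length(stack) - 1]   # pop it
    print top, length(stack)          # b 1

    # Workaround 2: track the depth in a separate counter, so length()
    # is never called on the array at all.
    s2[sp++] = "a"                    # push "a" at index 0
    s2[sp++] = "b"                    # push "b" at index 1
    top2 = s2[--sp]                   # pop: reads index 1, sp becomes 1
    delete s2[sp]
    print top2, sp                    # b 1
}
```

The counter variant does not depend on the GNU length(array) extension, at the cost of carrying an extra variable per stack.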
There are two angles to this issue:
a) POSIX, which defines length() as:

    length[([s])]
        Return the length, in characters, of its argument taken as a
        string, or of the whole record, $0, if there is no argument.
b) GNU awk, which extends length() so that it can also return the
number of elements in an array.
I wonder whether it would be more reasonable for length() to simply
return 0 when called with an untyped variable as its argument, without
modifying that argument (i.e. without converting it into a scalar).
My rationale for this proposal is:
1. Converting an untyped variable to a string always yields a
zero-length string, so the result will be correct.
2. Converting an untyped variable to an array creates an empty, and
thus zero-length, array (relevant for the GNU extension of length()).
3. POSIX does not specify any conversion of the argument, only that it
is "taken as a string".
4. Later scalar uses of the (still untyped) variable behave exactly as
they do now, since the scalar type itself is not fixed; the only
difference is that the variable can still be used as an array.
5. In general, IMHO, functions should not leave their arguments
modified after they return.
That length() modifies its argument can be observed with:
    BEGIN {
        print typeof(x)
        print length(x)
        print typeof(x)
        exit
    }
which outputs:

    untyped
    0
    unassigned
Even after reading the source, I am not sure how or why this happens.
My guess is builtin.c, lines 609ff.:
    if (tmp->type == Node_var_array) {
        [...]
    } else if (tmp->type == Node_var_new || tmp->type == Node_elem_new) {
        // this can happen from an indirect call
        DEREF(tmp);
        tmp = dupnode(Nnull_string);
    }
However, I am not sure why this leaves changes visible outside the
function's scope.
The assumption that "this can happen from an indirect call" does not
hold up: it can definitely happen on purpose ;). I am also not familiar
enough with the codebase to understand what an indirect call is in this
context, so I am wary of simply changing that else branch to return 0,
as internal callers (which I can't see, at least none calling
do_length() directly) might rely on the existing behavior.
Any feedback?
Best regards,
Chris
PS: Please CC me on replies; I am not subscribed to the list.