bug-gawk

Re: is my loop issue a data conversion bug?


From: Peter Lindgren
Subject: Re: is my loop issue a data conversion bug?
Date: Thu, 16 Jul 2020 04:42:56 +0000 (UTC)

I've been looking at various awk references, and I do see clear statements that 
comparisons are only done numerically if both sides are numeric, as in TAPL 
page 44: 

"In a comparison expression like
    x == y
if both operands have a numeric type, the comparison is numeric; otherwise, any 
numeric operand is coerced to string and the comparison is made on the string 
values."

Or, per TAPL page 45:

"Thus, to force a string comparison between two fields, coerce one field to 
string:
    $1 "" == $2

To force a numeric comparison, coerce BOTH fields to numeric:
    $1 + 0 == $2 + 0"
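
The rules quoted above can be checked from the command line. This is a minimal sketch, assuming a POSIX-conforming awk is installed as `awk`; the input line `10 9` is made up for illustration.

```shell
# Number vs. string constant: 10 is converted to "10", and "10" sorts before "9".
awk 'BEGIN { print (10 < "9") }'                # prints 1 (string comparison)

# Both operands numeric: ordinary numeric comparison.
awk 'BEGIN { print (10 < 9) }'                  # prints 0

# Fields that look like numbers are compared numerically...
echo '10 9' | awk '{ print ($1 < $2) }'         # prints 0

# ...unless one side is coerced to string by concatenating "".
echo '10 9' | awk '{ print ($1 "" < $2) }'      # prints 1
```

The parentheses after `print` matter here: without them, awk would parse `<` as an input redirection rather than a comparison.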

However, I also see various more general statements about conversion being done 
"in context", as in "Effective awk Programming", page 84:

"Strings are converted to numbers and numbers are converted to strings, if the 
context of the awk program demands it."

or in TAPL, page 35:

"A variable has a value that is a string or a number or both. Since the type 
of a variable is not declared, awk infers the type from context. When 
necessary, awk will convert a string value into a numeric one, or vice versa." 

(Brief pause as correspondent hesitates before contending with people much more 
likely to have served on POSIX committees than himself...)

If a classic for loop isn't a context demanding numeric conversion, what is?

See "Effective awk Programming", page 113, where the for statement is first 
described:

    for (initialisation; condition; increment)
        body

The succeeding text hints that you might possibly do something other than 
numeric operations here, but acknowledges that you wouldn't typically do that. 
I (just now) made up the following for loop using strings:

    for (x="a"; length(x)<35; x = x "a") 
        body

But that's the first time in a lifetime of awk programming that I even imagined 
doing so.

In the overwhelming majority of cases, as in my demo program, where loop 
initialisations and increments are clearly numeric, why not coerce both sides 
of the comparison to numeric as well? Numbers and strings are supposed to be so 
mutable, and that seems like the behavior that most users would expect. 

So, why not do it that way? (Pesky rules and standards and consistency and 
tradition aside... ;-)  


I've set myself up for this, take your best shot...


On Monday, July 13, 2020, 03:09:17 PM CDT, Davide Brini <dave_br@gmx.com> 
wrote: 





On Mon, 13 Jul 2020 18:11:37 +0000 (UTC), Peter Lindgren
<ogswd-awk@yahoo.com> wrote:

> I hesitate to report this as a bug - maybe it's just some expected
> behavior I don't understand ("That's not a bug, it's a feature!"). But
> here goes anyway...
>
> [snip]

> 
> Run them both on the supplied test data file "lendemo.dat" and observe
> the differences in the outputs. There are comments in the programs
> highlighting the interesting bits.


I wouldn't go so far as saying that it's a feature, but it's not a bug,
just expected behavior.

A simpler reproducer is:

gawk 'BEGIN{for(i = 1; i <= "9"; i++) print i}'

As explained in the documentation
(https://www.gnu.org/software/gawk/manual/gawk.html#Variable-Typing), when
a number (i in the above example, i and j in your code) and a string ("9"
in the above example, len in your code) are compared, the comparison is a
string comparison. Every number from 1 to 89, once converted to a string,
sorts at or before "9" ("10" starts with "1", "89" starts with "8"), while
"90" sorts after it, so the loop prints 1 through 89.
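
To see the effect and one way around it, the reproducer can be reduced to a count of iterations. This is only a sketch; the variable names `n` and `limit` are made up for illustration, and `awk` is assumed to be gawk or another POSIX awk.

```shell
# String comparison: i is converted to a string for each test, so the loop
# keeps going until i reaches 90.
awk 'BEGIN { for (i = 1; i <= "9"; i++) n++; print n }'                       # prints 89

# Adding 0 makes the limit numeric, so the comparison is numeric too.
awk 'BEGIN { limit = "9" + 0; for (i = 1; i <= limit; i++) n++; print n }'    # prints 9
```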

If you're wondering why "len" is a string in your code, remember that it's
an array key, and array keys are always strings by definition (see
https://www.gnu.org/software/gawk/manual/gawk.html#Numeric-Array-Subscripts).
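
When a loop limit comes out of an array subscript, a defensive idiom is to add 0 before comparing. The sketch below assumes that is acceptable for the data at hand; the array name `widths` and its single key are hypothetical and not taken from the original program.

```shell
awk 'BEGIN {
    widths[9] = 1                        # hypothetical data; the subscript is kept as "9"
    for (len in widths) {
        n = 0
        for (i = 1; i <= len + 0; i++)   # len + 0 forces a numeric comparison
            n++
        print len, n                     # prints: 9 9
    }
}'
```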

HTH

-- 
D.

