bug-gnu-utils

Re: GAWK 3.1.1 bug, DJGPP port


From: Stepan Kasal
Subject: Re: GAWK 3.1.1 bug, DJGPP port
Date: Wed, 8 Jan 2003 10:27:50 +0100
User-agent: Mutt/1.2.5.1i

Hello,

On Tue, Jan 07, 2003 at 05:26:10PM -0500, Serge Bohdjalian wrote:
> Thanks for your response. In my ignorance, I didn't realize that "e" (or 
> "E") could be used for base-10 exponent notation.

It took me some time too.  I'm glad that I was not the only one. :-)
When you see 00E0 you can't help thinking of it as a hex number.

> I don't understand why "0e0" is treated differently from "0".

You are right, of course, they should be treated exactly the same way.

> I don't see why the last 2 expressions, as you state, should equal 1 
> (TRUE). From what I understand, if "0" is compared to a numeric constant, 
> "0" should be interpreted as a number.

This is not true.  Observe:

$ gawk 'BEGIN{a="12"; print (a<7)}'
1
$ gawk 'BEGIN{a="12"; print ("12"<7)}'
1

When a *string* is compared to a number, string comparison is performed.

The situation is different when $0 comes from an input file, or when $1
comes from field splitting: they are *strnum*s, waiting for their
opportunity to behave as numbers.
So:
        $ echo 12 | ../gawk '{print ($1 < 7)}'
        0
or:
> $ echo 1e2 3 | awk '{ print ($1 < $2) ? "true" : "false" }'
> -| false

> ...the result is "false" because both "$1" and "$2"
> are user input. They are numeric strings -- therefore both have
> the "strnum" attribute, dictating a numeric comparison.

The point is that when you wrote a string constant in a program, with
double quotes around it, you told gawk that the thing is a string.
And once gawk knows that it's a string, not a strnum, this
information is propagated across assignments.
OTOH, when something comes from input or from field splitting, gawk
cannot know whether it's a number or a string.

For the official explanation, please read the node
http://www.gnu.org/manual/gawk/html_node/Typing-and-Comparison.html
(The web version is for gawk-3.1.1, the newest available.)

So in my example, "00" is a string and should stay a string no matter
what I assign it to.  This worked as long as I assigned it to other
variables or to fields >= 1.  Once I assigned it to $0, it became a
strnum, and even other instances holding this value became strnums;
that was the bug.

Now let's get back to my example program:

> At 2003.01.07 01:49 PM +0100, you wrote:
> >Hello,
> >Observe:
> >         BEGIN {
> >                 # first bug:
> >                 $0 = "0 0e0"
> >                 print $1 && 1, $2 && 2
> >
> >                 # second bug:
> >                 $0 = "00"
> >                 print $0 && 1
> >
> >                 # and even worse:
> >                 $0 = a = "00"
> >                 print a && 1
> >         }
> >gives
> >         0 1
> >         0
> >         0
> >while correct results are
> >         0 0
> >         1
> >         1
> 
> Note that with 3 different versions of GAWK for Windows I get ...

> C:\TEST>"C:\TEST\gawk311_Cygwin.exe" -f "TEST.AWK" "TEST.TXT"
> 0 1
> 0
> 0

OK, this is what most gawk-3.1.1 builds behave like, including the
Linux one which Aharon Robbins and I usually refer to.

The Cygwin environment is closest to UNIX, so the need to tweak the
source of a package to get it compiled is smallest there.

> C:\TEST>"C:\TEST\gawk311_DJGPP.exe" -f "TEST.AWK" "TEST.TXT"
> 0 0
> 0
> 0

DJGPP doesn't have the first bug, as I mentioned before.
The second bug is still there.

> C:\TEST>"C:\TEST\gawk310_SourceForge.exe" -f "TEST.AWK" "TEST.TXT"
> 0 1
> 1
> 1

Yet another bug on the scene: 3.1.0 doesn't recognize that "00" is
a number.  I reproduced it on my Linux box.  (This has been fixed in 3.1.1.)
If you changed "00" to "0", you'd get the usual (wrong) output, the
same as that produced by 3.1.1-cygwin.

> When I run the following script...
> 
> BEGIN {
>     $0 = "0";
>     if ($0) print $0 " = TRUE"; else print $0 " = FALSE";
> 
>     $0 = "0e0";
>     if ($0) print $0 " = TRUE"; else print $0 " = FALSE";
> }
> 
> ...I get the following output with my 3 versions of GAWK for Windows...
> 
> C:\TEST>"C:\TEST\gawk310_SourceForge.exe" -f "TEST.AWK" "TEST.TXT"
> 0 = FALSE
> 0e0 = TRUE
> 
> C:\TEST>"C:\TEST\gawk311_Cygwin.exe" -f "TEST.AWK" "TEST.TXT"
> 0 = FALSE
> 0e0 = TRUE
> 
> C:\TEST>"C:\TEST\gawk311_DJGPP.exe" -f "TEST.AWK" "TEST.TXT"
> 0 = FALSE
> 0e0 = FALSE
> 
> ...
> 
> Before I knew that "0e0" could be interpreted as a number, I would have 
> thought that the first 2 outputs (from SourceForge and Cygwin) were 
> correct. Now I'm not so sure.

Now we are sure that both lines must have the same value, as 0 and 0e0
are equivalent.
I believe that the right result is

0 = TRUE
0e0 = TRUE

which is what my current (patched) version of gawk produces.

You probably thought that assigning $0 or $1 in a BEGIN rule is
a good way to simulate user input.  It is not: it is supposed to
behave in a completely different way.
That's why the gawk manual doesn't do this and uses this pattern instead:
        echo .... | gawk '{...}'

Unfortunately, you tripped over a bug here (in fact, a double bug).
The two bugs masked each other, and together with the fact that you
were not aware that assigning a field is completely different from
getting it from an input file, they were bound to confuse you.
I hope it's clearer now.

A final note: in a "normal" awk program you don't run into this.
The input might contain not only 00EF but also 0089, which
would be interpreted as a (decimal) number.  So the usual way is to
ensure that the value is a string, like this:

        a = $2 ""

i.e., you ensure that `a' holds a string by concatenating it with the
null string.  If you stick to this rule (and don't abuse $0, of course),
you won't hit the bugs we are currently discussing.

You probably know this; I just wanted to make sure you don't conclude
that awk is impossible to use because of weird type-conversion rules.

Have a lot of fun,
        Stepan




