bug-gawk
Re: unassigned/untyped behaviour


From: Andrew J. Schorr
Subject: Re: unassigned/untyped behaviour
Date: Fri, 17 Nov 2023 11:50:04 -0500
User-agent: Mutt/1.5.21 (2010-09-15)

I'd guess that the force_string call in builtin.c:format_tree() is having
this "stringification" effect. Some quick debugging seems to indicate
that the type changes from Node_elem_new to Node_val and that the flags
change from "MALLOC | STRING | STRCUR | NUMCUR | NUMBER"
to "MALLOC | STRING | STRCUR | NUMCUR".
It looks like that's happening in awk.h:force_string_fmt.
Oddly enough, simply commenting out that code like so
seems to leave the value as untyped, and make check still passes.
This patch doesn't seem right...

diff --git a/awk.h b/awk.h
index cbc0a7e..7e2c9a0 100644
--- a/awk.h
+++ b/awk.h
@@ -1965,9 +1965,10 @@ static inline NODE *
 force_string_fmt(NODE *s, const char *fmtstr, int fmtidx)
 {
        if (s->type == Node_elem_new) {
+#if 0
                s->type = Node_val;
                s->flags &= ~NUMBER;
-
+#endif
                return s;
        }
 

But frankly, it seems to me that it should be "unassigned", not untyped.
For example, this code seems bogus, but works OK:

bash-5.1$ cat /tmp/bug.awk
BEGIN {
   printf "str test %s\n", a[1]
   print typeof(a[1])
   printf "int test %d\n", a[2]
   print typeof(a[2])
   a[2][3] = 4
   print a[2][3]
}

With gawk 5.3:

bash-5.1$ ./gawk -f /tmp/bug.awk 
str test 
string
int test 0
untyped
4

With patched master branch, it gives:

str test 
untyped
int test 0
untyped
4

But I think those 2 "untyped" results should actually be "unassigned",
because those array elements have been used as scalar values.
The proper fix is not clear to me...

Regards,
Andy

On Fri, Nov 17, 2023 at 10:10:59AM +0100, M wrote:
> On Thu, 16 Nov 2023 at 23:46, Andrew J. Schorr
> <aschorr@telemetry-investments.com> wrote:
> 
> 
>     But in fact, that's not what the master branch produces:
> 
>     bash-4.2$ ./gawk 'BEGIN {a[1]; print typeof(a[1])}'
>     untyped
> 
>     I'd guess that gawk is right and the documentation is wrong in that case.
> 
> 
> 
> Hi, yes, that's what I mean. As you say, it may be a doc bug, since in the
> same section the manual says:
> 
> 
>     Note that prior to version 5.2, array elements that come into existence
>     simply by referencing them were different, they were automatically forced
>     to be scalars:
> 
> 
> 
>     I'm not clear on whether accessing a[1] in a scalar context should leave
>     it untyped; perhaps it should be unassigned at that point, based on the
>     section 9.1.8 descriptions?
> 
> 
> And yes, I have this doubt too. I expected it to become "unassigned" after
> the first call to a function other than typeof() or isarray(), for example.
> I suppose it may be perfectly fine for the array element to keep the
> "untyped" type in that case, even though the behaviour then differs from
> that of non-array scalar identifiers. What surprised me is the printf %s
> behaviour and the consequent "stringification".
> 
> 
> thanks,
> M.
> 
> On Thu, 16 Nov 2023 at 23:46, Andrew J. Schorr
> <aschorr@telemetry-investments.com> wrote:
> 
>     Hi,
> 
>     And it's a bit different for a regular variable as opposed to an
>     array element. Using the master branch:
> 
>     ./gawk 'BEGIN {print typeof(a); printf "test %d\n", a; print typeof(a);
>     printf "test2 %s\n", a; print typeof(a)}'
>     untyped
>     test 0
>     unassigned
>     test2
>     unassigned
> 
>     I think that looks right to me, based on the descriptions in 9.1.8.
> 
>     But the behavior for array elements does not seem consistent with the
>     manual. Section 9.1.8 gives this example:
> 
>          ‘"unassigned"’
>               X is a scalar variable that has not been assigned a value yet.
>               For example:
> 
>                    BEGIN {
>                        # creates a[1] but it has no assigned value
>                        a[1]
>                        print typeof(a[1])  # unassigned
>                    }
> 
>     But in fact, that's not what the master branch produces:
> 
>     bash-4.2$ ./gawk 'BEGIN {a[1]; print typeof(a[1])}'
>     untyped
> 
>     I'd guess that gawk is right and the documentation is wrong in that case.
> 
>     The coercion to string type by saying 'printf "%s", a[1]' seems like it
>     needs some investigation...
> 
>     bash-4.2$ ./gawk 'BEGIN {print typeof(a[1]); printf "test %s\n", a[1];
>     print typeof(a[1])}'
>     untyped
>     test
>     string
> 
>     I'm not clear on whether accessing a[1] in a scalar context should leave
>     it untyped; perhaps it should be unassigned at that point, based on the
>     section 9.1.8 descriptions? I imagine the internal implementation details
>     are rather complicated...
> 
>     Regards,
>     Andy
> 
>     On Thu, Nov 16, 2023 at 05:29:32PM -0500, Andrew J. Schorr wrote:
>     > Hi,
>     >
>     > Thanks for the bug report, but gosh, this is a super big test program.
>     > Does this shorter program capture the essence of your issue? In general,
>     > it makes life much easier for the developers to provide shorter examples
>     > of problems.
>     >
>     > gawk 5.1.1:
>     > bash-5.1$ ./gawk 'BEGIN {a[1]; print typeof(a[1]); printf "test %d\n", a[1]; print typeof(a[1]); printf "test2 %s\n", a[1]; print typeof(a[1])}'
>     > unassigned
>     > test 0
>     > unassigned
>     > test2
>     > unassigned
>     >
>     > gawk 5.3.0 (same as gawk 5.2.0):
>     > bash-5.1$ ./gawk 'BEGIN {a[1]; print typeof(a[1]); printf "test %d\n", a[1]; print typeof(a[1]); printf "test2 %s\n", a[1]; print typeof(a[1])}'
>     > untyped
>     > test 0
>     > untyped
>     > test2
>     > string
>     >
>     > Regards,
>     > Andy
>     >
>     > On Thu, Nov 16, 2023 at 09:00:56PM +0100, M wrote:
>     > > From: crap0101
>     > > To: bug-gawk@gnu.org
>     > > Subject: unassigned/untyped behaviour
>     > >
>     > > Configuration Information [Automatically generated, do not change]:
>     > > Machine: x86_64
>     > > OS: linux-gnu
>     > > Compiler: gcc
>     > > Compilation CFLAGS: -g -O2 -DNDEBUG
>     > > uname output: Linux orange 5.15.0-46-generic #49-Ubuntu SMP Thu Aug 4
>     > > 18:03:25 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux
>     > > Machine Type: x86_64-pc-linux-gnu
>     > >
>     > > Gawk Version: 5.3.0
>     > >
>     > > Attestation 1:
>     > > I have read https://www.gnu.org/software/gawk/manual/html_node/Bugs.html.
>     > > Yes
>     > >
>     > > Attestation 2:
>     > > I have not modified the sources before building gawk.
>     > > True
>     > >
>     > > Description:
>     > > Hi! Here again :)
>     > > I found a "strange" behaviour while testing a lib/extension I'm
>     > > writing. It's about handling unassigned/untyped array values. At first
>     > > I thought it was a bug in my own code, but I see this behaviour only
>     > > in the latest release (5.3.0, pulled from git just today). According
>     > > to the manual section (9.1.8 Getting Type Information), referencing
>     > > never-used identifiers (simple variables or array elements) "brings"
>     > > them to life, and this should hold for every function (builtin or not)
>     > > except typeof() and isarray(). So, after printing an untyped array
>     > > element, we expect it to become unassigned... but, for example,
>     > > printing with printf() (see attached files) with a different format
>     > > string gives different results: %s gives the array element a string
>     > > type, while %d leaves it untouched, untyped. I don't know if it's a
>     > > printf()-related issue (sprintf behaves the same way).
>     > >
>     > > Repeat-By:
>     > > printf on untyped array elements (see attached files).
>     > >
>     > > Fix:
>     > > I don't have one yet.
>     > >
>     > > --
>     > > me -> http://crap0101.altervista.org/
>     >
>     >
>     >
>     > > --- /tmp/r_5.1      2023-11-16 19:47:30.321162662 +0100
>     > > +++ /tmp/r_5.3      2023-11-16 19:47:22.221153534 +0100
>     > > @@ -2,23 +2,23 @@
>     > >  <arr[0]> (type:number) (isarray:0)
>     > >  <arr[1]> (type:strnum) (isarray:0)
>     > >  <arr[2]> (type:string) (isarray:0)
>     > > -<arr[3]> (type:unassigned) (isarray:0)
>     > > +<arr[3]> (type:untyped) (isarray:0)
>     > >  <arr[4]> (type:regexp) (isarray:0)
>     > >  ¿val? (type:untyped) (isarray:0)
>     > >  * again...
>     > >  <arr[0]> (type:number) (isarray:0)
>     > >  <arr[1]> (type:strnum) (isarray:0)
>     > >  <arr[2]> (type:string) (isarray:0)
>     > > -<arr[3]> (type:unassigned) (isarray:0)
>     > > +<arr[3]> (type:untyped) (isarray:0)
>     > >  <arr[4]> (type:regexp) (isarray:0)
>     > >  ¿val? (type:untyped) (isarray:0)
>     > >  * print also `invented` value
>     > >  <arr[0]> (type:number) (isarray:0)
>     > >  <arr[1]> (type:strnum) (isarray:0)
>     > >  <arr[2]> (type:string) (isarray:0)
>     > > -<arr[3]> (type:unassigned) (isarray:0)
>     > > +<arr[3]> (type:untyped) (isarray:0)
>     > >  <arr[4]> (type:regexp) (isarray:0)
>     > > -<> (type:untyped) (isarray:0)
>     > > +<> (type:unassigned) (isarray:0)
>     > >  * print `a` values
>     > >  arr[0] = <2>)
>     > >  arr[1] = <1>)
>     > > @@ -30,16 +30,17 @@
>     > >  <arr[0]> (type:number) (isarray:0)
>     > >  <arr[1]> (type:strnum) (isarray:0)
>     > >  <arr[2]> (type:string) (isarray:0)
>     > > +<arr[3]> (type:string) (isarray:0)
>     > >  <arr[4]> (type:regexp) (isarray:0)
>     > >  <> (type:unassigned) (isarray:0)
>     > > 
>     > >  * b[0],b[1] (untyped|unassigned)
>     > > -b[0]: (type:unassigned) (isarray:0)
>     > > -b[1]: (type:unassigned) (isarray:0)
>     > > +b[0]: (type:untyped) (isarray:0)
>     > > +b[1]: (type:untyped) (isarray:0)
>     > >  * printf b[0] with format %s gives: <>
>     > >  * printf b[1] with format %d gives: <0>
>     > > -b[0]: (type:unassigned) (isarray:0)
>     > > -b[1]: (type:unassigned) (isarray:0)
>     > > +b[0]: (type:string) (isarray:0)
>     > > +b[1]: (type:untyped) (isarray:0)
>     > > 
>     > >  * invented2...
>     > >  invented2 (type:untyped) (isarray:0)
>     >
>     >
> 
> 
> 
> 
> --
> me -> http://crap0101.altervista.org/

-- 
Andrew Schorr                      e-mail: aschorr@telemetry-investments.com
Telemetry Investments, L.L.C.      phone:  917-305-1748
152 W 36th St, #402                fax:    212-425-5550
New York, NY 10018-8765


