bug-gawk

behaviour in regex comparison


From: *
Subject: behaviour in regex comparison
Date: Wed, 15 Nov 2023 23:06:40 +0100

Configuration Information [Automatically generated, do not change]:
Machine: x86_64
OS: linux-gnu
Compiler: gcc
Compilation CFLAGS: -g -O2 -DNDEBUG
uname output: Linux orange 5.15.0-46-generic #49-Ubuntu SMP Thu Aug 4
18:03:25 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux
Machine Type: x86_64-pc-linux-gnu

Gawk Version: 5.3.0

Attestation 1:
I have read https://www.gnu.org/software/gawk/manual/html_node/Bugs.html.
Yes

Attestation 2:
I have not modified the sources before building gawk.
True

Description:
I noticed a change in behaviour between gawk 5.3.0 and earlier versions (the
default system gawk here is 5.1.0) when comparing regexes. In previous
versions, if I understand correctly, regexes were compared as strings,
following the rules described in the manual for string comparison. With
version 5.3.0 it seems that variables of regexp type always compare as equal
(using `==`, `<=` or `>=`) _but_ always compare as false when using `<` or
`>`, regardless of their contents.
I know that using these operators on regexp-typed variables may be
inappropriate and, indeed, I can't find any reference to it in the
manual. Still, I noticed the change and wondered whether it was made
on purpose (which would make perfect sense, btw).
Repeat-By:
crap0101@orange:~/test$ cat awk_re.awk
BEGIN {
    x=@/bar/
    y[0]=@/bar/
    y[1]=@/baz/
    y[2]="bar"
    y[3]="baz"
    printf("* set: x=@/%s/\n", x)
    for (i in y) {
        yfmt = typeof(y[i]) == "regexp" ? sprintf("@/%s/", y[i]) : ""y[i]
        printf("* set: y=%s\n", yfmt)
        printf("@/%s/ == %s --> %d\n", x, yfmt, x == y[i])
        printf("@/%s/ ~  %s --> %d\n", x, yfmt, x ~ y[i])
        printf("@/%s/ <= %s --> %d\n", x, yfmt, x <= y[i])
        printf("@/%s/ <  %s --> %d\n", x, yfmt, x < y[i])
    }
}
crap0101@orange:~/test$ awk -f awk_re.awk
* set: x=@/bar/
* set: y=@/bar/
@/bar/ == @/bar/ --> 1
@/bar/ ~  @/bar/ --> 1
@/bar/ <= @/bar/ --> 1
@/bar/ <  @/bar/ --> 0
* set: y=@/baz/
@/bar/ == @/baz/ --> 0
@/bar/ ~  @/baz/ --> 0
@/bar/ <= @/baz/ --> 1
@/bar/ <  @/baz/ --> 1
* set: y=bar
@/bar/ == bar --> 1
@/bar/ ~  bar --> 1
@/bar/ <= bar --> 1
@/bar/ <  bar --> 0
* set: y=baz
@/bar/ == baz --> 0
@/bar/ ~  baz --> 0
@/bar/ <= baz --> 1
@/bar/ <  baz --> 1
crap0101@orange:~/test$ AWK/gawk/gawk -f awk_re.awk
* set: x=@/bar/
* set: y=@/bar/
@/bar/ == @/bar/ --> 1
@/bar/ ~  @/bar/ --> 1
@/bar/ <= @/bar/ --> 1
@/bar/ <  @/bar/ --> 0
* set: y=@/baz/
@/bar/ == @/baz/ --> 1
@/bar/ ~  @/baz/ --> 0
@/bar/ <= @/baz/ --> 1
@/bar/ <  @/baz/ --> 0
* set: y=bar
@/bar/ == bar --> 1
@/bar/ ~  bar --> 1
@/bar/ <= bar --> 1
@/bar/ <  bar --> 0
* set: y=baz
@/bar/ == baz --> 0
@/bar/ ~  baz --> 0
@/bar/ <= baz --> 1
@/bar/ <  baz --> 1
crap0101@orange:~/test$ awk -f awk_re.awk > /tmp/a1
crap0101@orange:~/test$ AWK/gawk/gawk -f awk_re.awk > /tmp/a2
crap0101@orange:~/test$ diff -Naur /tmp/a1 /tmp/a2
--- /tmp/a1 2023-11-15 22:33:06.863658041 +0100
+++ /tmp/a2 2023-11-15 22:33:14.399652253 +0100
@@ -5,10 +5,10 @@
 @/bar/ <= @/bar/ --> 1
 @/bar/ <  @/bar/ --> 0
 * set: y=@/baz/
-@/bar/ == @/baz/ --> 0
+@/bar/ == @/baz/ --> 1
 @/bar/ ~  @/baz/ --> 0
 @/bar/ <= @/baz/ --> 1
-@/bar/ <  @/baz/ --> 1
+@/bar/ <  @/baz/ --> 0
 * set: y=bar
 @/bar/ == bar --> 1
 @/bar/ ~  bar --> 1
crap0101@orange:~/test$ AWK/gawk/gawk --version | head -1
GNU Awk 5.3.0, API 4.0, PMA Avon 8-g1
crap0101@orange:~/test$ awk --version | head -1
GNU Awk 5.1.0, API: 3.0 (GNU MPFR 4.1.0, GNU MP 6.2.1)


Fix:
As said, I'm not sure it's a bug, so I'm not sure it needs a fix.
The thing I find a bit confusing is the `<=` vs `<` behaviour, but I
don't know if there is an easy fix (or if one is needed).
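
As a possible stopgap on the script side, here is a minimal sketch, assuming
plain string comparison is what is wanted: concatenating the empty string
turns a regexp-typed value into an ordinary string before it is compared.
The helper name str() is made up for this example and is not part of gawk,
and the result has not been checked here against 5.3.0.

```awk
# Hypothetical helper (not a gawk builtin): return v as a plain string.
# Concatenating with "" yields a string value, so comparisons on the result
# follow the usual string-comparison rules.
function str(v) {
    return v ""
}

BEGIN {
    x = @/bar/
    y = @/baz/
    printf("typeof(x)      = %s\n", typeof(x))
    printf("typeof(str(x)) = %s\n", typeof(str(x)))
    printf("str(x) == str(y) --> %d\n", str(x) == str(y))
    printf("str(x) <  str(y) --> %d\n", str(x) <  str(y))
}
```

With string comparison, the last two lines should report 0 and 1
respectively, since "bar" and "baz" differ and "bar" sorts before "baz".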

