bug-gawk

Re: [Question] Is this a bug?


From: Neil R. Ormos
Subject: Re: [Question] Is this a bug?
Date: Sat, 8 Jul 2023 12:52:55 -0500 (CDT)
User-agent: Alpine 2.20 (DEB 67 2015-01-07)

Wolfgang Laun wrote:
> Sedapnya Tidur wrote:

>> $ gawk 'BEGIN { print match("a[", /^[^[]\x5B/) }'
>> gawk: cmd. line:1: error: Invalid regular expression: /^[^[]\/

>> $ gawk -V
>> GNU Awk 5.2.2, API 3.2, (GNU MPFR 4.2.0-p9, GNU MP 6.2.1)

>> $ grep -Po --color '^[^[]\x5B' <<< 'a[xxx'
>> a[

> grep with -P mimics Perl down to the last
> detail, i.e., the way Perl parses any input
> text. Thus, '\x5B' is not the same as '[' but is
> treated as '\[', an escaped bracket. Deep in the
> Perl 5 documentation on backslash in regular
> expressions you can find this paragraph: *Note
> that a character expressed as one of these*
> [hexadecimal] *escapes is considered a
> character without special meaning by the regex
> engine, and will match "as is".* (There is a
> similar paragraph on octal escapes.)

> (g)awk processes string literals and literal
> regular expressions as most compilers do,
> converting hexadecimal escapes to
> characters. Therefore, "\x5B" becomes "[" and is
> indistinguishable from a "[" in the input.
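
Python's `re` module happens to exhibit both behaviors described in the quoted text, so it makes a convenient analogy. This is Python, not gawk or Perl, and is only meant as an illustration. In a raw string the four characters `\x5B` reach the regex engine, which, like Perl, treats the escape as a literal "[". In an ordinary string literal, Python converts `\x5B` to "[" before the regex engine ever sees it, much as gawk does, and the pattern then contains an unterminated bracket expression.

```python
import re

# Raw string: the regex engine itself sees "\x5B" and, like Perl,
# treats it as a literal "[" with no special meaning.
m = re.match(r"^[^[]\x5B", "a[xxx")
print(m.group(0))  # the matched text, "a["

# Ordinary string: Python's string-literal processing turns "\x5B"
# into "[" first, so the engine sees "^[^[][", which opens a bracket
# expression that is never closed.
try:
    re.compile("^[^[]\x5B")
except re.error as err:
    print("re.error:", err)
```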

Separate from the Perl regexp issue, the original poster's report exposes a 
change in gawk's error message behavior that might be a bug.

Gawk version 5.1.0 prints an "Invalid regular expression" error message that 
shows the "[" as the last character of the invalid regular expression.

That is consistent with Wolfgang's explanation that "'\x5B' becomes '[' and is 
indistinguishable from a '[' in the input."

By version 5.1.1, the error message changed to replace the final "[" with "\".

The same problem affects the "Unmatched [..." error message, which is printed when 
additional characters follow the \x5b without closing the bracket expression.

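In all of the 5.1.1 cases, the regexp printed in the error message is 
truncated.  Perhaps the error message is being prepared by copying the input 
text from before escape processing, but using the character count determined 
after escape processing.  For example, ^[^[]\x5b is 9 characters as written 
but only 6 after \x5b becomes "[", and 5.1.1 prints exactly the first 6 
characters of the original text, "^[^[]\".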
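
The hypothesis above can be checked against the transcripts with a small model. This is a hypothetical Python sketch, not gawk's actual code: `process_escapes`, `message_text_510` and `message_text_511` are invented names, and only \xHH escapes of at most two hex digits are handled (an assumption suggested by the 5.1.0 output, where \x5ba became "[a").

```python
import re

# Hypothetical simplification: only hex escapes with at most two digits
# are converted; every other character is copied unchanged.
_HEX_ESCAPE = re.compile(r"\\x([0-9A-Fa-f]{1,2})")


def process_escapes(raw: str) -> str:
    """Return the regexp text after hex escapes become characters."""
    return _HEX_ESCAPE.sub(lambda m: chr(int(m.group(1), 16)), raw)


def message_text_510(raw: str) -> str:
    """5.1.0-style message text: the fully escape-processed regexp."""
    return process_escapes(raw)


def message_text_511(raw: str) -> str:
    """Hypothesized 5.1.1 behavior: copy the raw (unprocessed) source
    text, but only as many characters as the processed text contains."""
    return raw[:len(process_escapes(raw))]


if __name__ == "__main__":
    for raw in [r"^[^[]\x5b", r"^[^[]\x5ba", r"^[^[]\x5baa", r"^[^[]\x5baaa"]:
        print(f"5.1.0: /{message_text_510(raw)}/   5.1.1: /{message_text_511(raw)}/")
```

Run as a script, it prints both variants side by side. The modeled 5.1.1 strings ("^[^[]\", "^[^[]\x", "^[^[]\x5", "^[^[]\x5b") and 5.1.0 strings ("^[^[][", "^[^[][a", and so on) match the messages in the transcripts, which is consistent with, though not proof of, the raw-text/processed-length mismatch.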

###############

./gawk --version | head -1
GNU Awk 5.1.0, API: 3.0 (GNU MPFR 3.1.5, GNU MP 6.1.2)

./gawk 'BEGIN { print match("a[", /^[^[]\x5b/) }'
gawk: cmd. line:1: error: Invalid regular expression: /^[^[][/

./gawk 'BEGIN { print match("a[", /^[^[]\x5ba/) }'
gawk: cmd. line:1: error: Unmatched [, [^, [:, [., or [=: /^[^[][a/

./gawk 'BEGIN { print match("a[", /^[^[]\x5baa/) }'
gawk: cmd. line:1: error: Unmatched [, [^, [:, [., or [=: /^[^[][aa/

./gawk 'BEGIN { print match("a[", /^[^[]\x5baaa/) }'
gawk: cmd. line:1: error: Unmatched [, [^, [:, [., or [=: /^[^[][aaa/

###############

./gawk --version | head -1
GNU Awk 5.1.1, API: 3.1 (GNU MPFR 3.1.5, GNU MP 6.1.2)

./gawk 'BEGIN { print match("a[", /^[^[]\x5b/) }'
gawk: cmd. line:1: error: Invalid regular expression: /^[^[]\/

./gawk 'BEGIN { print match("a[", /^[^[]\x5ba/) }'
gawk: cmd. line:1: error: Unmatched [, [^, [:, [., or [=: /^[^[]\x/

./gawk 'BEGIN { print match("a[", /^[^[]\x5baa/) }'
gawk: cmd. line:1: error: Unmatched [, [^, [:, [., or [=: /^[^[]\x5/

./gawk 'BEGIN { print match("a[", /^[^[]\x5baaa/) }'
gawk: cmd. line:1: error: Unmatched [, [^, [:, [., or [=: /^[^[]\x5b/

###############