Re: [Question] Is this a bug?
From: Neil R. Ormos
Subject: Re: [Question] Is this a bug?
Date: Sat, 8 Jul 2023 12:52:55 -0500 (CDT)
User-agent: Alpine 2.20 (DEB 67 2015-01-07)
Wolfgang Laun wrote:
> Sedapnya Tidur wrote:
>> $ gawk 'BEGIN { print match("a[", /^[^[]\x5B/) }'
>> gawk: cmd. line:1: error: Invalid regular expression: /^[^[]\/
>> $ gawk -V
>> GNU Awk 5.2.2, API 3.2, (GNU MPFR 4.2.0-p9, GNU MP 6.2.1)
>> $ grep -Po --color '^[^[]\x5B' <<< 'a[xxx'
>> a[
> grep with -P mimics Perl down to the smallest
> detail, i.e., the way Perl parses any input
> text. Thus, '\x5B' is not the same as '[' but is
> treated as '\[', an escaped bracket. Deep in the
> Perl 5 documentation on backslash in regular
> expressions you can find this paragraph: *Note
> that a character expressed as one of these*
> [hexadecimal] *escapes is considered a
> character without special meaning by the regex
> engine, and will match "as is".* (There is a
> similar paragraph on octal escapes.)
> (g)awk processes string literals and literal
> regular expressions as most compilers do,
> converting hexadecimal escapes to
> characters. Therefore, "\x5B" becomes "[" and is
> indistinguishable from a "[" in the input.
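
The distinction Wolfgang describes can be reproduced outside gawk. What follows is only a minimal sketch, and it rests on an assumption: Python's re module is used as a stand-in for a Perl-style engine, because it also resolves \x5B inside the regex engine and treats the result as a literal character. The awk-style path is imitated by converting the escape by hand before the engine sees the pattern. None of this is gawk or Perl code.

```python
import re

# The nine characters the original poster typed between the slashes.
PATTERN_SOURCE = r"^[^[]\x5B"

# Perl-style handling: the regex engine itself sees the escape and
# treats the resulting "[" as a literal character.
engine_level = re.match(PATTERN_SOURCE, "a[xxx")
print(engine_level.group(0))

# awk-style handling (imitated by hand): the escape is converted first,
# so the engine receives a bare "[" that opens a bracket expression.
converted = PATTERN_SOURCE.replace(r"\x5B", "[")
try:
    re.compile(converted)
    outcome = "compiled"
except re.error as exc:
    outcome = f"rejected: {exc.msg}"
print(converted, outcome)
```

The first pattern matches "a[", as grep -P does; the second is rejected because the trailing "[" is never closed, which is the situation gawk reports.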
Separate from the Perl regexp issue, the original poster's report exposes a
change in gawk's error message behavior that might be a bug.

Gawk version 5.1.0 prints an "Invalid regular expression" error message that
shows the "[" as the last character of the invalid regular expression.
That is consistent with Wolfgang's explanation that "'\x5B' becomes '[' and is
indistinguishable from a '[' in the input."

By version 5.1.1, the error message had changed: the final "[" is replaced
by "\".

The same problem affects the "Unmatched [..." error message that prints when
additional characters follow the \x5b without closing the bracket expression.

In all the 5.1.1 cases shown below, the regexp printed in the error message is
truncated. Perhaps the error message is being prepared by copying the input
text from before escape processing, but using the character count determined
after escape processing.
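
That guess can be checked against the transcripts with a few lines of code. This is only a sketch of the hypothesis, not gawk's actual implementation: the function names are made up for illustration, and the model assumes that at most two hex digits are consumed per \x escape, which is what the 5.1.0 output suggests.

```python
import re

# Assumption: a \x escape consumes at most two hex digits.
HEX_ESCAPE = re.compile(r"\\x([0-9A-Fa-f]{1,2})")

def process_escapes(source: str) -> str:
    """Convert each \\xHH escape in the regexp text to the character it names."""
    return HEX_ESCAPE.sub(lambda m: chr(int(m.group(1), 16)), source)

def suspected_message_text(source: str) -> str:
    """Hypothetical bug: copy the unprocessed text, but only as many
    characters as the processed text contains."""
    processed = process_escapes(source)
    return source[:len(processed)]

for tail in ("", "a", "aa", "aaa"):
    source = r"^[^[]\x5b" + tail
    print(f"/{process_escapes(source)}/  vs  /{suspected_message_text(source)}/")
```

For each of the four test regexps, the left-hand text is what 5.1.0 prints and the right-hand text is what 5.1.1 prints, so the observed truncation is at least consistent with a mismatch between the pre-escape text and the post-escape length.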
###############
$ ./gawk --version | head -1
GNU Awk 5.1.0, API: 3.0 (GNU MPFR 3.1.5, GNU MP 6.1.2)
$ ./gawk 'BEGIN { print match("a[", /^[^[]\x5b/) }'
gawk: cmd. line:1: error: Invalid regular expression: /^[^[][/
$ ./gawk 'BEGIN { print match("a[", /^[^[]\x5ba/) }'
gawk: cmd. line:1: error: Unmatched [, [^, [:, [., or [=: /^[^[][a/
$ ./gawk 'BEGIN { print match("a[", /^[^[]\x5baa/) }'
gawk: cmd. line:1: error: Unmatched [, [^, [:, [., or [=: /^[^[][aa/
$ ./gawk 'BEGIN { print match("a[", /^[^[]\x5baaa/) }'
gawk: cmd. line:1: error: Unmatched [, [^, [:, [., or [=: /^[^[][aaa/
###############
$ ./gawk --version | head -1
GNU Awk 5.1.1, API: 3.1 (GNU MPFR 3.1.5, GNU MP 6.1.2)
$ ./gawk 'BEGIN { print match("a[", /^[^[]\x5b/) }'
gawk: cmd. line:1: error: Invalid regular expression: /^[^[]\/
$ ./gawk 'BEGIN { print match("a[", /^[^[]\x5ba/) }'
gawk: cmd. line:1: error: Unmatched [, [^, [:, [., or [=: /^[^[]\x/
$ ./gawk 'BEGIN { print match("a[", /^[^[]\x5baa/) }'
gawk: cmd. line:1: error: Unmatched [, [^, [:, [., or [=: /^[^[]\x5/
$ ./gawk 'BEGIN { print match("a[", /^[^[]\x5baaa/) }'
gawk: cmd. line:1: error: Unmatched [, [^, [:, [., or [=: /^[^[]\x5b/
###############