bug-gawk

Re: [Question] Is this a bug?


From: arnold
Subject: Re: [Question] Is this a bug?
Date: Sun, 09 Jul 2023 01:49:27 -0600
User-agent: Heirloom mailx 12.5 7/5/10

Hi Neil.

Thanks for pointing this out. Your analysis is correct and
the fix is attached. I will push it to Git in the next
day or two.

Thanks,

Arnold

"Neil R. Ormos" <ormos-gnulists17@ormos.org> wrote:

> Wolfgang Laun wrote:
> > Sedapnya Tidur wrote:
>
> >> $ gawk 'BEGIN { print match("a[", /^[^[]\x5B/) }'
> >> gawk: cmd. line:1: error: Invalid regular expression: /^[^[]\/
>
> >> $ gawk -V
> >> GNU Awk 5.2.2, API 3.2, (GNU MPFR 4.2.0-p9, GNU MP 6.2.1)
>
> >> $ grep -Po --color '^[^[]\x5B' <<< 'a[xxx'
> >> a[
>
> > grep with -P mimics Perl down to the least
> > detail, i.e., the way Perl parses any input
> > text. Thus, '\x5B' is not the same as '[' but is
> > treated as '\[', an escaped bracket. Deep in the
> > Perl 5 documentation on backslash in regular
> > expressions you can find this paragraph: *Note
> > that a character expressed as one of these*
> > [hexadecimal] *escapes is considered a
> > character without special meaning by the regex
> > engine, and will match "as is".* (There is a
> > similar paragraph on octal escapes.)
>
> > (g)awk processes string literals and literal
> > regular expressions as most compilers do,
> > converting hexadecimal escapes to
> > characters. Therefore, "\x5B" becomes "[" and is
> > indistinguishable from a "[" in the input.
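To make the contrast between the two explanations concrete, here is a minimal Python sketch. It is not gawk's or Perl's code. It expands two-digit `\xHH` escapes in regex source text either awk-style, inserting the decoded character verbatim, or Perl-style, backslash-quoting a decoded metacharacter so it matches "as is". The helper names `expand_hex` and `compiles`, the chosen metacharacter set, and the use of Python's `re` module as a stand-in regex compiler are all assumptions for illustration; escaped backslashes and one-digit escapes are deliberately ignored.

```python
import re
import warnings

# Assumption: a representative set of regex metacharacters that a
# Perl-style expansion backslash-quotes after decoding.
META = set("[]\\^$.|?*+(){}")
HEX_ESCAPE = re.compile(r"\\x([0-9A-Fa-f]{2})")


def expand_hex(text: str, perl_style: bool) -> str:
    """Replace each two-digit \\xHH escape in regex source text.

    awk-style (perl_style=False): the decoded character is inserted
    as-is, so \\x5B becomes a bare "[" that the compiler sees as syntax.
    Perl-style (perl_style=True): a decoded metacharacter is
    backslash-quoted, so it is matched literally.
    """
    def decode(m):
        ch = chr(int(m.group(1), 16))
        return "\\" + ch if perl_style and ch in META else ch

    return HEX_ESCAPE.sub(decode, text)


def compiles(pattern: str) -> bool:
    """Return True if Python's re module accepts the pattern."""
    with warnings.catch_warnings():
        # re may warn about "[" inside a bracket expression; ignore it.
        warnings.simplefilter("ignore")
        try:
            re.compile(pattern)
            return True
        except re.error:
            return False


source = r"^[^[]\x5B"  # the regex text as typed by the original poster
awk_re = expand_hex(source, perl_style=False)
perl_re = expand_hex(source, perl_style=True)

print(awk_re, compiles(awk_re))    # ^[^[][ False  (trailing "[" is unmatched)
print(perl_re, compiles(perl_re))  # ^[^[]\[ True  (trailing "[" is literal)
```

The only difference between the two modes is whether the decoded "[" is quoted, which is exactly why grep -P accepts the pattern and gawk rejects it.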
>
> Separate from the Perl regexp issue, the original poster's report exposes a 
> change in gawk's error message behavior that might be a bug.
>
> Gawk version 5.1.0 prints an "Invalid regular expression" error message that 
> shows the "[" as the last character of the invalid regular expression.
>
> That is consistent with Wolfgang's explanation that "'\x5B' becomes '[' and 
> is indistinguishable from a '[' in the input."
>
> By version 5.1.1, the error message changed to replace the final "[" with "\".
>
> The same problem affects the "Unmatched [..." error message that prints when 
> additional characters follow the \x5b without closing the bracket expression.
>
> In all these cases, the regexp being printed in the error messages is 
> truncated.  Perhaps the error message is being prepared by copying the input 
> text from before escape processing, but using the character count determined 
> after escape processing.
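Neil's hypothesis can be checked with a small Python simulation. This is an assumption about the mechanism, not gawk's actual code, and the attached re-fix.diff is not reproduced here. The hypothetical `error_text_buggy` takes the length of the escape-processed regex but copies that many characters from the unprocessed source text. Because each `\x5b` shrinks from four characters to one, the copy falls three characters short, which matches the 5.1.1 messages shown below; `error_text_fixed` copies the processed text and matches the 5.1.0 form.

```python
import re

# Assumption: only two-digit \xHH escapes are processed in this sketch.
HEX_ESCAPE = re.compile(r"\\x([0-9A-Fa-f]{2})")


def expand_hex(text: str) -> str:
    """awk-style escape processing: \\xHH becomes the character itself."""
    return HEX_ESCAPE.sub(lambda m: chr(int(m.group(1), 16)), text)


def error_text_buggy(source: str) -> str:
    # Hypothesized bug: the length comes from the processed regex,
    # but the characters are copied from the unprocessed source text.
    n = len(expand_hex(source))
    return "/" + source[:n] + "/"


def error_text_fixed(source: str) -> str:
    # Copy the processed regex using its own length.
    return "/" + expand_hex(source) + "/"


for suffix in ("", "a", "aa", "aaa"):
    src = r"^[^[]\x5b" + suffix
    print(error_text_buggy(src), error_text_fixed(src))

# Prints:
# /^[^[]\/ /^[^[][/
# /^[^[]\x/ /^[^[][a/
# /^[^[]\x5/ /^[^[][aa/
# /^[^[]\x5b/ /^[^[][aaa/
```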
>
> ###############
>
> ./gawk --version | head -1
> GNU Awk 5.1.0, API: 3.0 (GNU MPFR 3.1.5, GNU MP 6.1.2)
>
> ./gawk 'BEGIN { print match("a[", /^[^[]\x5b/) }'
> gawk: cmd. line:1: error: Invalid regular expression: /^[^[][/
>
> ./gawk 'BEGIN { print match("a[", /^[^[]\x5ba/) }'
> gawk: cmd. line:1: error: Unmatched [, [^, [:, [., or [=: /^[^[][a/
>
> ./gawk 'BEGIN { print match("a[", /^[^[]\x5baa/) }'
> gawk: cmd. line:1: error: Unmatched [, [^, [:, [., or [=: /^[^[][aa/
>
> ./gawk 'BEGIN { print match("a[", /^[^[]\x5baaa/) }'
> gawk: cmd. line:1: error: Unmatched [, [^, [:, [., or [=: /^[^[][aaa/
>
> ###############
>
> ./gawk --version | head -1
> GNU Awk 5.1.1, API: 3.1 (GNU MPFR 3.1.5, GNU MP 6.1.2)
>
> ./gawk 'BEGIN { print match("a[", /^[^[]\x5b/) }'
> gawk: cmd. line:1: error: Invalid regular expression: /^[^[]\/
>
> ./gawk 'BEGIN { print match("a[", /^[^[]\x5ba/) }'
> gawk: cmd. line:1: error: Unmatched [, [^, [:, [., or [=: /^[^[]\x/
>
> ./gawk 'BEGIN { print match("a[", /^[^[]\x5baa/) }'
> gawk: cmd. line:1: error: Unmatched [, [^, [:, [., or [=: /^[^[]\x5/
>
> ./gawk 'BEGIN { print match("a[", /^[^[]\x5baaa/) }'
> gawk: cmd. line:1: error: Unmatched [, [^, [:, [., or [=: /^[^[]\x5b/
>
> ###############

Attachment: re-fix.diff
Description: Text document

