emacs-bug-tracker

bug#20657: closed (Traditional range expression not accepted in regex/dfa)


From: GNU bug Tracking System
Subject: bug#20657: closed (Traditional range expression not accepted in regex/dfa)
Date: Fri, 22 Apr 2022 02:10:02 +0000

Your message dated Thu, 21 Apr 2022 19:08:55 -0700
with message-id <89b7650f-bb7a-04d8-128c-e9d4977ed566@cs.ucla.edu>
and subject line Re: Accepting [xyz---abc] - three minus signs to mean one
has caused the debbugs.gnu.org bug report #20657,
regarding Traditional range expression not accepted in regex/dfa
to be marked as done.

(If you believe you have received this mail in error, please contact
help-debbugs@gnu.org.)


-- 
20657: https://debbugs.gnu.org/cgi/bugreport.cgi?bug=20657
GNU Bug Tracking System
Contact help-debbugs@gnu.org with problems
--- Begin Message ---
Subject: Traditional range expression not accepted in regex/dfa
Date: Tue, 26 May 2015 05:42:19 +0300
User-agent: Heirloom mailx 12.5 6/20/10

Hi.

I received a bug report for gawk by private email that a regexp of
this form: '[^0-9---]' wasn't accepted.  The bugaboo here is the "---"; it's
a range expression consisting of minus through minus, and apparently long
ago was how one got a minus into a bracket expression.

This can be seen in current grep also:

        $ ./src/grep --version
        ./src/grep (GNU grep) 2.21
        Copyright (C) 2014 Free Software Foundation, Inc.
        ...

        $ ./src/grep '[^0-9---]' /dev/null
        ./src/grep: Invalid range end

The underlying regex and, I believe, dfa routines don't accept this.
Fixing either of them is beyond my skill range, so I thought I'd
pass this one upstream to you folks.
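
The same check can be made without grep by calling the C library's POSIX regcomp directly. The following is only a minimal sketch: the helper names try_compile and has_match are made up for illustration, and whether the "---" form compiles depends on the regex implementation and version in use.

```c
#include <assert.h>
#include <regex.h>
#include <stdio.h>

/* Compile PATTERN as a POSIX basic regular expression and return the
   regcomp status (0 on success).  On failure, print the library's own
   diagnostic, which may differ between C libraries and versions.  */
static int
try_compile (const char *pattern)
{
  regex_t re;
  int status;

  assert (pattern != NULL);
  status = regcomp (&re, pattern, REG_NOSUB);
  if (status != 0)
    {
      char msg[128];
      regerror (status, &re, msg, sizeof msg);
      fprintf (stderr, "%s: %s\n", pattern, msg);
      return status;
    }
  regfree (&re);
  return 0;
}

/* Return 1 if TEXT contains a match for PATTERN, 0 if it does not,
   and -1 if PATTERN fails to compile.  */
static int
has_match (const char *pattern, const char *text)
{
  regex_t re;
  int status;

  assert (pattern != NULL && text != NULL);
  if (regcomp (&re, pattern, REG_NOSUB) != 0)
    return -1;
  status = regexec (&re, text, 0, NULL, 0);
  regfree (&re);
  return status == 0;
}
```

Calling try_compile ("[^0-9---]") reports whether the installed library accepts the traditional form. The portable spelling puts the minus last in the list, as in '[^0-9-]', and has_match ("[^0-9-]", "a") returns 1 while has_match ("[^0-9-]", "-") returns 0.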

Thanks!

Arnold



--- End Message ---
--- Begin Message ---
Subject: Re: Accepting [xyz---abc] - three minus signs to mean one
Date: Thu, 21 Apr 2022 19:08:55 -0700
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:91.0) Gecko/20100101 Thunderbird/91.8.0

On 4/21/22 00:57, Arnold Robbins wrote:

> As far as my testing indicates, dfa.c doesn't need a patch, it seems
> to accept "---" inside brackets for a single minus.

Yes, a brief perusal of the dfa.c source code suggests you're right. Thanks for looking into this. I tend to agree with you that POSIX is not likely to outlaw this extension.

> If there are no objections, can we get this into Gnulib?

Although the basic idea looks good, I see a few places where the patch can be improved.

* The two calls to re_string_peek_byte might go past the end of the pattern (a subscript violation). This is possible because the pattern is not necessarily null-terminated.

* The two calls to re_string_fetch_byte can be simplified into a single call to re_string_skip_bytes.

* No need to assign to token->opr.c, as it already has the correct value.

* Can fall through to the default case to save a bit of duplicate code.

* glibc still uses comments /* like this */ for style reasons, and we should stick to that.
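
The first two points above can be illustrated with a small standalone sketch. This is not the attached patch and does not use the regex internals: the names tok_kind, TOK_CHAR, TOK_RANGE and classify_minus are hypothetical, and the pattern is modeled as a plain buffer with an explicit length because it need not be null-terminated.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical token kinds for this sketch; not the regex internals.  */
enum tok_kind { TOK_CHAR, TOK_RANGE };

/* Classify the '-' at PAT[I] inside a bracket expression.  PAT holds
   LEN bytes and need not be null-terminated, so the length is checked
   before either lookahead byte is read.  Store in *CONSUMED the number
   of bytes the token occupies, so the caller can advance once.  */
static enum tok_kind
classify_minus (const char *pat, size_t len, size_t i, size_t *consumed)
{
  assert (i < len && pat[i] == '-');

  /* len - i > 2 means both pat[i + 1] and pat[i + 2] are in bounds.  */
  if (len - i > 2 && pat[i + 1] == '-' && pat[i + 2] == '-')
    {
      /* Three adjacent minus signs stand for one literal minus.  */
      *consumed = 3;
      return TOK_CHAR;
    }

  /* Otherwise treat it as a range operator in this sketch.  A real
     parser would also handle a minus at the start or end of the
     list.  */
  *consumed = 1;
  return TOK_RANGE;
}
```

With the two-byte buffer "--" and I equal to 0, the length test fails first, so nothing past the end of the buffer is read and the function reports TOK_RANGE with one byte consumed.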

I wrote a patch with these improvements in mind and installed it into Gnulib (see attached); hope it works for Gawk too.

Attachment: 0001-regex-match-.-.-like-V7-grep.patch
Description: Text Data


--- End Message ---
