bug-findutils

Re: [bug #48055] Regex ranges and locales in gnu-awk regextype


From: James Youngman
Subject: Re: [bug #48055] Regex ranges and locales in gnu-awk regextype
Date: Sun, 27 Nov 2016 17:15:25 +0000

Findutils uses the regular expression implementation from gnulib, so this
problem likely exists there too, unless it has already been fixed there.

On Mon, May 30, 2016 at 7:12 AM, Piotr Jurkiewicz <address@hidden>
wrote:

> URL:
>   <http://savannah.gnu.org/bugs/?48055>
>
>                  Summary: Regex ranges and locales in gnu-awk regextype
>                  Project: findutils
>             Submitted by: piotrjurkiewicz
>             Submitted on: Mon 30 May 2016 08:12:40 AM CEST
>                 Category: find
>                 Severity: 3 - Normal
>               Item Group: Wrong result
>                   Status: None
>                  Privacy: Public
>              Assigned to: None
>          Originator Name:
>         Originator Email:
>              Open/Closed: Open
>          Discussion Lock: Any
>                  Release: 4.6.0
>            Fixed Release: None
>
>     _______________________________________________________
>
> Details:
>
> Starting with gawk 4.0, the traditional behaviour of regex ranges has been
> restored. This means that [a-z] matches only lowercase letters and [A-Z]
> matches only uppercase letters, regardless of the locale and collation
> settings.
>
> See more:
> https://www.gnu.org/software/gawk/manual/html_node/Ranges-and-Locales.html
>
> You can test this with the following commands:
>
> $ echo ABC | LC_COLLATE=pl_PL.utf8 gawk '$0 ~ /^[a-b]/' # gawk pre-4.0
> ABC
>
> $ echo ABC | LC_COLLATE=pl_PL.utf8 gawk '$0 ~ /^[a-b]/' # gawk 4.0+
> [nothing]
>
> Findutils, however, still emulates the old behaviour of gawk in gnu-awk
> mode. That is, under certain locales, the [a-z] and [A-Z] ranges each match
> both lowercase and uppercase letters.
>
> Test:
>
> Prepare:
>
> mkdir test
> cd test
> touch a.lower
> touch b.UPPER
>
> Then both commands:
>
> LC_COLLATE=pl_PL.utf8 find -regextype gnu-awk -regex '.*[a-z]{5}$'
> LC_COLLATE=pl_PL.utf8 find -regextype gnu-awk -regex '.*[A-Z]{5}$'
>
> return:
>
> ./a.lower
> ./b.UPPER
>
> instead of just the one file with the matching case.
>
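
For anyone who wants to check a given findutils build, the reproduction
quoted above can be wrapped in a small script that runs the same search under
two collation settings and prints both results. This is only a sketch under
some assumptions: it needs GNU find (for -regextype), the helper name
run_find is made up for illustration, and the pattern uses `+` rather than
the `{5}` interval from the report so that it does not depend on interval
support in the gnu-awk syntax. If the pl_PL.utf8 locale is not installed, the
second run will most likely behave like the first.

```shell
#!/bin/sh
# Sketch: compare a lowercase-only range under two LC_COLLATE settings.
# Assumes GNU find; run_find is a hypothetical helper, not part of findutils.

workdir=$(mktemp -d) || exit 1
trap 'rm -rf "$workdir"' EXIT
cd "$workdir" || exit 1

# Same two files as in the bug report.
touch a.lower b.UPPER

run_find() {
    # $1 is the collation locale. LC_ALL is emptied for this one command
    # so that LC_COLLATE is not overridden by the caller's environment.
    LC_ALL= LC_COLLATE=$1 find . -regextype gnu-awk -regex '.*\.[a-z]+$' | sort
}

c_result=$(run_find C)
pl_result=$(run_find pl_PL.utf8)

printf 'LC_COLLATE=C:\n%s\n' "$c_result"
printf 'LC_COLLATE=pl_PL.utf8:\n%s\n' "$pl_result"

# The report describes the locale-dependent result as the bug, so a
# difference between the two runs suggests the build is still affected.
if [ "$c_result" != "$pl_result" ]; then
    echo "results differ between locales"
fi
```

With LC_COLLATE=C the range should match only ./a.lower. Whether the second
run also lists ./b.UPPER depends on the installed locale data and on the
regex code the find binary was built with.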


-- 
--
This email is intended solely for the use of its addressee, sender, and any
readers of a mailing list archive in which it happens to appear.   If you
have received this email in error, please say or type three times, "I
believe in the utility of email disclaimers," and then reply to the author
correcting any spellings (and, optionally, any incorrect spellings),
accompanying these with humorous jests about the author's parentage.   If
you are not the addressee, you are nevertheless permitted to both copy and
forward this email since without such permissions email systems are unable
to transmit email to anybody, intended recipient or not.  To those still
reading by this point, the author would like to apologise for being unable
to maintain a consistent level of humour throughout this disclaimer.
Contents may settle during transit.  Do not feed the animals.

