Re: Locale aware range expressions?
From: arnold
Subject: Re: Locale aware range expressions?
Date: Sun, 28 Jan 2024 08:17:17 -0700
User-agent: Heirloom mailx 12.5 7/5/10
I think this is a bug in the documentation; the regex and dfa
libraries these days use Rational Range Interpretation(tm).
Paul, do you agree?
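
For anyone who wants to check which interpretation their own grep build uses, here is a minimal sketch. It assumes a POSIX shell and uses printf in place of zsh's `print -l`; the function names are illustrative only and not part of grep. In the C locale a range is interpreted by byte value, so only the lowercase letters should match.

```shell
# Portable variant of the quoted test: printf replaces zsh's `print -l`.
# (letters and range_matches_in_c_locale are hypothetical helper names.)
letters() {
    printf '%s\n' a b c d A B C D
}

# Pin the locale so the result does not depend on the caller's environment.
range_matches_in_c_locale() {
    letters | LC_ALL=C grep '[a-d]'
}

range_matches_in_c_locale

# To compare against another locale (assumes en_US.UTF-8 is installed):
#   letters | LC_ALL=en_US.UTF-8 grep '[a-d]'
```

If the second command prints the same lines as the first, that build is not applying the locale's collating sequence to the range.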
Arnold
"Ronan Pigott" <ronan@rjp.ie> wrote:
> Hi grep,
>
> The grep manual, in the section titled "Character Classes and Bracket
> Expressions" is careful to point out the effect of the user's locale and
> collation order on the meaning of range expressions. In particular, it
> highlights that [a-d] is equivalent to [abcd] in the C locale, but may be
> equivalent to [aAbBcCdD] in the user's locale because:
>
> "It matches any single character that sorts between the two characters,
> inclusive, using the locale's collating sequence and character set."
>
> However, in my experience this is not true.
>
> $ grep ^NAME /etc/os-release; pacman -Q grep
> NAME="Arch Linux"
> grep 3.11-1
>
> $ locale | grep -E '^(LANG|LC_COLLATE|LC_ALL)'
> LANG=en_US.UTF-8
> LC_COLLATE="en_US.UTF-8"
> LC_ALL=
>
> # locale aware collation, exactly as described in grep(1)
> $ print -l {a..d} {A..D} | sort
> a
> A
> b
> B
> c
> C
> d
> D
>
> # only lowercase matches, despite A/B/C all sorting within the range
> $ print -l {a..d} {A..D} | grep '[a-d]'
> a
> b
> c
> d
>
> This contradicts the grep manual afaict. Is this a bug in grep or the
> documentation? Is it user error?
>
> Thanks,
>
> Ronan
>