From: Bize Ma
Subject: Re: Bash removes unrequested characters in bracket expressions (not a range).
Date: Sat, 24 Nov 2018 17:34:55 -0400
Chet Ramey (<chet.ramey@case.edu>) wrote:
> On 11/23/18 6:09 PM, Bize Ma wrote:
>
> > Bash Version: 4.4
> > Patch Level: 12
> > Release Status: release
>
> > Description:
> >
> > Bash is removing characters not explicitly listed in a bracket
> > expression (character range).
> > In this example, it is removing digits from other languages.
>
> What is your locale?
>
>
The locale used was en_US.utf-8, but the same thing happens with 459 of
the 868 locales available under Debian (not in C, for example).
In every affected locale except one, zh_CN.gb18030, setting either
LC_ALL=$loc or LC_COLLATE=$loc gave the same result.
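A per-locale check along these lines can be sketched as follows. This is not the script used for the figures above, only a minimal illustration under stated assumptions: the helper names `strip_digits` and `count_affected` are hypothetical, the probe is ARABIC-INDIC DIGIT ONE (U+0661) written as raw UTF-8 bytes, and only locales whose names mention UTF-8 are scanned, because the probe bytes mean something else in other encodings.

```shell
# Hypothetical helper: strip ASCII digits from $2 under locale $1,
# using an explicit bracket list (no range) in a fresh bash process.
strip_digits() {
    LC_ALL=$1 bash -c 'x=$1; printf "%s" "${x//[0123456789]}"' _ "$2" 2>/dev/null
}

# Hypothetical helper: count the UTF-8 locales in which the explicit
# digit list also removes a non-ASCII digit. Runs in a subshell so its
# variables do not leak into the caller.
count_affected() (
    probe=$(printf '\331\241')   # U+0661 as UTF-8 bytes (assumed probe)
    total=0
    affected=0
    for loc in $(locale -a | grep -i 'utf-*8'); do
        total=$((total + 1))
        if [ "$(strip_digits "$loc" "$probe")" != "$probe" ]; then
            affected=$((affected + 1))
        fi
    done
    printf '%s of %s UTF-8 locales affected\n' "$affected" "$total"
)
```

Calling `count_affected` prints a summary line. The counts depend on the bash version and on the locale data installed, so they will not necessarily match the numbers quoted above.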
But IMO locale collation should not be used for an explicit list.
I have been made aware that there is a
cstart = cend = FOLD (cstart);
inside the `sm_loop.c` file that turns each individually listed
character into a one-character range. If that understanding is correct,
that is the source of the difference with other shells.
My understanding is that a collation table *must define a "total order"*,
in fact a strict total order. If two distinct characters `a` and `b` could
sort as equal, the order could no longer confirm that a character is
absent from a list. Consider characters `a`, `b` and `c`: if `a` and `b`
sort as equal, a sorted list in which we find `a` followed by `c` doesn't
confirm that `b` is absent, as the order could just as well be `b a c`.
Under a strict total order, there cannot be any character other than `a`
in the range `a-a`, and using the range `a-a` is equivalent (just slower
and more complex) to the single character `a`.
If this is not the case, the error is in the collation table, not in
matching single characters directly (which is faster). What should be
updated is the collation table, IMO.
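
The claimed equivalence between the list `[a]` and the range `[a-a]` can be checked empirically. The following is a minimal sketch, assuming hypothetical helper names `matches` and `compare_list_and_range`; it only reports what a given bash and locale actually do and makes no claim about what the result should be.

```shell
# Hypothetical helper: print "yes" if string $3 matches glob pattern $2
# under locale $1, otherwise "no". Runs in a fresh bash process.
matches() {
    LC_ALL=$1 bash -c 'case $2 in $1) echo yes ;; *) echo no ;; esac' _ "$2" "$3" 2>/dev/null
}

# Hypothetical helper: report whether the single-character list [c] and
# the range [c-c] agree on a candidate string in the given locale.
# Runs in a subshell so its variables do not leak into the caller.
compare_list_and_range() (
    loc=$1
    c=$2
    candidate=$3
    printf 'list=%s range=%s\n' \
        "$(matches "$loc" "[$c]" "$candidate")" \
        "$(matches "$loc" "[$c-$c]" "$candidate")"
)
```

For example, `compare_list_and_range en_US.UTF-8 1 "$(printf '\331\241')"` tests ASCII `1` against ARABIC-INDIC DIGIT ONE. If either column says "yes", that locale's collation treats the two characters as equal for matching purposes. The outcome depends on the bash version and the installed locale data.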