Re: [A-Z], [:upper:]
From: Greg Wooledge
Subject: Re: [A-Z], [:upper:]
Date: Fri, 29 Mar 2019 08:41:49 -0400
User-agent: NeoMutt/20170113 (1.7.2)

> > [A-Z] isn't safe to use unless ...
>
> That's true to an extent, but we know here that the intent is to
> match 'C' which is between A and Z in every locale in the universe.
> Variations on A might not be, variations on Z might not be, and there
> might be more than just the upper case English letters between A and Z,
> included in the range (even including things which are not letters at
> all, upper case or not, and lower case chars might be included) but we
> can assume that for any real locale, 'C' will be in that range (real as
> being one in use in the world, rather than one invented for the very
> purpose of not including C in the collating sequence between A and Z)

So, embracing and extending your assumptions, we can also claim that
the letter T is between A and Z in every locale in the universe, right?

wooledg:~$ printf %s\\n {A..Z} | LC_COLLATE=et_EE.utf8 sort | tr '\n' ' '
A B C D E F G H I J K L M N O P Q R S Z T U V W X Y
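
The pipeline above can be wrapped in a small helper that prints only the capitals collating from A through Z in a given locale, which is the set a locale-aware [A-Z] range would cover. This is a sketch, not a canonical tool: the function name `letters_between_A_and_Z` is hypothetical, it assumes bash (for `{A..Z}`) plus `sort`, `sed` and `paste`, and it sets `LC_ALL` rather than `LC_COLLATE` so that an `LC_ALL` inherited from the environment cannot silently override the setting. If the named locale is not installed (check with `locale -a`), `sort` typically falls back to the C locale and you simply get A through Z.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print, space-separated, the capital letters that
# collate from A through Z (inclusive) in the locale named by $1.
letters_between_A_and_Z() {
  printf '%s\n' {A..Z} |        # one ASCII capital per line
    LC_ALL=$1 sort |            # sort using that locale's collation rules
    sed -n '/^A$/,/^Z$/p' |     # keep only the lines from A through Z
    paste -sd ' ' -             # join them back onto a single line
}

letters_between_A_and_Z C
letters_between_A_and_Z et_EE.UTF-8
```

In the C locale the first call prints all 26 letters. On a system whose Estonian locale collates like the output above, the second call should stop at `S Z`, with T through Y missing.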

Isn't real life FUN?

But perhaps you're right about the letter C specifically. Maybe that
one letter just happens to lie between A and Z in every locale on Earth.
I don't happen to know of any counter-examples... yet.

Now, for the original poster: the meaning of [A-Z] and [a-z] did in
fact change between bash 4 and bash 5.

wooledg:~$ bash-4.4 -c 'LC_COLLATE=et_EE.utf8; [[ T = [A-Z] ]] && echo match'
wooledg:~$ bash-5.0 -c 'LC_COLLATE=et_EE.utf8; [[ T = [A-Z] ]] && echo match'
match
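
The difference comes from the `globasciiranges` shell option. It was added in bash 4.3 and became enabled by default in bash 5.0. When it is on, ranges in bracket expressions are compared in ASCII (C locale) order regardless of LC_COLLATE; when it is off, bash compares characters using the current locale's collation, as 4.4 did. The sketch below toggles the option inside a single bash so both results above can be reproduced side by side. The helper name `range_matches` is hypothetical, it assumes bash 4.3 or later, and the contrast only shows up if an Estonian locale is actually installed.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if $3 matches [A-Z] under collation locale $1,
# with globasciiranges turned on ($2 = s) or off ($2 = u).
# Assumes bash >= 4.3, where the globasciiranges option exists.
range_matches() {
  local locale=$1 mode=$2 letter=$3
  (                                      # subshell: changes don't leak out
    unset LC_ALL                         # LC_ALL would override LC_COLLATE
    { LC_COLLATE=$locale; } 2>/dev/null  # hide "cannot change locale" warnings
    shopt "-$mode" globasciiranges       # -s turns it on, -u turns it off
    [[ $letter = [A-Z] ]]                # the range is evaluated at run time
  )
}

for mode in u s; do
  if range_matches et_EE.UTF-8 "$mode" T; then
    echo "globasciiranges -$mode: match"
  else
    echo "globasciiranges -$mode: no match"
  fi
done
```

With the Estonian locale present you would expect "no match" with the option off (the 4.4 behaviour) and "match" with it on (the 5.0 default). Without that locale, both lines report a match, because bash stays in the C locale.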

This is yet one more reason you can't rely on [A-Z] or [a-z] to work
as expected in scripts. Even between different versions of bash, within
the same locale, on the same computer, it doesn't behave consistently.

I strongly recommend switching to [[:upper:]] and friends, unless you
always work in the C locale (and explicitly set it in your scripts).
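
A minimal sketch of the first recommendation, using POSIX character classes in a `case` statement; the function name `char_class` is hypothetical. Keep in mind that `[[:upper:]]` means "whatever the current locale (LC_CTYPE) classifies as uppercase", so in a UTF-8 locale it can also match letters such as Ä or Š. If you really want only the 26 ASCII capitals, the other route is to put `export LC_ALL=C` near the top of the script.

```shell
#!/usr/bin/env bash
# Hypothetical helper: classify the first character of $1 using POSIX
# character classes, which depend on LC_CTYPE classification rather than
# on LC_COLLATE ordering.
char_class() {
  case ${1:0:1} in
    [[:upper:]]) echo upper ;;
    [[:lower:]]) echo lower ;;
    [[:digit:]]) echo digit ;;
    *)           echo other ;;
  esac
}

char_class Tallinn   # prints: upper
char_class tallinn   # prints: lower
char_class 2019      # prints: digit
char_class ''        # prints: other (empty string falls through to *)
```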