coreutils

Re: find: locale affects results incorrectly


From: Eric Blake
Subject: Re: find: locale affects results incorrectly
Date: Fri, 7 Aug 2020 08:45:57 -0500
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:68.0) Gecko/20100101 Thunderbird/68.10.0

On 8/7/20 8:06 AM, sunnycemetery@gmail.com wrote:
... possibly.  Please see for yourself:

■ LC_ALL=C ls -l
total 1
-rw-r--r-- 1 userx userx 0 Aug 7 08:35 ''$'\325\253\302\265\366''+'$'\325\361\275\322\374\253\322\342\203\322\351''+'$'\322\351\245\322\342\304\264''+'$'\364''rd'$'\264''+'$'\342''07.srt'
■ echo $LANG
ja_JP.utf8
■ find -name '*.srt'
■ LC_ALL=C find -name '*.srt'
./?????+???????????+???????+?rd?+?07.srt

I have attached logs of the following debug command for either locale, with ‘ and ’ replaced with ' for quick diff comparison.  Debug output does not elucidate much, but perhaps someone can shed light on how such a seemingly simple search could possibly fail (or even be affected by locale in the first place).

find -D all -name '*.srt'

'find' is not part of coreutils. That said, you are correct that globbing is locale-sensitive. You have a filename that uses invalid encodings in some locales but not others. But POSIX says that the '*' glob only has to match characters, not encoding errors. So your choice of locale (and thus which byte sequences are valid characters) indeed affects the results of the glob, and therefore what find is able to output.
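
The locale dependence described above can be checked with a small sketch. It assumes a POSIX shell, a `find` that accepts `-name`, and a filesystem that permits arbitrary bytes in file names (typical on Linux). The scratch directory, the shortened file name, and the `matches` variable are illustrative and are not taken from the original report.

```shell
# Work in a scratch directory so nothing else is touched.
dir=$(mktemp -d)

# Create a file whose name starts with the bytes 0xD5 0xAB,
# which do not form a valid UTF-8 sequence.
touch "$dir/$(printf '\325\253')+07.srt"

# In the C locale every byte is a single character, so '*' can
# match the whole name regardless of its encoding.
matches=$(LC_ALL=C find "$dir" -name '*.srt' | wc -l | tr -d ' ')
echo "files matched in the C locale: $matches"

# Clean up the scratch directory.
rm -rf "$dir"
```

Running the same `find` without `LC_ALL=C` under a UTF-8 locale may print nothing, as in the report quoted above. Whether it does can depend on the find and C library versions in use, so treat that half as something to try rather than a guaranteed result.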

I would argue that this is not a bug, but you may get other opinions if you ask on bug-findutils.

--
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3226
Virtualization:  qemu.org | libvirt.org



