Re: dired-do-find-regexp failure with latin-1 encoding

emacs-devel

From:	Gregory Heytings
Subject:	Re: dired-do-find-regexp failure with latin-1 encoding
Date:	Sun, 29 Nov 2020 19:49:14 +0000
User-agent:	Alpine 2.22 (NEB 394 2020-01-19)

Then I think injecting LC_ALL=C into the environment when running Grep in this case makes the results more useful? And we can then avoid using -a?
I'm not so sure. LC_ALL=C seems more problematic than -a:

$ grep ф test.txt
фыва
$ grep -a ф test.txt
фыва
$ LC_ALL=C grep ф test.txt
(nothing)
I guess this regression in Grep happened when they "internationalized" the DFA code, sigh...

FWIW, I "bisected" this with various versions of grep, and this regression happened in 2014, between versions 2.20 and 2.21:


echo -ne "premi\xE8re\n" > latin1.txt
echo -ne "premi\xC3\xA8re\n" > utf8.txt
echo -ne "premi\xE8re\npremi\xC3\xA8re\n" > both.txt

With 2.20 with rxvt (which is clever enough to display UTF-8 and Latin-1 at the same time):
$ grep prem *.txt
both.txt:première
both.txt:première
latin1.txt:première
utf8.txt:première

With 2.20 with M-x shell (the \350 is a single character):
both.txt:premi\350re
both.txt:première
latin1.txt:premi\350re
utf8.txt:première

With 2.21, with rxvt or M-x shell:
$ grep prem *.txt
Binary file both.txt matches
Binary file latin1.txt matches
utf8.txt:première
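
For anyone who wants to repeat the comparison with the -a option discussed above, here is a minimal sketch. It assumes GNU grep and a POSIX shell; the scratch directory and the variable names are only illustrative, and printf with octal escapes stands in for echo -ne because it behaves the same across shells:

```shell
#!/bin/sh
# Sketch only: assumes GNU grep; workdir and counts are illustrative names.

# Work in a scratch directory so no existing file is overwritten.
workdir=$(mktemp -d)
cd "$workdir"

# \350 is the Latin-1 byte for "è"; \303\250 is its UTF-8 encoding.
printf 'premi\350re\n' > latin1.txt
printf 'premi\303\250re\n' > utf8.txt
printf 'premi\350re\npremi\303\250re\n' > both.txt

# -a makes grep treat every file as text instead of reporting
# "Binary file ... matches"; -c prints a per-file count of matching lines,
# which keeps raw Latin-1 bytes off the terminal.
counts=$(grep -a -c prem latin1.txt utf8.txt both.txt)
printf '%s\n' "$counts"
```

With these three files, each file should report one matching line except both.txt, which has two. Dropping -c shows the matching lines themselves, including the undecoded Latin-1 byte.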
