From: Gregory Heytings
Subject: Re: dired-do-find-regexp failure with latin-1 encoding
Date: Sun, 29 Nov 2020 19:49:14 +0000
User-agent: Alpine 2.22 (NEB 394 2020-01-19)
Then I think injecting LC_ALL=C into the environment when running Grep in this case makes the results more useful? And we can then avoid using -a?

I'm not so sure. LC_ALL=C seems more problematic than -a:

    $ grep ф test.txt
    фыва
    $ grep -a ф test.txt
    фыва
    $ LC_ALL=C grep ф test.txt
    (nothing)

I guess this regression in Grep happened when they "internationalized" the DFA code, sigh...

FWIW, I "bisected" this with various versions of grep, and this regression happened in 2014, between versions 2.20 and 2.21:

    echo -ne "premi\xE8re\n" > latin1.txt
    echo -ne "premi\xC3\xA8re\n" > utf8.txt
    echo -ne "premi\xE8re\npremi\xC3\xA8re\n" > both.txt

With 2.20 with rxvt (which is clever enough to display UTF-8 and Latin-1 at the same time):

    $ grep prem *.txt
    both.txt:première
    both.txt:première
    latin1.txt:première
    utf8.txt:première

With 2.20 with M-x shell (the \350 is a single character):

    both.txt:premi\350re
    both.txt:première
    latin1.txt:premi\350re
    utf8.txt:première

With 2.21, with rxvt or M-x shell:

    $ grep prem *.txt
    Binary file both.txt matches
    Binary file latin1.txt matches
    utf8.txt:première
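
For anyone who wants to repeat the comparison on their own grep, here is a minimal sketch of the same test. It is not part of the original experiment: the scratch directory and the `count_a` variable are only illustrative, and `printf` with octal escapes stands in for `echo -ne`, which not every shell supports. Only the `-a` result is checked, since the output without `-a` depends on the grep version and the locale, as shown above.

```shell
#!/bin/sh
# Sketch only: recreate the three test files in a scratch directory
# (the directory itself is an assumption, not part of the original test).
tmpdir=$(mktemp -d)
cd "$tmpdir" || exit 1

# \350 is the Latin-1 byte 0xE8; \303\250 is the UTF-8 encoding of the
# same character.
printf 'premi\350re\n' > latin1.txt
printf 'premi\303\250re\n' > utf8.txt
printf 'premi\350re\npremi\303\250re\n' > both.txt

# With -a every file is treated as text, so both lines of both.txt
# are counted as matches.
count_a=$(grep -a -c prem both.txt)
echo "lines matched in both.txt with -a: $count_a"

# Without -a the output varies: older versions print the matching lines,
# newer ones may report "Binary file ... matches" for the Latin-1 files.
grep prem ./*.txt
```

Running it under different grep versions, or with `LC_ALL=C` prepended to the last command, should make the difference in behaviour visible.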