From: | Dmitry Gutov |
Subject: | Re: dired-do-find-regexp failure with latin-1 encoding |
Date: | Sun, 29 Nov 2020 19:19:43 +0200 |
User-agent: | Mozilla/5.0 (X11; Linux x86_64; rv:68.0) Gecko/20100101 Thunderbird/68.10.0 |
On 29.11.2020 19:12, Eli Zaretskii wrote:
>> Cc: stephen.berman@gmx.net, emacs-devel@gnu.org
>> From: Dmitry Gutov <dgutov@yandex.ru>
>> Date: Sun, 29 Nov 2020 18:07:38 +0200
>>
>> Adding -a or prepending 'LC_ALL=C' changes that:
>>
>> $ LC_ALL=C grep "prem" latin1.txt
>> premi�re is first
>> premie?re is slightly different
>
> Is that � what Grep actually produced?
That's copied from a terminal emulator. If I run it with shell-command, I get this:

premi\350re is first
premie?re is slightly different

(\350 being a raw char)
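To illustrate the difference between the two renderings, here is a minimal Python sketch. It assumes the file holds the single Latin-1 byte 0xE8 (octal 350) for "è"; the variable names are made up for the example. Decoded as Latin-1 the byte is a valid character, while decoded as UTF-8 it is an invalid sequence, which a UTF-8 terminal typically shows as U+FFFD (�).

```python
# First matching line as raw bytes, assuming Latin-1 input (octal 350 == 0xE8).
line = b"premi\350re is first"

# Latin-1 maps every byte to a code point, so this decode cannot fail.
as_latin1 = line.decode("latin-1")

# 0xE8 would start a 3-byte UTF-8 sequence, but "r" and "e" are not
# continuation bytes, so the byte is replaced with U+FFFD.
as_utf8 = line.decode("utf-8", errors="replace")

print(as_latin1)
print(as_utf8)
```

The bytes Grep emits are the same in both cases; only the decoding applied by whatever displays them differs.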
>>> What is not clear to me is whether the _output_ is always in some fixed encoding, like UTF-8. That doesn't seem to be stated in the docs there.
>>
>> Judging by a small experiment, rg's output is in the same encoding as input, for each file.
>
> So in this aspect it is not better than Grep: it is still impractical to search through files that have different encodings.
It's not optimal, but the important thing is to get matches from all of them, even if some are printed in a not-so-readable way.
>> In any case, if one takes the pre-processing route, the end encoding will be UTF-8.
>
> But then the pre-processor will have to guess the encoding (if it is not the same for all the files), which we know is not simple.
Yes.
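
As a rough sketch of the pre-processing route, assuming a deliberately crude guess (try strict UTF-8, otherwise fall back to Latin-1): the helpers `to_utf8` and `search` and the in-memory sample files are hypothetical, and this is not how Emacs, Grep, or rg work. Real encoding detection is harder, as noted above.

```python
from typing import Dict, List, Tuple


def to_utf8(raw: bytes, fallback: str = "latin-1") -> bytes:
    """Re-encode raw as UTF-8, guessing between UTF-8 and one fallback."""
    try:
        text = raw.decode("utf-8")   # strict: raises on invalid sequences
    except UnicodeDecodeError:
        text = raw.decode(fallback)  # Latin-1 never fails, so this is only a guess
    return text.encode("utf-8")


def search(files: Dict[str, bytes], needle: str) -> List[Tuple[str, str]]:
    """Return (file name, line) pairs for lines containing needle."""
    hits = []
    for name, raw in files.items():
        for line in to_utf8(raw).decode("utf-8").split("\n"):
            if needle in line:
                hits.append((name, line))
    return hits


# Same text stored in two different encodings (sample data for the sketch).
files = {
    "latin1.txt": "premi\u00e8re is first\n".encode("latin-1"),
    "utf8.txt": "premi\u00e8re is first\n".encode("utf-8"),
}
hits = search(files, "premi\u00e8re")
print(hits)
```

With everything normalized to UTF-8, one search pattern matches in both files and the output is in a single encoding. The fallback mislabels any file that is neither UTF-8 nor Latin-1, which is the guessing problem mentioned above.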