From: Dmitry Gutov
Subject: bug#31796: 27.1; dired-do-find-regexp-and-replace fails to find multiline regexps
Date: Wed, 2 Dec 2020 19:43:52 +0200
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:68.0) Gecko/20100101 Thunderbird/68.10.0

On 02.12.2020 19:39, Eli Zaretskii wrote:
> Cc: abela@chalmers.se, 31796@debbugs.gnu.org
> From: Dmitry Gutov <dgutov@yandex.ru>
> Date: Wed, 2 Dec 2020 19:17:06 +0200
>
> > On 02.12.2020 16:56, Eli Zaretskii wrote:
> > > The point is that our heuristics for detecting encoding is not perfect, so it could fail.
> >
> > Do you imagine Grep could use a more reliable detection algorithm?
>
> No, I don't.  But it could allow the user to specify a different encoding for each file, as in
>
>   grep --encoding=FOO FILES1* --encoding=BAR FILES2*
Not sure we can call it like that in an automated fashion (i.e. in project-find-regexp). But hey, somebody else could.
> etc.  And even if it just did the job of the same quality as we do, it will do it faster, which is why we use Grep in the first place, right?
That's true.
> The important part of the "enhancement" I described is actually the fact that the output gets encoded in a single encoding, no matter what was the encoding of the original files.  This makes reading and decoding the output simple and always correct.
Yes, OK.
> > Although... since it has to scan the full file anyway, it could first do a quick detection, and then maybe rescan from the beginning if the encoding turns out to be something else.
>
> That'd be too late, as some matches were already output.
It could buffer them until the full file has been parsed. Encoding detection and conversion must add a certain overhead anyway, so I'm not sure how expensive the extra buffering would be in comparison.
As a bonus, per-file buffering like that would allow easier parallelization of searches.
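
To make the buffering idea concrete, here is a rough sketch in Python rather than anything Grep actually does. It assumes a UTF-8 first guess with a Latin-1 fallback, and the names search_file and search_files are made up for illustration. Matches for a file are held back until the whole file has decoded cleanly; if the guess turns out wrong, the buffer is dropped and the file is rescanned. The combined output is always emitted as UTF-8, and the per-file results are independent, so they can be computed in parallel.

```python
import re
from concurrent.futures import ThreadPoolExecutor


def search_file(path: str, pattern: str,
                guess: str = "utf-8", fallback: str = "latin-1") -> list[str]:
    """Return all matches in one file, buffered until the file is fully decoded.

    The encodings tried here are an assumption for the sketch, not a real
    detection heuristic.
    """
    regex = re.compile(pattern)
    for enc in (guess, fallback):
        buffered = []
        try:
            with open(path, encoding=enc) as f:
                for lineno, line in enumerate(f, start=1):
                    if regex.search(line):
                        buffered.append(f"{path}:{lineno}:{line.rstrip(chr(10))}")
        except UnicodeDecodeError:
            # The guess was wrong: drop the buffered matches and rescan.
            continue
        return buffered
    return []


def search_files(paths: list[str], pattern: str) -> bytes:
    """Search files in parallel; emit every match as UTF-8, in input order."""
    with ThreadPoolExecutor() as pool:
        per_file = list(pool.map(lambda p: search_file(p, pattern), paths))
    lines = [line for matches in per_file for line in matches]
    return "".join(line + "\n" for line in lines).encode("utf-8")
```

Since nothing is written out before a file has been read to the end, a late encoding switch never leaves wrongly decoded matches in the output. The cost is the memory for one file's matches.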