bug-gnu-emacs

bug#64735: 29.0.92; find invocations are ~15x slower because of ignores


From: Ihor Radchenko
Subject: bug#64735: 29.0.92; find invocations are ~15x slower because of ignores
Date: Thu, 27 Jul 2023 08:20:55 +0000

Eli Zaretskii <eliz@gnu.org> writes:

>> > I have modified `directory-files-recursively' to avoid O(N^2) `nconc'
>> > calls + bypassing regexp matches when REGEXP is nil.
>> 
>> Sounds good. I haven't examined the diff closely, but it sounds like an 
>> improvement that can be applied irrespective of how this discussion ends.
>
> That change should be submitted as a separate issue and discussed in
> detail before we decide we can make it.

I will look into it. This was mostly a quick and dirty rewrite that did
not pay much attention to file order in the result.
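
For reference, the `nconc' concern is that each call has to walk the
accumulated list to find its tail, so appending many sublists one after
another costs time roughly quadratic in the total number of files. A
minimal sketch of a linear alternative is below. It is not the diff
discussed above, and `my/flatten-append' is a made-up name used only for
illustration:

```elisp
;; Hypothetical helper, not part of Emacs and not the actual patch.
;; Pushing onto the front is O(1) per element; one `nreverse' at the
;; end restores the original order.
(defun my/flatten-append (lists)
  "Concatenate LISTS in order without re-walking the accumulated result."
  (let (acc)
    (dolist (l lists)
      (dolist (x l)
        (push x acc)))
    (nreverse acc)))
```

Unlike repeated `nconc', this copies the elements instead of splicing
the sublists destructively, so it trades some consing for the linear
running time.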

>> Skipping regexp matching entirely, though, will make this benchmark 
>> farther removed from real-life usage: this thread started from being 
>> able to handle multiple ignore entries when listing files (e.g. in a 
>> project).
>
> Agreed.  From my POV, that variant's purpose was only to show how much
> time is spent in matching file names against some include or exclude
> list.

Yes and no.

It is not uncommon to query _all_ the files in a directory, and something
as simple as

(when (or (member regexp '("" ".*")) (string-match regexp file))...)

can give a considerable speedup.

It might be worth adding such an optimization.
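
A rough sketch of that idea as a standalone predicate follows. The name
`my/file-name-wanted-p' is hypothetical and only serves the example.
Treating a nil REGEXP as "match everything" is my assumption, mirroring
the modified function mentioned earlier in the thread:

```elisp
;; Hypothetical helper, not part of Emacs.  A regexp of nil, "" or
;; ".*" matches every file name, so the regexp engine is skipped
;; entirely in those cases.
(defun my/file-name-wanted-p (regexp file)
  "Return non-nil if FILE should be kept given REGEXP."
  (or (null regexp)
      (member regexp '("" ".*"))
      (string-match-p regexp file)))
```

In a real loop the trivial-regexp check would presumably be done once
before iterating, not once per file.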

>> So any solution for that (whether we use it on all or just 
>> some platforms) needs to be able to handle those. And it doesn't seem 
>> like directory-files-recursively has any alternative solution for that 
>> other than calling string-match on every found file.
>
> There's a possibility of pushing this filtering into
> file-name-all-completions, but I'm not sure that will be faster.  We
> should try that and measure the results, I think.

Isn't `file-name-all-completions' more limited, in that it cannot accept
an arbitrary regexp?

>> We can't really forget about GC, though.
>
> But we could temporarily lift the threshold while this function runs,
> if that leads to significant savings.

Yup. Also, GC times and frequencies will vary across different Emacs
sessions. So, we may not want to rely on it when comparing the
benchmarks from different people.
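
One way to separate the two effects, sketched here under the assumption
that `benchmark-run' is the measuring tool (the `my/...' function names
are made up for this example), is to report GC time next to the total
and to repeat the run with the threshold lifted:

```elisp
(require 'benchmark)

;; `benchmark-run' returns (ELAPSED GC-COUNT GC-ELAPSED), so the time
;; spent in GC can be subtracted from the total.
(defun my/bench-listing (dir)
  "Time a recursive listing of DIR.
Return (TOTAL GC-COUNT GC-TIME NON-GC-TIME)."
  (pcase-let ((`(,total ,gcs ,gc-time)
               (benchmark-run 1
                 (directory-files-recursively dir ""))))
    (list total gcs gc-time (- total gc-time))))

;; Same measurement with the GC threshold temporarily raised so that
;; collections are very unlikely while the listing runs.
(defun my/bench-listing-no-gc (dir)
  "Like `my/bench-listing', but with `gc-cons-threshold' lifted."
  (let ((gc-cons-threshold most-positive-fixnum))
    (my/bench-listing dir)))
```

Comparing the NON-GC-TIME figures should be less sensitive to the state
of a particular Emacs session than comparing the totals.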

>> But the above numbers make me hopeful about the async-parallel solution, 
>> implying that the parallelization really can help (and offset whatever 
>> latency we lose on pselect), as soon as we determine the source of extra 
>> consing and decide what to do about it.
>
> Isn't it clear that additional consing comes from the fact that we
> first insert the Find's output into a buffer or produce a string from
> it, and then chop that into individual file names?

To add to that, I also tried implementing a version of
`directory-files-recursively' that first inserts all the file names into
a buffer and then filters them using `re-search-forward' instead of
calling `string-match' on every file name string.
That ended up being slower than the current `string-match' approach.
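
The buffer-based filtering was along these lines. This is a simplified
sketch and not the code I benchmarked: `my/filter-names-in-buffer' is a
made-up name, and the sketch assumes no file name contains a newline and
that REGEXP never matches across lines:

```elisp
;; Illustrative sketch only, not part of Emacs.
;; Assumes one file name per line and a REGEXP that stays within a line.
(defun my/filter-names-in-buffer (names regexp)
  "Return the elements of NAMES matching REGEXP, using one temp buffer."
  (with-temp-buffer
    (dolist (name names)
      (insert name "\n"))
    (goto-char (point-min))
    (let (result)
      ;; The `eobp' guard avoids looping forever when REGEXP can match
      ;; the empty string at the end of the buffer.
      (while (and (not (eobp))
                  (re-search-forward regexp nil t))
        (push (buffer-substring-no-properties
               (line-beginning-position) (line-end-position))
              result)
        ;; Skip the rest of the matched line so it is reported once.
        (forward-line 1))
      (nreverse result))))
```

Note that anchors behave differently here than with `string-match' on
individual strings: `$' matches at the end of each line, while `\''
only matches at the end of the whole buffer.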

-- 
Ihor Radchenko // yantar92,
Org mode contributor,
Learn more about Org mode at <https://orgmode.org/>.
Support Org development at <https://liberapay.com/org-mode>,
or support my work at <https://liberapay.com/yantar92>




