bug-gnu-emacs
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

bug#64735: 29.0.92; find invocations are ~15x slower because of ignores


From: Eli Zaretskii
Subject: bug#64735: 29.0.92; find invocations are ~15x slower because of ignores
Date: Thu, 27 Jul 2023 08:22:27 +0300

> Date: Thu, 27 Jul 2023 03:41:29 +0300
> Cc: Eli Zaretskii <eliz@gnu.org>, luangruo@yahoo.com, sbaugh@janestreet.com,
>  64735@debbugs.gnu.org
> From: Dmitry Gutov <dmitry@gutov.dev>
> 
> > I have modified `directory-files-recursively' to avoid O(N^2) `nconc'
> > calls + bypassing regexp matches when REGEXP is nil.
> 
> Sounds good. I haven't examined the diff closely, but it sounds like an 
> improvement that can be applied irrespective of how this discussion ends.

That change should be submitted as a separate issue and discussed in
detail before we decide we can make it.

> Skipping regexp matching entirely, though, will make this benchmark 
> farther removed from real-life usage: this thread started from being 
> able to handle multiple ignore entries when listing files (e.g. in a 
> project).

Agreed.  From my POV, that variant's purpose was only to show how much
time is spent in matching file names against some include or exclude
list.

> So any solution for that (whether we use it on all or just 
> some platforms) needs to be able to handle those. And it doesn't seem 
> like directory-files-recursively has any alternative solution for that 
> other than calling string-match on every found file.

There's a possibility of pushing this filtering into
file-name-all-completions, but I'm not sure that will be faster.  We
should try that and measure the results, I think.

> > If we forget about GC, Elisp version can get fairly close to GNU find.
> > And if we do not perform regexp matching (which makes sense when the
> > REGEXP is ""), Elisp version is faster.
> 
> We can't really forget about GC, though.

But we could temporarily lift the threshold while this function runs,
if that leads to significant savings.

> But the above numbers make me hopeful about the async-parallel solution, 
> implying that the parallelization really can help (and offset whatever 
> latency we lose on pselect), as soon as we determine the source of extra 
> consing and decide what to do about it.

Isn't it clear that additional consing comes from the fact that we
first insert the Find's output into a buffer or produce a string from
it, and then chop that into individual file names?





reply via email to

[Prev in Thread] Current Thread [Next in Thread]