
bug#64735: 29.0.92; find invocations are ~15x slower because of ignores


From: Dmitry Gutov
Subject: bug#64735: 29.0.92; find invocations are ~15x slower because of ignores
Date: Thu, 27 Jul 2023 16:30:56 +0300
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:102.0) Gecko/20100101 Thunderbird/102.13.0

On 27/07/2023 08:22, Eli Zaretskii wrote:
>> Date: Thu, 27 Jul 2023 03:41:29 +0300
>> Cc: Eli Zaretskii <eliz@gnu.org>, luangruo@yahoo.com, sbaugh@janestreet.com,
>>   64735@debbugs.gnu.org
>> From: Dmitry Gutov <dmitry@gutov.dev>

>>> I have modified `directory-files-recursively' to avoid O(N^2) `nconc'
>>> calls and to bypass regexp matching when REGEXP is nil.

>> Sounds good. I haven't examined the diff closely, but it sounds like an
>> improvement that can be applied irrespective of how this discussion ends.

> That change should be submitted as a separate issue and discussed in
> detail before we decide we can make it.

Sure.
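
For context, the kind of rewrite being described might look roughly like
this (a sketch under my own assumptions, not the actual patch; the
function name is made up):

  ;; Collect entries by pushing onto a list and reversing once at the
  ;; end, instead of `nconc'-ing each directory's results onto the
  ;; accumulated list (which rescans it every time and is O(N^2)
  ;; overall), and skip regexp matching entirely when REGEXP is nil.
  ;; No symlink-cycle handling; it is just a sketch.
  (defun my-directory-files-recursively (dir &optional regexp)
    (let ((result nil)
          (pending (list dir)))
      (while pending
        (dolist (file (directory-files (pop pending) t nil t))
          (unless (member (file-name-nondirectory file) '("." ".."))
            (if (file-directory-p file)
                (push file pending)
              (when (or (null regexp)
                        (string-match-p regexp (file-name-nondirectory file)))
                (push file result))))))
      (nreverse result)))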

>>> If we forget about GC, the Elisp version can get fairly close to GNU
>>> find. And if we do not perform regexp matching (which makes sense when
>>> REGEXP is ""), the Elisp version is faster.

>> We can't really forget about GC, though.

> But we could temporarily lift the threshold while this function runs,
> if that leads to significant savings.

I mean, everything's doable, but if we do this for this function, why not for others? Most long-running code would see an improvement from that kind of change (the 'find'-based solutions too).

IIRC, the main drawback is running out of memory in extreme conditions or on low-memory platforms/devices. It's not as if this feature is particularly protected from that.
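
For concreteness, lifting the threshold around one call could be as
simple as this (a minimal sketch of the idea above; the wrapper name is
made up):

  ;; Let-bind `gc-cons-threshold' so that no GC cycle starts while the
  ;; traversal runs; the old threshold is restored (and GC can run
  ;; again) as soon as the binding exits.
  (defun my-collect-files-without-gc (dir regexp)
    (let ((gc-cons-threshold most-positive-fixnum))
      (directory-files-recursively dir regexp)))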

>> But the above numbers make me hopeful about the async-parallel solution,
>> implying that the parallelization really can help (and offset whatever
>> latency we lose on pselect), as soon as we determine the source of extra
>> consing and decide what to do about it.

> Isn't it clear that the additional consing comes from the fact that we
> first insert Find's output into a buffer or produce a string from
> it, and then chop that into individual file names?

But we do that in all 'find'-based solutions: the synchronous one takes the buffer text and chops it into strings. The first asynchronous one does the same. The other ("with-find-p") works from a process filter, chopping up the strings that get passed to it.
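
Roughly, the synchronous variant amounts to something like this sketch
(the real code differs in details):

  ;; Run find synchronously, let it write into a temporary buffer,
  ;; then chop the buffer text into one string per file name.
  (with-temp-buffer
    (call-process "find" nil t nil default-directory "-type" "f")
    (split-string (buffer-string) "\n" t))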

But the amount of time spent in GC is different, with most of the difference in performance attributable to it: if we subtract time spent in GC, the runtimes are approximately equal.

I can imagine that the filter-based approach necessarily creates more strings (to pass to the filter function). Maybe we could increase those strings' size (and thus reduce their number) by increasing the read buffer size? I haven't found a relevant variable, though.
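
To illustrate where the extra strings come from, here is a sketch of a
filter-based reader (the variable and process names are hypothetical;
this is not the code under discussion):

  ;; Each call to the filter receives a freshly consed chunk string;
  ;; the chunk is then concatenated and split, consing still more
  ;; strings before the actual file names are produced.
  (defvar my-found-files nil)
  (defvar my-find-leftover "")
  (make-process
   :name "my-find"
   :command '("find" "." "-type" "f")
   :filter (lambda (_proc chunk)
             (let ((lines (split-string (concat my-find-leftover chunk) "\n")))
               ;; The last element may be an incomplete line; keep it
               ;; for the next chunk.
               (setq my-find-leftover (car (last lines)))
               (dolist (line (butlast lines))
                 (push line my-found-files)))))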

Or, if there were some other callback that runs after the next chunk of output arrives from the process, we could parse it from the buffer. But the insertion into the buffer would need to be made efficient (apparently internal-default-process-filter currently receives the same sequence of strings as the other filters for input, with the same amount of consing).
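
A sketch of that idea (the variable is hypothetical, and the chunks
still arrive as strings, per the caveat above):

  ;; A filter that appends each chunk to the process buffer and then
  ;; consumes only the complete lines, so each file-name string is
  ;; consed once, directly from buffer text.
  (defvar my-found-files nil)
  (defun my-find-buffer-filter (proc chunk)
    (with-current-buffer (process-buffer proc)
      (goto-char (point-max))
      (insert chunk)
      (goto-char (point-min))
      (while (search-forward "\n" nil t)
        (push (buffer-substring (point-min) (1- (point))) my-found-files)
        (delete-region (point-min) (point)))))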




