bug-gnu-emacs
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

bug#64735: 29.0.92; find invocations are ~15x slower because of ignores


From: Dmitry Gutov
Subject: bug#64735: 29.0.92; find invocations are ~15x slower because of ignores
Date: Thu, 27 Jul 2023 03:41:29 +0300
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:102.0) Gecko/20100101 Thunderbird/102.13.0

On 26/07/2023 12:09, Ihor Radchenko wrote:
It's also something to note that, GC-wise, numbers 1 and 2 are not the
worst: the time must be spent somewhere else.
Indeed. I did more detailed analysis in
https://yhetil.org/emacs-devel/87cz0p2xlc.fsf@localhost/

Main contributors in the lisp versions are (in the order from most
significant to less significant) (1) file name handlers; (2) regexp
matching of the file names; (3) nconc calls in the current
`directory-files-recursively' implementation.

I have modified `directory-files-recursively' to avoid O(N^2) `nconc'
calls + bypassing regexp matches when REGEXP is nil.

Sounds good. I haven't examined the diff closely, but it sounds like an improvement that can be applied irrespective of how this discussion ends.

Skipping regexp matching entirely, though, will make this benchmark farther removed from real-life usage: this thread started from being able to handle multiple ignore entries when listing files (e.g. in a project). So any solution for that (whether we use it on all or just some platforms) needs to be able to handle those. And it doesn't seem like directory-files-recursively has any alternative solution for that other than calling string-match on every found file.

Here are the results (using the attached modified version of your
benchmark file):

(my-bench 10 "/usr/src/linux/" "")
(("built-in" . "Elapsed time: 7.285597s (3.853368s in 6 GCs)")
  ("built-in no filename handler alist" . "Elapsed time: 5.855019s (3.760662s in 6 
GCs)")
  ("built-in non-recursive no filename handler alist" . "Elapsed time: 5.817639s 
(4.326945s in 7 GCs)")
  ("built-in non-recursive no filename handler alist + skip re-match" . "Elapsed 
time: 2.708306s (1.871665s in 3 GCs)")
  ("with-find" . "Elapsed time: 6.082200s (4.262830s in 7 GCs)")
  ("with-find-p" . "Elapsed time: 4.325503s (3.058647s in 5 GCs)")
  ("with-find-sync" . "Elapsed time: 3.267648s (1.903655s in 3 GCs)"))

Nice.

  (let ((gc-cons-threshold most-positive-fixnum))
    (my-bench 10 "/usr/src/linux/" ""))
(("built-in" . "Elapsed time: 2.754473s")
  ("built-in no filename handler alist" . "Elapsed time: 1.322443s")
  ("built-in non-recursive no filename handler alist" . "Elapsed time: 
1.235044s")
  ("built-in non-recursive no filename handler alist + skip re-match" . "Elapsed 
time: 0.750275s")
  ("with-find" . "Elapsed time: 1.438510s")
  ("with-find-p" . "Elapsed time: 1.200876s")
  ("with-find-sync" . "Elapsed time: 1.349755s"))

If we forget about GC, Elisp version can get fairly close to GNU find.
And if we do not perform regexp matching (which makes sense when the
REGEXP is ""), Elisp version is faster.

We can't really forget about GC, though.

But the above numbers make me hopeful about the async-parallel solution, implying that the parallelization really can help (and offset whatever latency we lose on pselect), as soon as we determine the source of extra consing and decide what to do about it.





reply via email to

[Prev in Thread] Current Thread [Next in Thread]