bug-gnu-emacs
bug#64735: 29.0.92; find invocations are ~15x slower because of ignores


From: Eli Zaretskii
Subject: bug#64735: 29.0.92; find invocations are ~15x slower because of ignores
Date: Sat, 22 Jul 2023 20:46:01 +0300

> From: sbaugh@catern.com
> Date: Sat, 22 Jul 2023 17:18:19 +0000 (UTC)
> Cc: sbaugh@janestreet.com, yantar92@posteo.net, rms@gnu.org, dmitry@gutov.dev,
>       michael.albinus@gmx.de, 64735@debbugs.gnu.org
> 
> First my results:
> 
> (my-bench 100 "~/public_html" "")
> (("built-in" . "Elapsed time: 1.140173s (0.389344s in 5 GCs)")
>  ("with-find" . "Elapsed time: 0.643306s (0.305130s in 4 GCs)"))
> 
> (my-bench 10 "~/.local/src/linux" "")
> (("built-in" . "Elapsed time: 2.402341s (0.937857s in 11 GCs)")
>  ("with-find" . "Elapsed time: 1.544024s (0.827364s in 10 GCs)"))
> 
> (my-bench 100 "/ssh:catern.com:~/public_html" "")
> (("built-in" . "Elapsed time: 36.494233s (6.450840s in 79 GCs)")
>  ("with-find" . "Elapsed time: 4.619035s (1.133656s in 14 GCs)"))
> 
> 2x speedup on local files, and almost a 10x speedup for remote files.

Thanks, that's impressive.  But you omitted some of the features of
directory-files-recursively, see below.

> And my implementation *isn't even using the fact that find can run in
> parallel with Emacs*.  If I did start using that, I expect even more
> speed gains from parallelism, which aren't achievable in Emacs itself.

I'm not sure I understand what you mean by "in parallel" and why it
would be faster.

> So can we add something like this (with the appropriate fallbacks to
> directory-files-recursively), since it has such a big speedup even
> without parallelism?

We can have an alternative implementation, yes.  But it should support
predicate, and it should sort the files in each directory like
directory-files-recursively does, so that it's a drop-in replacement.
Also, I believe that Find does return "." in each directory, and your
implementation doesn't filter them, whereas
directory-files-recursively does AFAIR.
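
To illustrate the filtering and sorting part, here is a minimal sketch of a
post-processing step for NUL-separated Find output.  The name
my-find-postprocess is hypothetical, and the single global string< sort is
only an approximation of the per-directory ordering that
directory-files-recursively produces.  It does not address the predicate
argument.

```elisp
;; -*- lexical-binding: t -*-
(require 'seq)

(defun my-find-postprocess (output)
  "Split NUL-separated OUTPUT from find into a sorted list of file names.
Entries whose last component is \".\" or \"..\" are dropped.
Hypothetical helper; the sort is a plain global `string<' sort."
  (let ((files (split-string output "\0" t)))
    (sort (seq-remove
           (lambda (f)
             ;; "dir/." and a bare "." both end in the component ".".
             (member (file-name-nondirectory (directory-file-name f))
                     '("." "..")))
           files)
          #'string<)))
```

For instance, (my-find-postprocess "./b\0.\0./a\0") drops the "." entry and
returns the remaining names in sorted order.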

And I see no need for any fallback: that's for the application to do
if it wants.

>   (cl-assert (null _predicate) t "find-directory-files-recursively can't accept arbitrary predicates")

It should.

>            (if follow-symlinks
>                '("-L")
>              '("!" "(" "-type" "l" "-xtype" "d" ")"))
>            (unless (string-empty-p regexp)
>              "-regex" (concat ".*" regexp ".*"))
>            (unless include-directories
>              '("!" "-type" "d"))
>            '("-print0")

Some of these switches are specific to GNU Find.  Are we going to
support only GNU Find?
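
If the answer is "GNU Find only", the caller would need some way to detect
it.  Here is a minimal sketch, assuming that GNU Find identifies itself with
the text "GNU findutils" in its --version output and that other
implementations either reject --version or print something else.  Both
function names are hypothetical.

```elisp
;; -*- lexical-binding: t -*-

(defun my-gnu-find-version-string-p (version-output)
  "Return t if VERSION-OUTPUT looks like it came from GNU find.
Assumption: GNU find prints \"GNU findutils\" for --version."
  (and (stringp version-output)
       (string-match-p "GNU findutils" version-output)
       t))

(defun my-gnu-find-p (&optional program)
  "Return t if PROGRAM (default \"find\") appears to be GNU find.
Uses `process-file', so a remote `default-directory' is honored."
  (let ((program (or program "find")))
    (with-temp-buffer
      ;; `ignore-errors' covers the case where PROGRAM is not installed.
      (and (eq 0 (ignore-errors
                   (process-file program nil t nil "--version")))
           (my-gnu-find-version-string-p (buffer-string))))))
```

A caller could then use the GNU-only switches such as -xtype when
(my-gnu-find-p) is non-nil, and otherwise signal an error or use
directory-files-recursively.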

>            ))
>          (remote (file-remote-p dir))
>          (proc
>           (if remote
>               (let ((proc (apply #'start-file-process
>                                  "find" (current-buffer) command)))
>                 (set-process-sentinel proc (lambda (_proc _state)))
>                 (set-process-query-on-exit-flag proc nil)
>                 proc)
>             (make-process :name "find" :buffer (current-buffer)
>                           :connection-type 'pipe
>                           :noquery t
>                           :sentinel (lambda (_proc _state))
>                           :command command))))
>       (while (accept-process-output proc))

Why do you call accept-process-output here?  It could interfere with
reading output from async subprocesses running at the same time.  Come
to think of it, why use async subprocesses here and not call-process?
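
For comparison, a synchronous variant could look roughly like the sketch
below.  It uses process-file (the file-handler-aware counterpart of
call-process) so that remote directories still work, and it assumes a Find
that supports -print0.  The function name my-find-files-sync is
hypothetical, and the sketch leaves out regexp matching, symlink handling,
and the predicate.

```elisp
;; -*- lexical-binding: t -*-

(defun my-find-files-sync (dir)
  "Return the regular files under DIR using one synchronous find call.
Hypothetical sketch; assumes a find program that supports -print0."
  (let ((root (file-name-as-directory (expand-file-name dir))))
    (with-temp-buffer
      ;; `process-file' runs the program in `default-directory',
      ;; on the remote host if that directory is remote.
      (setq default-directory root)
      (let ((status (process-file "find" nil t nil
                                  "." "-type" "f" "-print0")))
        (unless (eq status 0)
          (error "find exited with status %S" status))
        ;; find prints names relative to "."; make them absolute.
        (mapcar (lambda (f) (expand-file-name f root))
                (split-string (buffer-string) "\0" t))))))
```

Since the call returns only after Find has exited, there is no need for
accept-process-output or a sentinel.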
