bug#64735: 29.0.92; find invocations are ~15x slower because of ignores
From: sbaugh
Date: Sat, 22 Jul 2023 17:18:19 +0000 (UTC)
User-agent: Gnus/5.13 (Gnus v5.13)
Eli Zaretskii <eliz@gnu.org> writes:
>> From: sbaugh@catern.com
>> Date: Sat, 22 Jul 2023 10:38:37 +0000 (UTC)
>> Cc: Spencer Baugh <sbaugh@janestreet.com>, dmitry@gutov.dev,
>> yantar92@posteo.net, michael.albinus@gmx.de, rms@gnu.org,
>> 64735@debbugs.gnu.org
>>
>> Eli Zaretskii <eliz@gnu.org> writes:
>> > No, the first step is to use in Emacs what Find does today, because it
>> > will already be a significant speedup.
>>
>> Why bother? directory-files-recursively is a rarely used API, as you
>> have mentioned before in this thread.
>
> Because we could then use it much more (assuming the result will be
> performant enough -- this remains to be seen).
>
>> And there is a way to speed it up which will have a performance boost
>> which is unbeatable any other way: Use find instead of
>> directory-files-recursively, and operate on files as they find prints
>> them.
>
> Not every command can operate on the output sequentially: some need to
> see all of the output, others will need to be redesigned and
> reimplemented to support such sequential mode.
>
> Moreover, piping from Find incurs overhead: data is broken into blocks
> by the pipe or PTY, reading the data can be slowed down if Emacs is
> busy processing something, etc.
I went ahead and implemented it, and I get a 2x speedup even *without*
running find in parallel with Emacs.
First, my results:
(my-bench 100 "~/public_html" "")
(("built-in" . "Elapsed time: 1.140173s (0.389344s in 5 GCs)")
("with-find" . "Elapsed time: 0.643306s (0.305130s in 4 GCs)"))
(my-bench 10 "~/.local/src/linux" "")
(("built-in" . "Elapsed time: 2.402341s (0.937857s in 11 GCs)")
("with-find" . "Elapsed time: 1.544024s (0.827364s in 10 GCs)"))
(my-bench 100 "/ssh:catern.com:~/public_html" "")
(("built-in" . "Elapsed time: 36.494233s (6.450840s in 79 GCs)")
("with-find" . "Elapsed time: 4.619035s (1.133656s in 14 GCs)"))
That works out to close to a 2x speedup on local files, and almost 8x for remote files.
And my implementation *isn't even using the fact that find can run in
parallel with Emacs*. If I did start using that, I expect even more
speed gains from parallelism, which aren't achievable in Emacs itself.
So can we add something like this (with the appropriate fallbacks to
directory-files-recursively), since it has such a big speedup even
without parallelism?
My implementation and benchmarking:
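(require 'cl-lib)   ; for `cl-assert'
(require 'subr-x)   ; for `string-empty-p'

(defun find-directory-files-recursively (dir regexp &optional include-directories _predicate follow-symlinks)
  (cl-assert (null _predicate) t "find-directory-files-recursively can't accept arbitrary predicates")
  (with-temp-buffer
    (setq case-fold-search nil)
    (cd dir)
    (let* ((command
            (append
             ;; -L is a global option and must come before the starting point.
             (list "find")
             (when follow-symlinks '("-L"))
             (list (file-local-name dir))
             ;; Without -L, skip symlinks that point to directories.
             (unless follow-symlinks
               '("!" "(" "-type" "l" "-xtype" "d" ")"))
             ;; `unless' returns only its last form, so build a list to keep
             ;; the "-regex" flag.  Note that find's -regex matches the whole
             ;; path, not just the file name as directory-files-recursively does.
             (unless (string-empty-p regexp)
               (list "-regex" (concat ".*" regexp ".*")))
             (unless include-directories
               '("!" "-type" "d"))
             '("-print0")))
           (remote (file-remote-p dir))
           (proc
            (if remote
                (let ((proc (apply #'start-file-process
                                   "find" (current-buffer) command)))
                  ;; A no-op sentinel keeps the default one from inserting
                  ;; a status message into the output buffer.
                  (set-process-sentinel proc (lambda (_proc _state)))
                  (set-process-query-on-exit-flag proc nil)
                  proc)
              (make-process :name "find" :buffer (current-buffer)
                            :connection-type 'pipe
                            :noquery t
                            :sentinel (lambda (_proc _state))
                            :command command))))
      ;; Wait until find exits and all of its output has been read.
      (while (accept-process-output proc))
      ;; Split the NUL-separated output into file names, re-adding the
      ;; remote prefix that `file-local-name' stripped.
      (let ((start (goto-char (point-min))) ret)
        (while (search-forward "\0" nil t)
          (push (concat remote (buffer-substring-no-properties
                                start (1- (point))))
                ret)
          (setq start (point)))
        ret))))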
(require 'ert)      ; for `should'

(defun my-bench (count path regexp)
  (setq path (expand-file-name path))
  ;; Check that both implementations return the same set of files.
  (let ((old (directory-files-recursively path regexp))
        (new (find-directory-files-recursively path regexp)))
    (dolist (path old)
      (should (member path new)))
    (dolist (path new)
      (should (member path old))))
  (list
   (cons "built-in" (benchmark count (list 'directory-files-recursively path regexp)))
   (cons "with-find" (benchmark count (list 'find-directory-files-recursively path regexp)))))
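The message asks for find-based listing "with the appropriate fallbacks to directory-files-recursively". One possible shape for that, sketched under assumptions: use find only when no predicate is given, `find-directory-files-recursively` from this message is loaded, and a find executable exists on the (possibly remote) host; otherwise call the built-in. The wrapper name `my-directory-files-recursively` is hypothetical.

```elisp
(defun my-directory-files-recursively (dir regexp &optional include-directories predicate follow-symlinks)
  "Hypothetical wrapper: like `directory-files-recursively', using find when possible.
Falls back to the built-in when PREDICATE is non-nil, when
`find-directory-files-recursively' is not defined, or when no find
executable is found on DIR's host."
  (if (and (null predicate)
           (fboundp 'find-directory-files-recursively)
           ;; With REMOTE non-nil, `executable-find' searches the host of
           ;; `default-directory', so this also works for Tramp paths.
           (let ((default-directory (file-name-as-directory (expand-file-name dir))))
             (executable-find "find" t)))
      (find-directory-files-recursively dir regexp include-directories nil follow-symlinks)
    (directory-files-recursively dir regexp include-directories predicate follow-symlinks)))
```

Since find does not sort its output, callers that rely on the built-in's ordering would need to sort the result themselves; `my-bench` sidesteps this by comparing the two results as sets.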