bug-gnu-emacs

bug#64735: 29.0.92; find invocations are ~15x slower because of ignores


From: sbaugh
Subject: bug#64735: 29.0.92; find invocations are ~15x slower because of ignores
Date: Sat, 22 Jul 2023 17:18:19 +0000 (UTC)
User-agent: Gnus/5.13 (Gnus v5.13)

Eli Zaretskii <eliz@gnu.org> writes:

>> From: sbaugh@catern.com
>> Date: Sat, 22 Jul 2023 10:38:37 +0000 (UTC)
>> Cc: Spencer Baugh <sbaugh@janestreet.com>, dmitry@gutov.dev,
>>      yantar92@posteo.net, michael.albinus@gmx.de, rms@gnu.org,
>>      64735@debbugs.gnu.org
>> 
>> Eli Zaretskii <eliz@gnu.org> writes:
>> > No, the first step is to use in Emacs what Find does today, because it
>> > will already be a significant speedup.
>> 
>> Why bother?  directory-files-recursively is a rarely used API, as you
>> have mentioned before in this thread.
>
> Because we could then use it much more (assuming the result will be
> performant enough -- this remains to be seen).
>
>> And there is a way to speed it up which will have a performance boost
>> which is unbeatable any other way: Use find instead of
>> directory-files-recursively, and operate on files as they find prints
>> them.
>
> Not every command can operate on the output sequentially: some need to
> see all of the output, others will need to be redesigned and
> reimplemented to support such sequential mode.
>
> Moreover, piping from Find incurs overhead: data is broken into blocks
> by the pipe or PTY, reading the data can be slowed down if Emacs is
> busy processing something, etc.

I went ahead and implemented it, and I get a 2x speedup even *without*
running find in parallel with Emacs.

First my results:

(my-bench 100 "~/public_html" "")
(("built-in" . "Elapsed time: 1.140173s (0.389344s in 5 GCs)")
 ("with-find" . "Elapsed time: 0.643306s (0.305130s in 4 GCs)"))

(my-bench 10 "~/.local/src/linux" "")
(("built-in" . "Elapsed time: 2.402341s (0.937857s in 11 GCs)")
 ("with-find" . "Elapsed time: 1.544024s (0.827364s in 10 GCs)"))

(my-bench 100 "/ssh:catern.com:~/public_html" "")
(("built-in" . "Elapsed time: 36.494233s (6.450840s in 79 GCs)")
 ("with-find" . "Elapsed time: 4.619035s (1.133656s in 14 GCs)"))

Roughly a 2x speedup on local files (1.6x to 1.8x in these runs), and
about 8x for remote files.
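
As a quick check of those ratios, here is a minimal sketch that divides
the built-in elapsed time by the with-find elapsed time for each run
above.  The helper name my-speedup is made up for this illustration.

```elisp
;; Hypothetical helper, not part of the implementation in this message.
(defun my-speedup (built-in with-find)
  "Return how many times faster WITH-FIND is than BUILT-IN (both in seconds)."
  (/ (float built-in) with-find))

(list (my-speedup 1.140173 0.643306)    ; local ~/public_html
      (my-speedup 2.402341 1.544024)    ; local ~/.local/src/linux
      (my-speedup 36.494233 4.619035))  ; remote ~/public_html over ssh
;; => roughly (1.77 1.56 7.90)
```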

And my implementation *isn't even using the fact that find can run in
parallel with Emacs*.  If I did start using that, I expect even more
speed gains from parallelism, which aren't achievable in Emacs itself.

So can we add something like this (with the appropriate fallbacks to
directory-files-recursively), since it has such a big speedup even
without parallelism?
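
One possible shape for that fallback, as a sketch only: use the
find-based function when no predicate is given and a find executable is
available on the relevant host, and otherwise call
directory-files-recursively.  The names my-use-find-p and
my-directory-files-recursively are hypothetical.  Note that the presence
of a find binary does not guarantee GNU extensions such as -xtype or
-regex, which a real version would need to check.

```elisp
;; Hypothetical sketch of the fallback logic.
(defun my-use-find-p (dir predicate)
  "Return non-nil if a find-based listing of DIR looks usable.
PREDICATE is the caller's descent predicate, which find cannot run."
  (and (null predicate)
       (let ((default-directory (file-name-as-directory dir)))
         ;; The second argument t makes `executable-find' look on the
         ;; remote host when DIR is a remote file name (Emacs 27+).
         (executable-find "find" t))))

(defun my-directory-files-recursively (dir regexp &optional include-directories
                                           predicate follow-symlinks)
  "Like `directory-files-recursively', but try find first."
  (if (and (my-use-find-p dir predicate)
           ;; `find-directory-files-recursively' is the function from
           ;; this message; fall back if it has not been loaded.
           (fboundp 'find-directory-files-recursively))
      (find-directory-files-recursively
       dir regexp include-directories nil follow-symlinks)
    (directory-files-recursively
     dir regexp include-directories predicate follow-symlinks)))
```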

My implementation and benchmarking:

(require 'cl-lib)  ; for cl-assert

(defun find-directory-files-recursively (dir regexp
                                         &optional include-directories
                                         _predicate follow-symlinks)
  (cl-assert (null _predicate) t
             "find-directory-files-recursively can't accept arbitrary predicates")
  (with-temp-buffer
    (setq case-fold-search nil)
    (cd dir)
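    (let* ((command
            (append
             (list "find")
             ;; -L is a global option and must precede the starting point.
             (when follow-symlinks '("-L"))
             (list (file-local-name dir))
             ;; Without -L, skip symlinks that point to directories.
             (unless follow-symlinks
               '("!" "(" "-type" "l" "-xtype" "d" ")"))
             ;; `append' needs a list here, not bare strings.  -regex
             ;; must match the whole path, hence the surrounding ".*".
             (unless (string-empty-p regexp)
               (list "-regex" (concat ".*" regexp ".*")))
             (unless include-directories
               '("!" "-type" "d"))
             '("-print0")))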
           (remote (file-remote-p dir))
           (proc
            (if remote
                (let ((proc (apply #'start-file-process
                                   "find" (current-buffer) command)))
                  (set-process-sentinel proc (lambda (_proc _state)))
                  (set-process-query-on-exit-flag proc nil)
                  proc)
              (make-process :name "find" :buffer (current-buffer)
                            :connection-type 'pipe
                            :noquery t
                            :sentinel (lambda (_proc _state))
                            :command command))))
      (while (accept-process-output proc))
      (let ((start (goto-char (point-min))) ret)
        (while (search-forward "\0" nil t)
          (push (concat remote
                        (buffer-substring-no-properties start (1- (point))))
                ret)
          (setq start (point)))
        ret))))

(require 'ert)  ; for should

(defun my-bench (count path regexp)
  (setq path (expand-file-name path))
  (let ((old (directory-files-recursively path regexp))
        (new (find-directory-files-recursively path regexp)))
    (dolist (path old)
      (should (member path new)))
    (dolist (path new)
      (should (member path old))))
  (list
   (cons "built-in"
         (benchmark count (list 'directory-files-recursively path regexp)))
   (cons "with-find"
         (benchmark count (list 'find-directory-files-recursively path regexp)))))




