From: Dmitry Gutov
Subject: bug#64735: 29.0.92; find invocations are ~15x slower because of ignores
Date: Tue, 12 Sep 2023 17:23:53 +0300
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:102.0) Gecko/20100101 Thunderbird/102.13.0
On 11/09/2023 14:57, Eli Zaretskii wrote:
>> So there is also a second recording for find-directory-files-recursively-2
>> with read-process-output-max=409600. It does improve the performance
>> significantly (and reduce the number of GC pauses). I guess what I'm still
>> not clear on is whether the number of GC pauses is fewer because of less
>> consing (the only column that looks significantly different is the 3rd:
>> VECTOR-CELLS), or because the process finishes faster due to larger
>> buffers, which itself causes fewer calls to maybe_gc.
>
> I think the latter.
It might be both.

To try to analyze how large the per-chunk overhead might be (CPU- and GC-wise combined), I first implemented the same function in yet another way that doesn't use :filter (so that the default filter is used), but still asynchronously, with the parsing happening concurrently with the process:
(defun find-directory-files-recursively-5 (dir regexp &optional
                                               include-directories _p
                                               follow-symlinks)
  (cl-assert (null _p) t
             "find-directory-files-recursively can't accept arbitrary predicates")
  (with-temp-buffer
    (setq case-fold-search nil)
    (cd dir)
    (let* ((command (append
                     (list "find" (file-local-name dir))
                     (if follow-symlinks
                         '("-L")
                       '("!" "(" "-type" "l" "-xtype" "d" ")"))
                     (unless (string-empty-p regexp)
                       (list "-regex" (concat ".*" regexp ".*")))
                     (unless include-directories
                       '("!" "-type" "d"))
                     '("-print0")))
           (remote (file-remote-p dir))
           (proc (if remote
                     (let ((proc (apply #'start-file-process
                                        "find" (current-buffer) command)))
                       (set-process-sentinel proc (lambda (_proc _state)))
                       (set-process-query-on-exit-flag proc nil)
                       proc)
                   (make-process :name "find" :buffer (current-buffer)
                                 :connection-type 'pipe
                                 :noquery t
                                 :sentinel (lambda (_proc _state))
                                 :command command)))
           start ret)
      (setq start (point-min))
      (while (accept-process-output proc)
        (goto-char start)
        (while (search-forward "\0" nil t)
          (push (buffer-substring-no-properties start (1- (point))) ret)
          (setq start (point))))
      ret)))

This method already improved the performance somewhat (compared to find-directory-files-recursively-2), but not by much. So I tried these next two steps:
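For comparing the variants, a call along these lines can be timed (a sketch, not from the original tests; the checkout path and the regexp are placeholder examples):

```
;; Rough timing sketch.  The directory stands in for a local checkout;
;; the regexp is deliberately anchor-free, since the function wraps it
;; in ".*" on both sides before handing it to find's -regex.
(benchmark-run 1
  (find-directory-files-recursively-5 "~/src/linux" "\\.c"))
```

The first element of benchmark-run's return value is the elapsed time in seconds, which is what the tables below report.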
- Dropping most of the setup in read_and_dispose_of_process_output (which does some consing of its own) and calling Finternal_default_process_filter directly (call_filter_directly.diff), when it is the filter to be used anyway.
- Bypassing that function entirely, skipping the creation of a Lisp string (CHARS -> TEXT) and inserting into the buffer directly (when the filter is set to the default, of course). I copied and adapted some code from 'call_process' for that (read_and_insert_process_output.diff).
Neither is intended as a complete proposal, but here are some comparisons. Note that either of these patches can only help the implementations that don't set up a process filter (the naive first one, and the new parallel number 5 above).
For testing, I used two different repo checkouts that are large enough to not finish too quickly: gecko-dev and torvalds-linux.
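For illustration (not part of the original message), here is roughly the find invocation that the function above constructs for a local directory, with follow-symlinks and include-directories nil; the directory and the ".el" regexp are made-up examples:

```shell
# Hypothetical demo tree; the real tests used gecko-dev and linux checkouts.
mkdir -p /tmp/frec-demo/sub
touch /tmp/frec-demo/a.el /tmp/frec-demo/sub/b.el /tmp/frec-demo/c.txt
# Same argument order as the `command' list in the function above.
# GNU find's -regex matches the whole path and defaults to Emacs syntax.
find /tmp/frec-demo '!' '(' -type l -xtype d ')' \
     -regex '.*\.el.*' '!' -type d -print0 | tr '\0' '\n'
```

The -print0/NUL framing is what the parsing loop in the Lisp function splits on with search-forward.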
master

| Function                                         | gecko-dev | linux |
|--------------------------------------------------|-----------|-------|
| find-directory-files-recursively                 |      1.69 |  0.41 |
| find-directory-files-recursively-2               |      1.16 |  0.28 |
| find-directory-files-recursively-3               |      0.92 |  0.23 |
| find-directory-files-recursively-5               |      1.07 |  0.26 |
| find-directory-files-recursively (rpom 409600)   |      1.42 |  0.35 |
| find-directory-files-recursively-2 (rpom 409600) |      0.90 |  0.25 |
| find-directory-files-recursively-5 (rpom 409600) |      0.89 |  0.24 |

call_filter_directly.diff (basically, not much difference)

| Function                                         | gecko-dev | linux |
|--------------------------------------------------|-----------|-------|
| find-directory-files-recursively                 |      1.64 |  0.38 |
| find-directory-files-recursively-5               |      1.05 |  0.26 |
| find-directory-files-recursively (rpom 409600)   |      1.42 |  0.36 |
| find-directory-files-recursively-5 (rpom 409600) |      0.91 |  0.25 |

read_and_insert_process_output.diff (noticeable differences)

| Function                                         | gecko-dev | linux |
|--------------------------------------------------|-----------|-------|
| find-directory-files-recursively                 |      1.30 |  0.34 |
| find-directory-files-recursively-5               |      1.03 |  0.25 |
| find-directory-files-recursively (rpom 409600)   |      1.20 |  0.35 |
| find-directory-files-recursively-5 (rpom 409600) | (!!) 0.72 |  0.21 |

So it seems like we have at least two potential ways to implement an asynchronous file-listing routine that is as fast as or faster than the synchronous one (if only thanks to starting the parsing in parallel).
Combining the last patch with a very large value of read-process-output-max seems to yield the most benefit, though I'm not sure it's appropriate to simply raise that value in our code.
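One alternative to raising it globally (a sketch; the directory and regexp are placeholders) would be to let-bind it around the call, since the value is consulted when the process is created:

```
;; Sketch: raise the chunk size only for the duration of one listing,
;; instead of changing the global default.
(let ((read-process-output-max 409600))
  (find-directory-files-recursively-5 "~/src/gecko-dev" "\\.cpp"))
```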
Thoughts?