From: Dmitry Gutov
Subject: bug#64735: 29.0.92; find invocations are ~15x slower because of ignores
Date: Sat, 29 Jul 2023 03:12:34 +0300
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:102.0) Gecko/20100101 Thunderbird/102.13.0

On 27/07/2023 16:30, Dmitry Gutov wrote:
> I can imagine that the filter-based approach necessarily creates more strings (to pass to the filter function). Maybe we could increase those strings' size (thus reducing the number) by increasing the read buffer size?

To go further along this route, I first verified that the input strings (the chunks arriving at the filter) are (almost) all the same length: 4096 characters. They are then parsed into file names 50-100 characters long, so the number of "junk" objects created by the process-filter approach probably shouldn't matter too much: the returned list contains 40-80x more strings than the filter ever received.
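In case anyone wants to double-check that, a throwaway filter that records chunk lengths is enough. Something along these lines (my-chunk-sizes is purely illustrative, not part of any patch; assumes lexical binding):

(defun my-chunk-sizes (command)
  "Run COMMAND, returning the length of every chunk its filter receives."
  (let* ((sizes nil)
         (proc (make-process
                :name "chunk-sizes"
                :command command
                :connection-type 'pipe
                :filter (lambda (_proc string)
                          (push (length string) sizes)))))
    ;; Drain the process; the filter fires from inside
    ;; accept-process-output.
    (while (process-live-p proc)
      (accept-process-output proc))
    (nreverse sizes)))

;; (my-chunk-sizes '("find" "." "-type" "f"))
;; With the default read-process-output-max, nearly every element
;; comes back as 4096.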

But then I ran the benchmark with different values of read-process-output-max, which increases exactly those chunk strings' size while proportionally reducing their number. The results were:

> (my-bench-rpom 1 default-directory "")

=>

(("with-find-p 4096" . "Elapsed time: 0.945478s (0.474680s in 6 GCs)")
 ("with-find-p 40960" . "Elapsed time: 0.760727s (0.395379s in 5 GCs)")
("with-find-p 409600" . "Elapsed time: 0.729757s (0.394881s in 5 GCs)"))

where

(defun my-bench-rpom (count path regexp)
  "Benchmark `find-directory-files-recursively-2' under several
values of `read-process-output-max'."
  (setq path (expand-file-name path))
  (list
   (cons "with-find-p 4096"
         (let ((read-process-output-max 4096))
           (benchmark count (list 'find-directory-files-recursively-2 path regexp))))
   (cons "with-find-p 40960"
         (let ((read-process-output-max 40960))
           (benchmark count (list 'find-directory-files-recursively-2 path regexp))))
   (cons "with-find-p 409600"
         (let ((read-process-output-max 409600))
           (benchmark count (list 'find-directory-files-recursively-2 path regexp))))))

...with the last variant consistently showing the same or better performance than the "sync" version I benchmarked previously.

What does that mean for us? The number of strings on the heap is reduced, but not by much (again, the resulting list still has 43x more elements than there were chunks), and the combined memory taken up by the intermediate strings waiting to be garbage-collected stays the same: 100x fewer chunks, each 100x larger.

It seems like the per-chunk overhead is non-trivial and affects GC somehow (though not in a way that just any string allocation would).

In this test, with the default setting, the output arrives as ~6000 strings passed to the filter function. That means read_and_dispose_of_process_output is called about 6000 times, adding roughly 0.2s of overhead, i.e. around 33µs per call. Something in there must be producing extra work for the GC.
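For attributing that work, the built-in profiler should help; a quick sketch (results land in the report buffer):

;; Sample both CPU time and memory allocations around one run.
(profiler-start 'cpu+mem)
(find-directory-files-recursively-2 default-directory "")
(profiler-report)
(profiler-stop)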

This line seems suspect:

       list3 (outstream, make_lisp_proc (p), text),

That creates three conses and one Lisp object (a tagged pointer) per chunk. But maybe I'm missing something bigger.
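A quick back-of-the-envelope (assuming 16-byte conses on a 64-bit build) says those conses alone can't explain it:

;; ~6000 filter calls, 3 conses each from the `list3' above:
(* 6000 3)      ;; => 18000 conses per run
(* 6000 3 16)   ;; => 288000 bytes, i.e. under 0.3 MB

So if this line matters, it would more likely be by triggering additional GC cycles than through the raw allocation volume.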




