bug-gnu-emacs
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

bug#64735: 29.0.92; find invocations are ~15x slower because of ignores


From: Dmitry Gutov
Subject: bug#64735: 29.0.92; find invocations are ~15x slower because of ignores
Date: Mon, 11 Sep 2023 03:02:55 +0300
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:102.0) Gecko/20100101 Thunderbird/102.13.0

On 10/09/2023 08:33, Eli Zaretskii wrote:
Date: Sun, 10 Sep 2023 04:30:24 +0300
Cc: luangruo@yahoo.com, sbaugh@janestreet.com, yantar92@posteo.net,
  64735@debbugs.gnu.org
From: Dmitry Gutov <dmitry@gutov.dev>

On 08/09/2023 09:35, Eli Zaretskii wrote:
I think you'd need to expose consing_until_gc to Lisp, and then you
can collect the data from Lisp.

I can expose it to Lisp and print all three from post-gc-hook, but the
result just looks like this:

gc-pct 0.1 gc-thr 800000 cugc 4611686018427387903

Perhaps I need to add a hook which runs at the beginning of GC? Or of
maybe_gc even?

You could record its value in a local variable at the entry to
garbage_collect, and the expose that value to Lisp.

That also doesn't seem to give much, given that the condition for entering 'maybe_garbage_collect' is (consing_until_gc < 0). I.e. we wait until it's down to 0, then garbage-collect. What we could perhaps do is add another hook (or a printer) at the beginning of maybe_gc, but either would result in lots and lots of output.

Alternatively, (memory-use-counts) seems to retain some counters which
don't get erased during garbage collection.

Maybe using those will be good enough, indeed.

I added this instrumentation:

(defvar last-mem-counts '(0 0 0 0 0 0 0))

(defun gc-record-after ()
  (let* ((counts (memory-use-counts))
         (diff (cl-map 'list
                       (lambda (old new) (- new old))
                       last-mem-counts counts)))
    (setq last-mem-counts counts)
    (message "counts diff %s" diff)))

(add-hook 'post-gc-hook #'gc-record-after)

so that after each garbage collection we print the differences in all counters (CONSES FLOATS VECTOR-CELLS SYMBOLS STRING-CHARS INTERVALS STRINGS).

And a message call when the process finishes.

And made those recordings during the benchmark runs of two different listing methods (one using make-process, another using process-file) to list all files in a large directory (there are ~200000 files there).

The make-process one I also ran with a different (large) value of read-process-output-max. Results attached.

What's in there? First of all, for find-directory-files-recursively-3, there are 0 garbage collections between the beginning of the function and when we start parsing the output (no GCs while the process is writing to the buffer synchronously). I guess inserting output in a buffer doesn't increase consing, so there's nothing to GC?

Next: for find-directory-files-recursively-2, the process only finishes at the end, when all GC cycles are done for. I suppose that also means we block the process's output while Lisp is running, and also that whatever GC events occur might coincide with the chunks of output coming from the process, and however many of them turn out to be in total.

So there is also a second recording for find-directory-files-recursively-2 with read-process-output-max=409600. It does improve the performance significantly (and reduce the number of GC pauses). I guess what I'm still not clear on, is whether the number of GC pauses is fewer because of less consing (the only column that looks significantly different is the 3rd: VECTOR-CELLS), or because the process finishes faster due to larger buffers, which itself causes fewer calls to maybe_gc.

And, of course, what else could be done to reduce the time spent in GC in the asynchronous case.

Attachment: gcs2.txt
Description: Text document

Attachment: gcs3.txt
Description: Text document


reply via email to

[Prev in Thread] Current Thread [Next in Thread]