From: Dmitry Gutov
Subject: bug#64735: 29.0.92; find invocations are ~15x slower because of ignores
Date: Fri, 8 Sep 2023 03:53:37 +0300
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:102.0) Gecko/20100101 Thunderbird/102.13.0
Let's try to investigate this some more, if we can. On 31/07/2023 14:38, Eli Zaretskii wrote:
Date: Sun, 30 Jul 2023 04:35:49 +0300 Cc: luangruo@yahoo.com, sbaugh@janestreet.com, yantar92@posteo.net, 64735@debbugs.gnu.org From: Dmitry Gutov <dmitry@gutov.dev> In this context, junks objects are objects that will need to be collected by garbage collector very soon because they are just a byproduct of a function's execution (but aren't used in the return value, for example). The more of them a function creates, the more work it will be, supposedly, for the GC. Heap is perhaps the wrong term (given that C has its own notion of heap), but I meant the memory managed by the Lisp runtime. And chunks are the buffered strings that get passed to the process filter. Chunks of the process' output. By default, these chunks are 4096 characters long, but the comparisons tweak that value by 10x and 100x.If the subprocess output is inserted into a buffer, its effect on the GC will be different. (Not sure if this is relevant to the issue at hand, as I lost track of the many variants of the function that were presented.)
Yes, one of the variants inserts into a buffer (the one that uses a synchronous process call and, coincidentally, also spends less time in GC); the asynchronous ones work from a process filter.
>>> If I read what you wrote superficially, without delving into the details (which I can't understand), you are saying that the overall amount of consing is roughly the same.
>>
>> What is "amount of consing"? Is it just the number of objects? Or does their size (e.g. string length) affect GC pressure as well?
>
> In general, both, since we have 2 GC thresholds, and GC is actually done when both are exceeded. So the effect will also depend on how much Lisp memory is already allocated in the Emacs process where these benchmarks are run.
All right.
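(For reference, the two thresholds mentioned above are the ones exposed to Lisp as gc-cons-threshold and gc-cons-percentage; a quick way to check them, with the stock defaults noted in comments:)

```elisp
;; The two GC thresholds, as visible from Lisp (defaults shown; per the
;; description above, a GC happens only once both are exceeded):
gc-cons-threshold   ; bytes of consing before GC is considered, 800000 by default
gc-cons-percentage  ; fraction of the total heap, 0.1 by default
```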
> This is consistent with the fact that the GC times change only very little. So I don't think I see, on this level, what puzzles you in this picture.

Now that you pointed that out, the picture is just more puzzling. While 0.1s in GC is not insignificant (it's 10% of the whole runtime), it does seem to have been more of a fluke, and on average the fluctuations in GC time are smaller. Here's an extended comparison:

(("with-find 4096"         . "Elapsed time: 1.737742s (1.019624s in 28 GCs)")
 ("with-find 40960"        . "Elapsed time: 1.515376s (0.942906s in 26 GCs)")
 ("with-find 409600"       . "Elapsed time: 1.458987s (0.948857s in 26 GCs)")
 ("with-find 1048576"      . "Elapsed time: 1.363882s (0.888599s in 24 GCs)")
 ("with-find-p 4096"       . "Elapsed time: 1.202522s (0.745758s in 19 GCs)")
 ("with-find-p 40960"      . "Elapsed time: 1.005221s (0.640815s in 16 GCs)")
 ("with-find-p 409600"     . "Elapsed time: 0.855483s (0.591208s in 15 GCs)")
 ("with-find-p 1048576"    . "Elapsed time: 0.825936s (0.623876s in 16 GCs)")
 ("with-find-sync 4096"    . "Elapsed time: 0.848059s (0.272570s in 7 GCs)")
 ("with-find-sync 409600"  . "Elapsed time: 0.912932s (0.339230s in 9 GCs)")
 ("with-find-sync 1048576" . "Elapsed time: 0.880479s (0.303047s in 8 GCs)"))

What was puzzling for me, overall, is that if we take "with-find 409600" (the fastest among the asynchronous runs without parallelism) and "with-find-sync", the difference in GC time (which is repeatable), 0.66s, almost covers the whole difference in performance. And as for "with-find-p 409600", it would come out on top! Which it did in Ihor's tests when GC was disabled.

But where does the extra GC time come from? Is it from extra consing in the asynchronous case? If it is, it's apparently not from all the chunked strings, given that increasing the maximum string size (and decreasing their number by 2x-6x, according to my logging) doesn't affect the reported GC time much. Could the extra time spent in GC just come from the fact that it's given more opportunities to run?
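(For context, "Elapsed time: ... (... in N GCs)" is the message that benchmark-progn prints, so a comparison like the above can be gathered with a harness along these lines; my-run-find is a hypothetical stand-in for the with-find implementation under test:)

```elisp
;; Hypothetical harness: time one implementation at several chunk sizes.
;; `benchmark-progn' prints "Elapsed time: %fs (%fs in %d GCs)".
(dolist (size '(4096 40960 409600 1048576))
  (let ((read-process-output-max size))
    (garbage-collect)                 ; start from a freshly-collected heap
    (benchmark-progn
      (my-run-find "~/src/emacs")))) ; stand-in, not a real function
```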
call_process stays entirely in C, whereas make-process, with its asynchronous approach, goes back and forth between C and Lisp every time it receives input. The report above might indicate as much: the with-find-p runs have ~20 garbage collection cycles, whereas with-find-sync has only ~10. Or could there be some other source of consing, unrelated to the process output strings and to how finely they are sliced?

> These questions can only be answered by dumping the values of the 2 GC thresholds and of consing_until_gc for each GC cycle. It could be that we are consing more Lisp memory, or it could be that one of the implementations provides fewer opportunities for Emacs to call maybe_gc. Or it could be some combination of the two.
Do you think the outputs of https://elpa.gnu.org/packages/emacs-gc-stats.html could help?
Otherwise, I suppose I need to add some fprintf's somewhere. Would the beginning of maybe_gc inside lisp.h be a good place for that?
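(Short of patching the C code, part of that state is already observable from Lisp via post-gc-hook; a rough sketch, with the caveat that consing_until_gc itself is internal to the C core and not exposed at this level:)

```elisp
;; Log the Lisp-visible GC counters after every collection cycle.
;; consing_until_gc is C-internal, so this only approximates the dump
;; suggested above.
(add-hook 'post-gc-hook
          (lambda ()
            (message "GC #%d: %.3fs total, threshold=%d, percentage=%s"
                     gcs-done gc-elapsed
                     gc-cons-threshold gc-cons-percentage)))
```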
If we get back to increasing read-process-output-max, which does help (apparently by reducing the number of times we switch between reading from the process and doing... whatever else), the sweet spot seems to be 1048576, which is my system's maximum value. Anything higher, and performance gets worse again -- I'm guessing something somewhere resets the value to the default? Not sure why it doesn't clip to the maximum allowed, though. Anyway, it would be helpful to be able to settle on as high a value as possible without manually reading from /proc/sys/fs/pipe-max-size. And what of other OSes?

> Is this with pipes or with PTYs?
All examples that use make-process call it with :connection-type 'pipe. The one that calls process-file (the "synchronous" implementation) presumably uses a pipe as well, but I don't see that stated in its docstring.
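(For the record, the manual dance I mean looks roughly like this; the path and behavior are Linux-specific:)

```elisp
;; Linux-specific sketch: raise read-process-output-max to the largest
;; pipe buffer the kernel allows, as reported by /proc/sys/fs/pipe-max-size.
(when (file-readable-p "/proc/sys/fs/pipe-max-size")
  (setq read-process-output-max
        (string-to-number
         (with-temp-buffer
           (insert-file-contents "/proc/sys/fs/pipe-max-size")
           (buffer-string)))))
```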