From: Dmitry Gutov
Subject: bug#66020: (bug#64735 spin-off): regarding the default for read-process-output-max
Date: Thu, 21 Sep 2023 17:37:23 +0300
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:102.0) Gecko/20100101 Thunderbird/102.13.0

On 21/09/2023 10:42, Eli Zaretskii wrote:
>> Date: Thu, 21 Sep 2023 03:57:43 +0300
>> Cc: 66020@debbugs.gnu.org
>> From: Dmitry Gutov <dmitry@gutov.dev>
>>
>> That leaves the question of what new value to use. 409600 is optimal
>> for a large-output process but seems too much as a default anyway
>> (even if I have very little experimental evidence for that hesitation:
>> any help with that would be very welcome).

> How does the throughput depend on this value?  If the dependence curve
> plateaus at some lower value, we could use that lower value as a
> "good-enough" default.

Depends on what we're prepared to call a plateau. Strictly speaking, there isn't one. But there is a "sweet spot": for the process in my original benchmark ('find' with lots of output) it seems to be around 1009600. Here's a table (the numbers differ from the earlier ones because they're the results of (benchmark 5 ...) divided by 5, meaning GC is amortized):

| read-process-output-max | seconds |
|-------------------------+---------|
|                    4096 |    0.78 |
|                   16368 |    0.69 |
|                   40960 |    0.65 |
|                  409600 |    0.59 |
|                 1009600 |    0.56 |
|                 2009600 |    0.64 |
|                 4009600 |    0.65 |

The process's output length is 27244567 bytes in this case -- still above the largest of the buffer sizes in this table.

Notably, allocating the buffer only once at the start of the process (the experiment mentioned in the email to Stefan M.) doesn't change the dynamics: buffer lengths above ~1009600 still make the performance worse.

So there must be some negative factor associated with larger buffers. There is an obvious positive one: the larger the buffer, the longer we go without switching between processes, so that overhead is lower.

We could look into improving that part specifically: for example, reading from the process multiple times into 'chars' right away while there is still pending output present (either looping inside read_process_output, or calling it in a loop in wait_reading_process_output, at least until the process's buffered output is exhausted). That could reduce reactivity, however. (Can we find out in advance how much is already buffered, and only loop until we exhaust that length?)
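Roughly something like this, perhaps (an untested sketch with illustrative names, not the actual process.c code; on GNU/Linux, the FIONREAD ioctl reports how many bytes are already buffered on the descriptor):

#include <sys/ioctl.h>
#include <unistd.h>

/* Drain only what the kernel has already buffered for FD, so we
   don't starve the other processes.  */
static ssize_t
drain_pending_output (int fd, char *chars, size_t bufsize)
{
  int pending;
  ssize_t total = 0;

  if (ioctl (fd, FIONREAD, &pending) < 0)
    return read (fd, chars, bufsize);  /* fall back to a single read */

  while (pending > 0 && total < (ssize_t) bufsize)
    {
      size_t want = bufsize - total;
      if ((size_t) pending < want)
        want = pending;
      ssize_t n = read (fd, chars + total, want);
      if (n <= 0)
        break;
      total += n;
      pending -= n;
    }
  return total;
}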

>> I did some more experimenting, though. At a superficial glance,
>> allocating the 'chars' buffer at the beginning of read_process_output
>> is problematic because we could instead reuse a buffer for the whole
>> duration of the process. I tried that (adding a new field to
>> Lisp_Process and setting it in make_process), although I had to use a
>> value produced by make_uninit_string: apparently simply storing a
>> char* field inside a managed structure creates problems for the GC and
>> causes early segfaults. Anyway, the result was slightly _slower_ than
>> the status quo.

>> So I read what 'alloca' does, and it looks hard to beat. But it's only
>> used (as you of course know) when the value is <= MAX_ALLOCA, which is
>> currently 16384. Perhaps an optimal default value shouldn't exceed
>> this, even if it's hard to create a benchmark that shows a difference.
>> With read-process-output-max set to 16384, my original benchmark gets
>> about halfway to the optimal number.

> Which I think means we should stop worrying about the overhead of
> malloc for this purpose, as it is fast enough, at least on GNU/Linux.

Perhaps. If we're not too concerned about memory fragmentation (that's the only explanation I have for the "session gets older" table -- the last one -- in a previous email with the test-ls-output timings).
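For reference, the pattern under discussion is roughly this (a simplified sketch; the real code goes through the SAFE_ALLOCA machinery in lisp.h):

#include <alloca.h>
#include <stdlib.h>

#define MAX_ALLOCA 16384        /* the current threshold */

void
with_read_buffer (size_t readmax)
{
  char *chars;

  if (readmax <= MAX_ALLOCA)
    chars = alloca (readmax);   /* stack: no free, no fragmentation */
  else
    chars = malloc (readmax);   /* heap: freed explicitly, may fragment */

  /* ... read into 'chars' and process the output ... */

  if (readmax > MAX_ALLOCA)
    free (chars);
}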

>> And I think we should make the process "remember" the value at its
>> creation either way (something touched on in bug#38561): in bug#55737
>> we added an fcntl call to make the larger values take effect. But this
>> call is in create_process: so any later increase of this variable to a
>> large value won't take effect.

> Why would the variable change after create_process?  I'm afraid I
> don't understand what issue you are trying to deal with here.

Well, what could we lose by saving the value of read-process-output-max in create_process? Currently, I suppose, one could vary its value while a process is still running, to implement some adaptive behavior or whatnot. But that's already semi-broken, because fcntl is called in create_process.
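For context, the GNU/Linux-specific part from bug#55737 boils down to something like this (a sketch; the fcntl constant is real, the wrapper around it is made up):

#define _GNU_SOURCE             /* for F_SETPIPE_SZ */
#include <fcntl.h>

/* Called once when the pipe is created, in create_process; raising
   read-process-output-max afterwards won't grow the pipe.  */
static void
set_pipe_capacity (int fd, int desired_size)
{
#ifdef F_SETPIPE_SZ
  /* The kernel rounds the size up to a page multiple and caps
     unprivileged requests at /proc/sys/fs/pipe-max-size.  */
  fcntl (fd, F_SETPIPE_SZ, desired_size);
#endif
}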




