
bug#64735: 29.0.92; find invocations are ~15x slower because of ignores


From: Eli Zaretskii
Subject: bug#64735: 29.0.92; find invocations are ~15x slower because of ignores
Date: Tue, 12 Sep 2023 19:32:41 +0300

> Date: Tue, 12 Sep 2023 17:23:53 +0300
> Cc: luangruo@yahoo.com, sbaugh@janestreet.com, yantar92@posteo.net,
>  64735@debbugs.gnu.org
> From: Dmitry Gutov <dmitry@gutov.dev>
> 
> - Dropping most of the setup in read_and_dispose_of_process_output 
> (which creates some consing too) and calling 
> Finternal_default_process_filter directly (call_filter_directly.diff), 
> when it is the filter to be used anyway.
> 
> - Going around that function entirely, skipping the creation of a Lisp 
> string (CHARS -> TEXT) and inserting into the buffer directly (when the 
> filter is set to the default, of course). Copied and adapted some code 
> from 'call_process' for that (read_and_insert_process_output.diff).
> 
> Neither is intended as a complete proposal, but here are some
> comparisons. Note that either of these patches can only help the
> implementations that don't set up a process filter (the naive first
> one, and the new parallel number 5 above).
> 
> For testing, I used two different repo checkouts that are large enough 
> to not finish too quickly: gecko-dev and torvalds-linux.
> 
> master
> 
> | Function                                         | gecko-dev | linux |
> | find-directory-files-recursively                 |      1.69 |  0.41 |
> | find-directory-files-recursively-2               |      1.16 |  0.28 |
> | find-directory-files-recursively-3               |      0.92 |  0.23 |
> | find-directory-files-recursively-5               |      1.07 |  0.26 |
> | find-directory-files-recursively (rpom 409600)   |      1.42 |  0.35 |
> | find-directory-files-recursively-2 (rpom 409600) |      0.90 |  0.25 |
> | find-directory-files-recursively-5 (rpom 409600) |      0.89 |  0.24 |
> 
> call_filter_directly.diff (basically, not much difference)
> 
> | Function                                         | gecko-dev | linux |
> | find-directory-files-recursively                 |      1.64 |  0.38 |
> | find-directory-files-recursively-5               |      1.05 |  0.26 |
> | find-directory-files-recursively (rpom 409600)   |      1.42 |  0.36 |
> | find-directory-files-recursively-5 (rpom 409600) |      0.91 |  0.25 |
> 
> read_and_insert_process_output.diff (noticeable differences)
> 
> | Function                                         | gecko-dev | linux |
> | find-directory-files-recursively                 |      1.30 |  0.34 |
> | find-directory-files-recursively-5               |      1.03 |  0.25 |
> | find-directory-files-recursively (rpom 409600)   |      1.20 |  0.35 |
> | find-directory-files-recursively-5 (rpom 409600) | (!!) 0.72 |  0.21 |
> 
> So it seems like we have at least two potential ways to implement an
> asynchronous file listing routine that is as fast as or faster than
> the synchronous one (if only thanks to starting the parsing in
> parallel).
> 
> Combining the last patch with a very large value of
> read-process-output-max seems to yield the most benefit, but I'm not
> sure it's appropriate to just raise that value in our code.
> 
> Thoughts?

I'm not sure what exactly there is to think about here.  Removing
portions of read_and_dispose_of_process_output, or bypassing it
entirely, is not going to fly, because AFAIU it basically means we
don't decode text, which can only work with plain ASCII file names,
and/or don't move the markers in the process buffer, which also
cannot be avoided.  If you want to conclude that inserting the
process's output into a buffer without consing Lisp strings is faster
(which I'm not sure about, see below, but it could be true), then we
could try extending internal-default-process-filter (or writing a new
filter function similar to it) so that it inserts the stuff into the
gap and then uses decode_coding_gap, which converts the inserted
bytes in-place -- that, at least, will be correct, and will avoid
consing intermediate temporary strings from the process output, then
decoding them, then inserting them.  Other than that, the -2 and -3
variants are very close runners-up to -5, so maybe I'm missing
something, but I see no reason to be too excited here?  I mean, 0.89
vs 0.92? really?
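
To make the shape of that decode_coding_gap idea concrete, here is a
minimal, self-contained sketch in plain C.  It does not use the real
Emacs buffer structures or decode_coding_gap; `struct text',
`insert_process_output' and the toy CRLF-stripping "decoder" are
hypothetical stand-ins, meant only to show the order of operations:
raw process bytes go straight into the gap and are decoded there, so
no temporary string is consed, decoded, and then inserted.

#include <stdbool.h>
#include <string.h>

/* Hypothetical stand-in for buffer text: GAP_OFF bytes of text, then
   GAP_SIZE free bytes (the gap), then TAIL_BYTES of text after it.  */
struct text
{
  char *beg;
  size_t gap_off, gap_size, tail_bytes;
};

/* Toy "decoder": drop the CR of each CRLF pair in place and return
   the new length.  Stands in for decode_coding_gap, which likewise
   rewrites the just-inserted bytes inside the buffer.  (It ignores
   chunk boundaries; it is an illustration only.)  */
static size_t
decode_in_place (char *p, size_t len)
{
  size_t to = 0;
  for (size_t from = 0; from < len; from++)
    if (! (p[from] == '\r' && from + 1 < len && p[from + 1] == '\n'))
      p[to++] = p[from];
  return to;
}

/* Append NREAD raw bytes of process output at the gap and decode them
   there.  The current filter path instead conses a Lisp string from
   the bytes, decodes it, and only then inserts the result.  */
static bool
insert_process_output (struct text *t, const char *chars, size_t nread)
{
  if (t->gap_size < nread)
    return false;           /* enlarging the gap: see the sketch below */
  memcpy (t->beg + t->gap_off, chars, nread);
  size_t decoded = decode_in_place (t->beg + t->gap_off, nread);
  t->gap_off += decoded;    /* the insertion point advances */
  t->gap_size -= decoded;   /* leftover bytes remain part of the gap */
  return true;
}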

About inserting into the buffer: what we do is insert into the gap,
and when the gap becomes full, we enlarge it.  Enlarging the gap
involves: (a) enlarging the chunk of memory allocated to buffer text
(which might mean we ask the OS for more memory), and (b) moving the
characters after the gap to the right to free space for inserting more
stuff.  This is pretty fast, but still, with a large pipe buffer and a
lot of output, we do this many times, so it could add up to something
pretty tangible.  It's hard for me to tell whether this is
significantly faster than consing strings and inserting them; only
measurements can tell.
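
For what it's worth, the enlargement step described above looks
roughly like this in the same toy representation (again only a
sketch, not the real make_gap code in insdel.c): grow the allocation,
then shift the text after the gap toward the new end, and it is that
memmove of everything after the gap which gets repeated with every
enlargement.

#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Same hypothetical layout as in the previous sketch.  */
struct text
{
  char *beg;
  size_t gap_off, gap_size, tail_bytes;
};

/* Make room for at least NEEDED more bytes in the gap:
   (a) enlarge the allocation, which may ask the OS for more memory, and
   (b) move the TAIL_BYTES of text after the gap to the right, so that
       the free space in the middle grows.
   With a large pipe buffer and a lot of output this runs repeatedly,
   and the memmove of everything after the gap is the part that can
   add up.  (The real make_gap adds some extra slack so that it need
   not grow the gap on every small insertion.)  */
static bool
enlarge_gap (struct text *t, size_t needed)
{
  size_t new_gap = t->gap_size + needed;
  size_t new_size = t->gap_off + new_gap + t->tail_bytes;

  char *p = realloc (t->beg, new_size);              /* (a) */
  if (p == NULL)
    return false;
  t->beg = p;

  memmove (p + t->gap_off + new_gap,                 /* (b) */
           p + t->gap_off + t->gap_size,
           t->tail_bytes);
  t->gap_size = new_gap;
  return true;
}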




