From: Dmitry Gutov
Subject: bug#64735: 29.0.92; find invocations are ~15x slower because of ignores
Date: Mon, 24 Jul 2023 15:55:13 +0300
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:102.0) Gecko/20100101 Thunderbird/102.13.0
On 24/07/2023 14:20, Eli Zaretskii wrote:
>> Date: Sun, 23 Jul 2023 22:27:26 +0300
>> Cc: luangruo@yahoo.com, sbaugh@janestreet.com, yantar92@posteo.net, 64735@debbugs.gnu.org
>> From: Dmitry Gutov <dmitry@gutov.dev>
>>
>> On 23/07/2023 20:56, Eli Zaretskii wrote:
>>> And, ideally, do all the relevant benchmarking when proposing the change.
>>
>> Of course. Although the benchmarks so far already show quite a lot of variability.
>>
>> Speaking of your MS Windows results that are unflattering to 'find', it might be worth doing a more varied comparison to determine the OS-specific bottleneck. Off the top of my head, here are some possibilities:
>>
>> 1. 'find' itself is much slower there. There is room for improvement in the port.
>
> I think it's the filesystem, not the port (which I did myself in this case).

But directory-files-recursively goes through the same filesystem, doesn't it?

> But I'd welcome similar tests on other Windows systems with other ports of Find. Just remember to measure this particular benchmark, not just Find itself from the shell, as the times are very different (as I reported up-thread).

Concur.

>> 2. The process output handling is worse.
>
> Not sure what that means.

I meant Emacs's ability to process the output of a subprocess on the particular platform.

>> You said:
>>
>>> Btw, the Find command with pipe to some other program, like wc, finishes much faster, like 2 to 4 times faster than when it is run from find-directory-files-recursively. That's probably the slowdown due to communications with async subprocesses in action.
>
> One thing to try is changing the -with-find implementation to use a synchronous call, to compare (e.g. using 'process-file'). And repeat these tests on GNU/Linux too.

That would help us gauge the viability of using an asynchronous process to get the file listing. But also, if one were just looking into reimplementing directory-files-recursively using 'find' (to create an endpoint with swappable implementations, for example), 'process-file' would be a suitable substitute, because the original is also currently synchronous.

>> 3. Something particular to the project being used for the test.
>
> I don't think I understand this one.

This described the possibility that the disparity between the implementations' runtimes is due to something unusual in the project's structure. If you tested different projects on Windows and on GNU/Linux, that would make a direct comparison less useful. It's the least likely cause, but still a possibility.

>> To look into possibility #1, you can try running the same command in the terminal with the output to NUL and comparing the runtime to what's reported in the benchmark.
>
> Output to the null device is a bad idea, as (AFAIR) Find is clever enough to detect that and do nothing. I run "find | wc" instead, and already reported that it is much faster.

Now I see it, thanks.
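For the record, a minimal sketch of that "find | wc" measurement, assuming a POSIX shell with 'find', 'wc' and 'tr' on PATH. The helper name count_files is made up for illustration; it is not part of Emacs or of any port of Find.

```shell
#!/bin/sh
# Hypothetical helper: list every regular file under the directory given
# as $1 and pipe the names into 'wc -l', so that 'find' really has to
# write all of its output somewhere instead of to the null device.
count_files() {
  # 'tr' strips the leading padding that some 'wc' implementations print.
  find "$1" -type f | wc -l | tr -d ' '
}
```

In bash, running `time count_files .` from the project root reports the wall-clock time of the whole pipeline. This still measures Find from the shell, so it complements the Emacs-side benchmark rather than replacing it.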

>> I actually remember, from my time on MS Windows about 10 years ago, that some older ports of 'find' and/or 'grep' did have performance problems, but IIRC ezwinports contained the improved versions.
>
> The ezwinports version is the one I'm using here. But maybe someone came up with a better one: after all, I did my port many years ago (because the native ports available back then were abysmally slow).

We should also look at the exact numbers. If, as you say, the "| wc" invocation is 2-4x faster than what's reported in the benchmark, then it takes about 2-4 seconds. That is still oddly slower than your reported numbers for directory-files-recursively.