Date: Mon, 24 Jul 2023 15:55:13 +0300
Cc: luangruo@yahoo.com, sbaugh@janestreet.com, yantar92@posteo.net,
64735@debbugs.gnu.org
From: Dmitry Gutov <dmitry@gutov.dev>
1. 'find' itself is much slower there. There is room for improvement in
the port.
I think it's the filesystem, not the port (which I did myself in this
case).
But directory-files-recursively goes through the same filesystem,
doesn't it?
It does (more or less; see below). But I was not trying to explain
why Find is slower than directory-files-recursively, I was trying to
explain why Find on Windows is slower than Find on GNU/Linux.
If you are asking why directory-files-recursively is so much faster on
Windows than Find, then the main factors I can think of are:
. IPC, at least in how we implement it in Emacs on MS-Windows, via a
separate thread and OS-level events between them to signal that
stuff is available for reading, whereas
directory-files-recursively avoids this overhead completely;
. Find uses Posix APIs: 'stat', 'chdir', 'readdir' -- which on
Windows are emulated by wrappers around native APIs. Moreover,
Find uses 'char *' for file names, so calling native APIs involves
transparent conversion to UTF-16 and back, which is what native
APIs accept and return. By contrast, Emacs on Windows calls the
native APIs directly, and converts to UTF-16 from UTF-8, which is
faster. (This last point also means that using Find on Windows
has another grave disadvantage: it cannot fully support non-ASCII
file names, only those that can be encoded by the current
single-byte system codepage.)
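
To make the file-name point concrete, here is a small Python sketch (not
Find's or Emacs's actual code). It assumes cp1252 as an example of a
single-byte system codepage; the helper names encodable_in_codepage and
to_native_wide are made up for illustration.

```python
def encodable_in_codepage(name, codepage="cp1252"):
    """Return True if NAME survives encoding in the single-byte CODEPAGE.

    cp1252 is only an assumed example; the real system codepage varies.
    """
    try:
        name.encode(codepage)
        return True
    except UnicodeEncodeError:
        return False


def to_native_wide(name_utf8):
    """Convert a UTF-8 encoded file name to UTF-16LE bytes.

    This mirrors the direction of the UTF-8 -> UTF-16 conversion only;
    it says nothing about how any real program performs it.
    """
    return name_utf8.decode("utf-8").encode("utf-16-le")


if __name__ == "__main__":
    for name in ("notes.txt", "résumé.txt", "файл.txt"):
        print(name, encodable_in_codepage(name))
```

A Cyrillic name is rejected under cp1252 but accepted under cp1251, so
whether a 'char *' program can reach a given file depends on the codepage
in effect, whereas the UTF-8 to UTF-16 route handles every name.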
2. The process output handling is worse.
Not sure what that means.
Emacs's ability to process the output of a process on the particular
platform.
You said:
Btw, the Find command with pipe to some other program, like wc,
finishes much faster, like 2 to 4 times faster than when it is run
from find-directory-files-recursively. That's probably the slowdown
due to communications with async subprocesses in action.
I see this slowdown on GNU/Linux as well.
One thing to try is changing the -with-find implementation to use a
synchronous call, to compare (e.g. using 'process-file'). And repeat
these tests on GNU/Linux too.
This still uses pipes, albeit without the pselect stuff.
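
The kind of comparison suggested above can be sketched outside Emacs as
well. The Python below is only a rough analogue, not the Emacs
implementation: run_sync stands in for a blocking call in the spirit of
'process-file', run_chunked for output that arrives piecemeal, and CHILD
is a made-up stand-in for 'find' that prints 20000 lines. Both variants
still read through a pipe.

```python
import subprocess
import sys
import time

# Hypothetical stand-in for 'find': a child that prints many path-like lines.
CHILD = [sys.executable, "-c",
         "for i in range(20000): print('dir/file%d' % i)"]


def run_sync(cmd):
    """Block until the child exits, then return all of its output at once."""
    return subprocess.run(cmd, stdout=subprocess.PIPE, check=True).stdout


def run_chunked(cmd, chunk_size=4096):
    """Collect the child's output in pieces, as it becomes available."""
    chunks = []
    with subprocess.Popen(cmd, stdout=subprocess.PIPE) as proc:
        while True:
            # read1 returns whatever is ready, up to chunk_size bytes,
            # and an empty bytes object at end of file.
            chunk = proc.stdout.read1(chunk_size)
            if not chunk:
                break
            chunks.append(chunk)
    return b"".join(chunks)


def timed(fn, cmd):
    """Return (output, elapsed seconds) for calling FN on CMD."""
    start = time.perf_counter()
    out = fn(cmd)
    return out, time.perf_counter() - start


if __name__ == "__main__":
    for fn in (run_sync, run_chunked):
        out, elapsed = timed(fn, CHILD)
        print(fn.__name__, len(out), "bytes", round(elapsed, 4), "s")
```

Timings from such a sketch say nothing about Emacs itself; the point is
only the shape of the experiment: same child, same pipe, two ways of
collecting the output, run on both platforms.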