bug-gnu-emacs
bug#64735: 29.0.92; find invocations are ~15x slower because of ignores


From: Dmitry Gutov
Subject: bug#64735: 29.0.92; find invocations are ~15x slower because of ignores
Date: Tue, 25 Jul 2023 05:41:13 +0300
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:102.0) Gecko/20100101 Thunderbird/102.13.0

On 24/07/2023 16:26, Eli Zaretskii wrote:
Date: Mon, 24 Jul 2023 15:55:13 +0300
Cc: luangruo@yahoo.com, sbaugh@janestreet.com, yantar92@posteo.net,
  64735@debbugs.gnu.org
From: Dmitry Gutov <dmitry@gutov.dev>

1. 'find' itself is much slower there. There is room for improvement in
the port.

I think it's the filesystem, not the port (which I did myself in this
case).

But directory-files-recursively goes through the same filesystem,
doesn't it?

It does (more or less; see below).  But I was not trying to explain
why Find is slower than directory-files-recursively, I was trying to
explain why Find on Windows is slower than Find on GNU/Linux.

Understood. But we probably don't need to worry about the differences between platforms as much as about choosing the best option for each platform (or at least not choosing the worst). So I'm more interested in why the find-based solution is more than 4x slower than the built-in one on MS Windows.

If you are asking why directory-files-recursively is so much faster on
Windows than Find, then the main factors I can think about are:

   . IPC, at least in how we implement it in Emacs on MS-Windows, via a
     separate thread and OS-level events between them to signal that
     stuff is available for reading, whereas
     directory-files-recursively avoids this overhead completely;
   . Find uses Posix APIs: 'stat', 'chdir', 'readdir' -- which on
     Windows are emulated by wrappers around native APIs.  Moreover,
     Find uses 'char *' for file names, so calling native APIs involves
     transparent conversion to UTF-16 and back, which is what native
     APIs accept and return.  By contrast, Emacs on Windows calls the
     native APIs directly, and converts to UTF-16 from UTF-8, which is
     faster.  (This last point also means that using Find on Windows
     has another grave disadvantage: it cannot fully support non-ASCII
     file names, only those that can be encoded by the current
     single-byte system codepage.)

I seem to remember that Wine, which also does a similar dance of translating library and system calls, is often very close to native performance for many programs. So this could be a problem, but not necessarily a significant one.

Text encoding conversion does seem like a prime suspect, though, if the problem is here.

2. The process output handling is worse.

Not sure what that means.

Emacs's ability to process the output of a process on the particular
platform.

You said:

    Btw, the Find command with pipe to some other program, like wc,
    finishes much faster, like 2 to 4 times faster than when it is run
    from find-directory-files-recursively.  That's probably the slowdown
    due to communications with async subprocesses in action.

I see this slowdown on GNU/Linux as well.

One thing to try is changing the -with-find implementation to use a
synchronous call, to compare (e.g. using 'process-file'). And repeat
these tests on GNU/Linux too.

This still uses pipes, albeit without the pselect stuff.

I'm attaching an extended benchmark, one that also includes a "synchronous" implementation. Please give it a spin.

Here (GNU/Linux) the reported numbers look like this:

> (my-bench 1 default-directory "")

(("built-in" . "Elapsed time: 1.601649s (0.709108s in 22 GCs)")
 ("with-find" . "Elapsed time: 1.792383s (1.135869s in 38 GCs)")
 ("with-find-p" . "Elapsed time: 1.248543s (0.682827s in 20 GCs)")
 ("with-find-sync" . "Elapsed time: 0.922291s (0.343497s in 10 GCs)"))
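
For anyone reading without the attachment, the "synchronous" idea can be sketched roughly as below. This is only an illustration of calling 'find' through 'process-file' and collecting its output in one go; it is not the code from find-bench.el, the function name my-find-sync-sketch is made up for this example, and it assumes a 'find' that supports -print0 and applies no ignores or regexp filtering.

```elisp
;; Hypothetical sketch, NOT the implementation in the attached find-bench.el.
;; Assumes a `find' binary on PATH that understands -print0.
(defun my-find-sync-sketch (dir)
  "Return the regular files under DIR as a list of names relative to DIR.
Runs `find' synchronously, so no process filter or sentinel is involved."
  (let ((default-directory (file-name-as-directory (expand-file-name dir))))
    (with-temp-buffer
      ;; `process-file' blocks until find exits.  BUFFER is (t nil):
      ;; stdout goes into the current buffer, stderr is discarded.
      ;; The exit status is ignored here for brevity.
      (process-file "find" nil '(t nil) nil
                    "." "-type" "f" "-print0")
      ;; Names are NUL-separated; drop the empty string after the last NUL.
      (split-string (buffer-string) "\0" t))))
```

Something like (benchmark-progn (my-find-sync-sketch default-directory)) should report the timing in the same "Elapsed time: ..." form as the numbers above, though the figures will not be directly comparable with those from the attached benchmark.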

Attachment: find-bench.el
Description: Text Data

