[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
bug#64735: 29.0.92; find invocations are ~15x slower because of ignores
From: |
Spencer Baugh |
Subject: |
bug#64735: 29.0.92; find invocations are ~15x slower because of ignores |
Date: |
Wed, 19 Jul 2023 17:16:31 -0400 |
Several important commands and functions invoke find; for example rgrep
and project-find-regexp.
Most of these add some set of ignores to the find command, pulling from
grep-find-ignored-files in the former case. So the find command looks
like:
find -H . \( -path \*/SCCS/\* -o -path \*/RCS/\* [...more ignores...] \)
-prune -o -type f -print0
Alas, on my system, using GNU find, these ignores slow down find by
about 15x on a large directory tree, taking it from around .5 seconds to
7.8 seconds.
This is very noticeable overhead; removing the ignores makes rgrep and
other find-invoking commands substantially faster for me.
The overhead is linear in the number of ignores - that is, each
additional ignore adds a small fixed cost. This suggests that find is
linearly scanning the list of ignores and checking each one, rather than
optimizing them to a single regexp and checking that regexp.
Obviously, GNU find should be optimizing this. However they have
previously said they will not optimize this; I commented on this bug
https://savannah.gnu.org/bugs/index.php?58197 to request they rethink
that. Hopefully as a fellow GNU project they will be interested in
helping us...
In Emacs alone, there are a few things we could do:
- we could mitigate the find bug by optimizing the regexp before we pass
it to find; this should basically remove all the overhead but makes the
find command uglier and harder to edit
- we could remove rare and likely irrelevant things from
completion-ignored-extensions and vc-ignore-dir-regexp (which are used
to build these lists of ignores)
- we could use our own recursive directory-tree walking implementation
(directory-files-recursively), if we found a nice way to pipe its output
directly to grep etc without going through Lisp. (This could be nice
for project-files, at least)
Incidentally, I tried a find alternative, "bfs",
https://github.com/tavianator/bfs and it doesn't optimize this either,
sadly, so it also has the 15x slowdown.
In GNU Emacs 29.0.92 (build 5, x86_64-pc-linux-gnu, X toolkit, cairo
version 1.15.12, Xaw scroll bars) of 2023-07-10 built on
Repository revision: dd15432ffacbeff0291381c0109f5b1245060b1d
Repository branch: emacs-29
Windowing system distributor 'The X.Org Foundation', version 11.0.12011000
System Description: Rocky Linux 8.8 (Green Obsidian)
Configured using:
'configure --config-cache --with-x-toolkit=lucid
--with-gif=ifavailable'
Configured features:
CAIRO DBUS FREETYPE GLIB GMP GNUTLS GSETTINGS HARFBUZZ JPEG JSON
LIBSELINUX LIBXML2 MODULES NOTIFY INOTIFY PDUMPER PNG RSVG SECCOMP SOUND
SQLITE3 THREADS TIFF TOOLKIT_SCROLL_BARS X11 XDBE XIM XINPUT2 XPM LUCID
ZLIB
Important settings:
value of $LANG: en_US.UTF-8
locale-coding-system: utf-8-unix
Major mode: Shell
Memory information:
((conses 16 1939322 193013)
(symbols 48 76940 49)
(strings 32 337371 45355)
(string-bytes 1 12322013)
(vectors 16 148305)
(vector-slots 8 3180429 187121)
(floats 8 889 751)
(intervals 56 152845 1238)
(buffers 976 235)
(heap 1024 978725 465480))
- bug#64735: 29.0.92; find invocations are ~15x slower because of ignores,
Spencer Baugh <=
bug#64735: 29.0.92; find invocations are ~15x slower because of ignores, Dmitry Gutov, 2023/07/20