bug-gnu-emacs

bug#64735: 29.0.92; find invocations are ~15x slower because of ignores


From: Spencer Baugh
Subject: bug#64735: 29.0.92; find invocations are ~15x slower because of ignores
Date: Wed, 19 Jul 2023 17:16:31 -0400

Several important commands and functions invoke find, for example rgrep
and project-find-regexp.

Most of these add a set of ignores to the find command, taken from
grep-find-ignored-files in the case of rgrep.  So the find command looks
like:

  find -H . \( -path \*/SCCS/\* -o -path \*/RCS/\* [...more ignores...] \) \
      -prune -o -type f -print0
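For concreteness, here is a minimal Python sketch of how a command of this shape can be assembled from a list of ignore globs. The helper build_find_args and the example patterns are hypothetical; Emacs builds its command in Lisp, not like this. Because the sketch produces an argument list rather than a shell string, the \( and \* escapes disappear: they exist only to stop the shell from interpreting those characters.

```python
def build_find_args(root, ignores):
    """Hypothetical helper: build the argv for
    find -H ROOT ( -path P1 -o -path P2 ... ) -prune -o -type f -print0
    Each ignore glob becomes one more -path test OR-ed into the group."""
    args = ["find", "-H", root]
    if ignores:
        args.append("(")
        for i, pattern in enumerate(ignores):
            if i > 0:
                args.append("-o")          # OR with the previous test
            args += ["-path", pattern]
        # Anything matching an ignore is pruned (and not printed) ...
        args += [")", "-prune", "-o"]
    # ... and every other regular file is printed, NUL-terminated.
    args += ["-type", "f", "-print0"]
    return args
```

For example, subprocess.run(build_find_args(".", ["*/SCCS/*", "*/RCS/*"]), capture_output=True) would run it from Python.  Since -o short-circuits, a path that matches none of the ignores forces find to evaluate every -path test in the group, so each extra pattern adds work for almost every file in the tree.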

Alas, on my system, using GNU find, these ignores slow find down by
about 15x on a large directory tree, from around 0.5 seconds to
7.8 seconds.

This is very noticeable overhead; removing the ignores makes rgrep and
other find-invoking commands substantially faster for me.

The overhead is linear in the number of ignores: each additional ignore
adds a small fixed cost.  This suggests that find scans the list of
ignores linearly, checking each one in turn, rather than compiling them
into a single regexp and checking only that.
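The two strategies can be sketched in Python. This is an illustration, not find's implementation; the function names and the ignore list are assumptions. ignored_linear tries each glob in turn, like a chain of -path tests joined by -o, while compile_ignores merges every glob into one regular expression up front. As with find's -path, fnmatch lets * match /.

```python
import fnmatch
import re

# Illustrative ignore globs in the style of the -path tests above.
IGNORES = ["*/SCCS/*", "*/RCS/*", "*/.git/*", "*.o", "*~"]

def ignored_linear(path, patterns):
    """Test each glob separately, stopping at the first match.
    A path that matches nothing costs one test per pattern."""
    return any(fnmatch.fnmatchcase(path, p) for p in patterns)

def compile_ignores(patterns):
    """Merge all globs into one compiled regex, once, up front.
    fnmatch.translate anchors each pattern at the end of the string
    and re.match anchors at the start, so every alternative must
    match the whole path, like find's -path."""
    return re.compile("|".join(fnmatch.translate(p) for p in patterns))

def ignored_combined(path, regex):
    """Test the path against the single combined regex."""
    return regex.match(path) is not None
```

Both functions give the same answers.  Python's re is a backtracking engine that may still try the alternatives one after another, so this sketch shows that the merge is possible, not that it is faster.  The speedup hoped for here would come from a matcher that compiles all the patterns into a single automaton.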

Obviously, GNU find should be optimizing this.  However, its maintainers
have previously said they will not; I commented on their bug
https://savannah.gnu.org/bugs/index.php?58197 asking them to reconsider.
Hopefully, as a fellow GNU project, they will be interested in helping
us...

On the Emacs side alone, there are a few things we could do:
- we could mitigate the find bug by optimizing the ignores into a single
regexp before we pass them to find; this should remove essentially all
of the overhead, but makes the find command uglier and harder to edit
- we could remove rare and likely irrelevant entries from
completion-ignored-extensions and vc-ignore-dir-regexp (which are used
to build these lists of ignores)
- we could use our own recursive directory-tree walker
(directory-files-recursively), if we found a nice way to pipe its output
directly to grep etc. without going through Lisp (this could be nice
for project-files, at least)
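The first option might look something like the following Python sketch, which rewrites the glob ignores as one POSIX extended regexp for GNU find's -regex test. glob_to_ere and find_args_single_regex are hypothetical helpers. The converter only understands * and ?, and it is an untested assumption that a single -regex actually avoids the per-ignore cost inside find.

```python
import shlex

# Characters special in a POSIX ERE that must be escaped to match
# literally ('*' and '?' are translated instead, see below).
_ERE_SPECIAL = set(".[\\()+{|^$")

def glob_to_ere(glob):
    """Hypothetical converter from a find -path glob to a POSIX ERE.

    Only '*' (any string, including '/') and '?' (any one character)
    are supported; bracket expressions in the glob are NOT handled.
    """
    out = []
    for ch in glob:
        if ch == "*":
            out.append(".*")
        elif ch == "?":
            out.append(".")
        elif ch in _ERE_SPECIAL:
            out.append("\\" + ch)
        else:
            out.append(ch)
    return "".join(out)

def find_args_single_regex(root, ignores):
    """Build a find argv that prunes everything matching any ignore,
    using one -regex test instead of a chain of -path tests."""
    if not ignores:
        return ["find", "-H", root, "-type", "f", "-print0"]
    combined = "(" + "|".join(glob_to_ere(g) for g in ignores) + ")"
    return ["find", "-H", root,
            # -regextype must precede the -regex it applies to.
            "-regextype", "posix-extended",
            # GNU find's -regex must match the whole path, like -path.
            "-regex", combined,
            "-prune", "-o", "-type", "f", "-print0"]

if __name__ == "__main__":
    print(shlex.join(find_args_single_regex(".", ["*/SCCS/*", "*.o"])))
```

Run through shlex.join, the two-pattern example becomes find -H . -regextype posix-extended -regex '(.*/SCCS/.*|.*\.o)' -prune -o -type f -print0.  This shows the downside mentioned above: with dozens of ignores, one long alternation is much harder to read and edit by hand than a list of -path tests.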

Incidentally, I also tried a find alternative, bfs
(https://github.com/tavianator/bfs), and sadly it doesn't optimize this
either, so it shows the same 15x slowdown.



In GNU Emacs 29.0.92 (build 5, x86_64-pc-linux-gnu, X toolkit, cairo
 version 1.15.12, Xaw scroll bars) of 2023-07-10 built on

Repository revision: dd15432ffacbeff0291381c0109f5b1245060b1d
Repository branch: emacs-29
Windowing system distributor 'The X.Org Foundation', version 11.0.12011000
System Description: Rocky Linux 8.8 (Green Obsidian)

Configured using:
 'configure --config-cache --with-x-toolkit=lucid
 --with-gif=ifavailable'

Configured features:
CAIRO DBUS FREETYPE GLIB GMP GNUTLS GSETTINGS HARFBUZZ JPEG JSON
LIBSELINUX LIBXML2 MODULES NOTIFY INOTIFY PDUMPER PNG RSVG SECCOMP SOUND
SQLITE3 THREADS TIFF TOOLKIT_SCROLL_BARS X11 XDBE XIM XINPUT2 XPM LUCID
ZLIB

Important settings:
  value of $LANG: en_US.UTF-8
  locale-coding-system: utf-8-unix

Major mode: Shell

Memory information:
((conses 16 1939322 193013)
 (symbols 48 76940 49)
 (strings 32 337371 45355)
 (string-bytes 1 12322013)
 (vectors 16 148305)
 (vector-slots 8 3180429 187121)
 (floats 8 889 751)
 (intervals 56 152845 1238)
 (buffers 976 235)
 (heap 1024 978725 465480))




