bug-gnu-emacs

bug#64735: 29.0.92; find invocations are ~15x slower because of ignores


From: sbaugh
Subject: bug#64735: 29.0.92; find invocations are ~15x slower because of ignores
Date: Thu, 20 Jul 2023 12:22:19 +0000 (UTC)
User-agent: Gnus/5.13 (Gnus v5.13)

Eli Zaretskii <eliz@gnu.org> writes:
>> From: Spencer Baugh <sbaugh@janestreet.com>
>> Date: Wed, 19 Jul 2023 17:16:31 -0400
>> 
>> 
>> Several important commands and functions invoke find; for example rgrep
>> and project-find-regexp.
>> 
>> Most of these add some set of ignores to the find command, pulling from
>> grep-find-ignored-files in the former case.  So the find command looks
>> like:
>> 
>> find -H . \( -path \*/SCCS/\* -o -path \*/RCS/\* [...more ignores...] \)
>> -prune -o -type f -print0
>> 
>> Alas, on my system, using GNU find, these ignores slow down find by
>> about 15x on a large directory tree, taking it from around .5 seconds to
>> 7.8 seconds.
>> 
>> This is very noticeable overhead; removing the ignores makes rgrep and
>> other find-invoking commands substantially faster for me.
>
> grep-find-ignored-files is a customizable user option, so if this
> slowdown bothers you, just customize it to avoid that.

I think it's bad that the default behavior is this slow.
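
For anyone who wants to check the scaling on their own tree, here is a
rough sketch.  The helper name find_with_n_ignores and the dummy
"ignored-N" patterns are made up for illustration; they are not the
real default ignores, they only have the same -path ... -prune shape
as the command quoted above.

```shell
# Hypothetical helper: run find on DIR with N dummy -path ignores,
# using the same "\( ... \) -prune -o -type f -print0" shape as above.
find_with_n_ignores() {
    dir=$1
    n=$2
    set --                          # build the ignore expression in "$@"
    i=1
    while [ "$i" -le "$n" ]; do
        if [ "$i" -gt 1 ]; then
            set -- "$@" -o          # join successive patterns with -o
        fi
        set -- "$@" -path "*/ignored-$i/*"
        i=$((i + 1))
    done
    if [ "$n" -gt 0 ]; then
        find -H "$dir" \( "$@" \) -prune -o -type f -print0
    else
        find -H "$dir" -type f -print0   # baseline without any ignores
    fi
}
```

In a shell where "time" is a keyword (bash, for example), running
"time find_with_n_ignores /some/tree 0 >/dev/null" and then the same
with 50 or 100 should show whether the cost grows with the number of
ignores on a given system.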

> And if there are patterns there that are no longer pertinent or rare,
> we could remove them from the default value.

Sure!

So the thing to narrow down would be completion-ignored-extensions,
which is what populates grep-find-ignored-files.  Most things in that
list are irrelevant to most users, but all of them are relevant to some
users.

Most of these are language-specific; for example, there are a number of
extensions for what look like Common Lisp compiled object files.

Perhaps we could modularize this, so that individual packages add things
to completion-ignored-extensions at load time.  Then
completion-ignored-extensions would only include things which are
relevant to a given user, as determined by what packages they load.

> I'm not sure we should bother more than these two simple measures.

Unfortunately those two simple measures help rgrep but they don't help
project-find-regexp (and other project.el commands using
project--files-in-directory such as project-find-file), since those
project commands pull their ignores from the version control system
through vc (not grep-find-ignored-files), and then pass them to find.

>> The overhead is linear in the number of ignores - that is, each
>> additional ignore adds a small fixed cost.  This suggests that find is
>> linearly scanning the list of ignores and checking each one, rather than
>> optimizing them to a single regexp and checking that regexp.
>
> If it uses fnmatch, it cannot do it any other way, I think




