bug-findutils

Re: limiting the directory to locate and search by file type


From: James Youngman
Subject: Re: limiting the directory to locate and search by file type
Date: Fri, 11 Nov 2011 18:42:55 +0000

On Tue, Nov 8, 2011 at 12:25 PM, Peng Yu <address@hidden> wrote:
> On Tue, Nov 8, 2011 at 3:40 AM, James Youngman <address@hidden> wrote:
>> On Tue, Nov 8, 2011 at 12:11 AM, Peng Yu <address@hidden> wrote:
>>> On Mon, Nov 7, 2011 at 5:47 PM, James Youngman <address@hidden> wrote:
>>>> You can achieve this by using locate --regex.
>>>
>>> Suppose I want to restrict the search to /tmp; what regex should I specify?
>>
>> locate --regex '^/tmp\($\|/\)'
>
> So if I want to search for some file pattern, I have to combine the
> regex for the file pattern with the directory regex into a single
> regex. What seems missing in locate is the operator like -and -or that
> are found in find.

Yes.   This is basically because the output of locate is text.  There
are already a lot of tools for performing pattern matching on text.
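
As a rough illustration of that point, the "and" of a directory
restriction and a file-name pattern can be written as two grep stages
over a list of paths. This is only a sketch: under_dir_matching is a
hypothetical helper name, not part of findutils, and the directory
argument is used as a regex as-is, so a directory name containing
regex metacharacters would need escaping.

```shell
# Hypothetical helper (not part of findutils): read paths on stdin, one
# per line, as locate prints them, and keep those that are DIR itself or
# lie below DIR, and that also match PATTERN (an extended regex).
under_dir_matching() {
    dir=$1
    pattern=$2
    # First stage: the directory restriction. Second stage: the pattern.
    grep -E -e "^${dir}(/|\$)" | grep -E -e "$pattern"
}

# Intended use, assuming a locate database exists on the system:
#   locate --regex '^/tmp\($\|/\)' | under_dir_matching /tmp '\.txt$'
```

An "or" can be had the same way by giving a single grep several -e
options.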

> Since locate can be thought of as a faster version of find, I'd think
> the command line interface for 'locate' should be as close to 'find'
> as possible.
>
>> I'm not working on it right now.   But I think that it's a good idea
>> to move in this direction (and caching directory-level information
>> will also probably allow us to speed up indexing).  Would you like to
>> work on this?
>
> I looked at the source code of updatedb. It essentially uses find to
> generate the database and frcode to compress it. Where should the
> directory info be added to the database? frcode is a natural fit for
> compressing strings with a common prefix; is it also applicable when
> there is additional directory information?

That's not quite going to solve the problem I had in mind, which is
also to make the updating process more efficient.   If we record some
of the stat information for each directory (for example, mtime, ctime,
size, generation number where applicable) we can deduce whether
entries have been added or removed.   This allows updatedb to simply
stat a number of directories in order to figure out how to update its
database.
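
A minimal sketch of the idea above, not how updatedb works today:
record one line per directory, then compare two such snapshots to see
which directories would need rescanning. It assumes GNU find (for
-printf), uses only mtime out of the stat fields mentioned, and breaks
on paths containing tabs or newlines. The names snapshot_dirs and
changed_dirs are made up for illustration.

```shell
# snapshot_dirs ROOT: print "mtime<TAB>path" for every directory at or
# below ROOT. Assumes GNU find, whose -printf supports %T@ (mtime as
# seconds since the epoch) and %p (path).
snapshot_dirs() {
    find "$1" -type d -printf '%T@\t%p\n'
}

# changed_dirs OLD NEW: print each directory whose snapshot line differs
# between the two files, i.e. it was added, removed, or its mtime moved.
changed_dirs() {
    # Lines present in only one of the two snapshots survive "uniq -u";
    # a directory whose mtime changed shows up twice, so dedupe by path.
    sort "$1" "$2" | uniq -u | cut -f2- | sort -u
}
```

Creating or deleting an entry updates the mtime of the containing
directory only, so a change deep in the tree flags just that one
directory rather than all of its ancestors.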

Unfortunately, supporting that level of sophistication would be quite
a bit of work.   Among the benefits of doing it, though, aside from the
speed boost, are that the resulting database would be better for
solving the problem you mention, and that the rewrite would allow for
better support for updatedb arguments containing spaces.

James.


