findutils-patches

Re: [Findutils-patches] new predicate


From: Eric Blake
Subject: Re: [Findutils-patches] new predicate
Date: Thu, 27 May 2010 15:12:12 -0600
User-agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.1.9) Gecko/20100430 Fedora/3.0.4-3.fc13 Lightning/1.0b1 Mnenhy/0.8.2 Thunderbird/3.0.4

On 05/27/2010 02:04 PM, Konrad Eisele wrote:
> I wanted to submit a patch that is quite short and
> meant more as a feature request. It adds the predicate
> "-dtype <regex>" (dtype meaning datatype). The dtype
> predicate uses libmagic from the "file" command to get
> the *content datatype* of the file in view, then matches a regex
> against it. E.g. "echo abc>f.txt; file f.txt" yields "ASCII text".
> Therefore "find f.txt -dtype .*text.*" would match the regex ".*text.*"
> against "ASCII text" (and succeed).

Personally, I'm a bit reluctant to add this patch, because you can
achieve the same effect with more efficient use of existing predicates:

> 
> The problem this patch addresses is like this:
> I have several source project directories with several million
> files in them. I want to make a backup, but I want
> to back up only text files (Makefiles, shell scripts, c and
> h files, etc.). Currently I do something like this:
> (for f in `find <srcdir> -type f`; do if (file $f | cut -d: -f2 | grep text &> /dev/null ); then echo $f; fi; done) > file.list

find <srcdir> -type f -exec sh -c \
  'file "$@" | sed -n "s/:.*text.*//p"' sh {} + > file.list

Remember, the reason your version was so slow is that it was spawning a
subshell plus file, cut, and grep processes for every file; my version uses
-exec {} + to cram as many files as possible into each file(1) invocation,
then uses sed instead of cut|grep for a further reduction in processes.
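
To see the batching effect for yourself, here is a minimal sketch that
counts how many times find starts the child command in each mode. The
function names are hypothetical, made up for this illustration. Each
child shell prints one line (its argument count), so the number of
output lines equals the number of processes started by -exec.

```shell
# Hypothetical helpers (not part of findutils): count how many child
# processes find spawns for the regular files under directory "$1".

# One invocation per file: -exec ... {} \;
count_per_file() {
  find "$1" -type f -exec sh -c 'echo "$#"' sh {} \; | wc -l | tr -d ' '
}

# As few invocations as possible: -exec ... {} +
count_batched() {
  find "$1" -type f -exec sh -c 'echo "$#"' sh {} + | wc -l | tr -d ' '
}
```

On a small tree count_batched should print 1, while count_per_file
prints the number of files; on a very large tree the batched count
grows only when the argument list reaches the system limit.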

Meanwhile, be aware that this solution assumes that none of the file
names found will contain : or newline; you may want to add some defensive
programming into your find expression to reject file names matching
those patterns.
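
One possible way to add that defensive filtering, as a rough sketch
rather than a tested recipe: use ! -path to skip any path containing a
colon or a newline before it ever reaches file(1). The function name
list_text_files is hypothetical, and the sketch assumes the starting
directory itself contains neither character, since -path matches
against the whole path including the starting point.

```shell
# Hypothetical helper: print the paths of regular files under "$1" that
# file(1) describes as some kind of text.
# Assumption: "$1" itself contains no ':' or newline, because -path
# tests the full path (starting point included).
list_text_files() {
  # A literal newline, for use inside the -path pattern.
  nl='
'
  # Skip paths containing ':' or newline, then batch the rest into as
  # few file(1) invocations as possible and keep only the "text" hits.
  find "$1" -type f ! -path '*:*' ! -path "*${nl}*" \
    -exec sh -c 'file "$@" | sed -n "s/:.*text.*//p"' sh {} +
}
```

Usage would be along the lines of: list_text_files /path/to/src > file.list
Skipped files are silently left out of the list, so if they matter you
would need to handle them separately.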

-- 
Eric Blake   address@hidden    +1-801-349-2682
Libvirt virtualization library http://libvirt.org
