emacs-devel
Re: Why does not rgrep use "grep -r"?


From: David Kastrup
Subject: Re: Why does not rgrep use "grep -r"?
Date: Sat, 03 Nov 2007 10:43:03 +0100
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/23.0.50 (gnu/linux)

Eli Zaretskii <address@hidden> writes:

>> From: Miles Bader <address@hidden>
>> Date: Sat, 03 Nov 2007 12:37:47 +0900
>> Cc: Andreas Schwab <address@hidden>, Emacs Devel <address@hidden>
>> 
>> > I believed that grep had to be started many times. Is not that the case?
>> 
>> xargs invokes grep in "batches," with as many filenames as will fit on
>> the command line; for e.g. linux, that's many thousands at once, so
>> process invocation overhead will tend to be in the noise compared to
>> file I/O overhead.
>
> Yes, but I believe "grep -r" will be still faster, even on GNU/Linux,
> since all it does to recurse is `readdir' and `fnmatch'; the need for
> writing file names to the pipe and reading them on the xargs side is
> avoided.
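
To make the batching described above concrete, here is a minimal sketch
that counts how often xargs actually starts its command.  The helper
names are made up for this example, and the batched count depends on the
local argument-size limit, so no particular figure is promised.

```shell
#!/bin/sh
# Hypothetical helper: print how many times xargs runs its command when
# handed N short arguments.  Each echo invocation emits exactly one line,
# so the line count equals the number of batches.
count_xargs_batches() {
    seq 1 "$1" | xargs echo | wc -l | tr -d ' '
}

# For contrast: force one invocation per argument with -n 1.
count_xargs_unbatched() {
    seq 1 "$1" | xargs -n 1 echo | wc -l | tr -d ' '
}
```

Comparing "count_xargs_batches 100000" with "count_xargs_unbatched 100000"
shows how few process starts the default batching needs relative to one
per argument.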

Totally warm cache:

address@hidden:/usr/local/texlive/2007$ time find -name \*.tex|xargs grep snort
./texmf-dist/source/latex/ae/aesample.tex:and whooping and sneezing and snorting, that I could not hear myself think for

real    0m0.974s
user    0m0.372s
sys     0m0.536s
address@hidden:/usr/local/texlive/2007$ time grep -r  --include=\*.tex snort .
./texmf-dist/source/latex/ae/aesample.tex:and whooping and sneezing and snorting, that I could not hear myself think for

real    0m1.225s
user    0m0.376s
sys     0m0.764s

Totally cold cache (after umount and mount):

address@hidden:/usr/local/texlive/2007$ time grep -r  --include=\*.tex snort .
./texmf-dist/source/latex/ae/aesample.tex:and whooping and sneezing and snorting, that I could not hear myself think for

real    1m44.387s
user    0m0.508s
sys     0m3.768s


address@hidden:/usr/local/texlive/2007$ time find -name \*.tex|xargs grep snort
./texmf-dist/source/latex/ae/aesample.tex:and whooping and sneezing and snorting, that I could not hear myself think for

real    0m59.633s
user    0m0.604s
sys     0m1.484s


And, for good measure:

address@hidden:/usr/local/texlive/2007$ time find -name \*.tex -exec grep snort {} \+
./texmf-dist/source/latex/ae/aesample.tex:and whooping and sneezing and snorting, that I could not hear myself think for

real    0m55.640s
user    0m0.576s
sys     0m1.292s

In short: interleaving the directory traversal with the file reads on a
per-file basis (as grep -r does) makes the whole operation much less
efficient on a cold buffer cache: 1m44s, against roughly one minute for
both find-based variants.  On a warm cache, it is pretty much the same
(1.2s against 1.0s).  Using a pipe also allows for some parallelism.  In
this particular case, however, both jobs are so I/O-bound that the last,
pipeless version using -exec ... \+ is still somewhat faster (55.6s
against 59.6s) even though it runs strictly sequentially.  The decisive
factor appears to be the large-scale bundling of directory reads
without file reads in between.
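
For anyone who wants to repeat the comparison on their own tree, the
three variants can be wrapped as shell functions, roughly as below.
This is only a sketch: the function names are invented here, the
-print0/-0 and -H options are additions for robustness that were not
part of the measurements above, and GNU find, xargs and grep are assumed.

```shell
#!/bin/sh
# Three ways to search the *.tex files below a directory for a pattern.
# Each (hypothetical) function is called as: <function> PATTERN DIR

# find writes NUL-separated names to a pipe; xargs batches them for grep.
search_find_xargs() {
    (cd "$2" && find . -name '*.tex' -print0 | xargs -0 grep -H -e "$1")
}

# grep walks the tree itself, alternating directory reads and file reads.
search_grep_r() {
    (cd "$2" && grep -r --include='*.tex' -e "$1" .)
}

# find collects the names itself and runs grep on batches, without a pipe.
search_find_exec() {
    (cd "$2" && find . -name '*.tex' -exec grep -H -e "$1" {} +)
}
```

In bash, "time search_grep_r snort /usr/local/texlive/2007" runs one
variant under the shell's timer.  For a cold-cache figure the file
system has to be unmounted and mounted again between runs, as was done
for the numbers above.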

This is on a single-processor laptop with a fairly standard ATA disk:

Linux lola 2.6.20-16-generic #2 SMP Sun Sep 23 19:50:39 UTC 2007 i686 GNU/Linux

-- 
David Kastrup, Kriemhildstr. 15, 44793 Bochum



