bug-gnu-utils

Re: grep.... I know I am new to ubuntu but....


From: Bob Proulx
Subject: Re: grep.... I know I am new to ubuntu but....
Date: Wed, 8 Feb 2012 23:50:27 -0700
User-agent: Mutt/1.5.21 (2010-09-15)

DAVE HITCHMAN wrote:
> So the 'search in sub directories' flag doesn't work? Why does the
> shell try and fail to do the right thing? If it is going to
> 'process' one argument why does it not process them all?

Of course it works fine.  You are simply not understanding how it
works.  :-)

The command shell processes the command line first and expands all
file wildcards before the program is ever run.  A file wildcard is
something like *.c, which the shell tries to match against the names
of existing files.  If there is a match, the wildcard is replaced with
the matching file names.

  $ touch one.c two.c
  $ echo *.c
  one.c two.c

In this way you can grep through those files looking for strings in them.

  $ echo xyzzy > one.c
  $ grep xyzzy *.c
  one.c:xyzzy

But again, *.c matches the files it matches and no others.  If no files
match, the wildcard is left unmodified.

  $ echo *.none
  *.none

Since there were no files matching *.none the file glob was not
expanded and was passed through directly.  But in all other cases the
file glob is expanded by the shell and the program never sees it.  The
program will only see the file names that the shell provided.
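To see for yourself that the program only ever receives the expanded
names, a small sketch like the following can help.  It assumes a POSIX
sh and works in a throwaway directory from mktemp.  The show_args
function is a hypothetical stand-in for grep that just prints each
argument it is handed, wrapped in <...> so the word boundaries show.

```shell
#!/bin/sh
# Sketch: the shell expands wildcards before the program runs.
# show_args is a hypothetical stand-in for grep; it prints each
# argument it receives inside <...> so word boundaries are visible.
set -eu

show_args() {
  for arg in "$@"; do
    printf '<%s>\n' "$arg"
  done
}

dir=$(mktemp -d)
cd "$dir"
touch one.c two.c

matched=$(show_args *.c)       # the shell expands this to: one.c two.c
unmatched=$(show_args *.none)  # no match, so the pattern is passed through as-is

printf '%s\n' "$matched"
printf '%s\n' "$unmatched"

cd /
rm -rf "$dir"
```

The function never sees "*.c" at all, only one.c and two.c, while the
unmatched "*.none" arrives literally.  (In bash this last behavior can
be changed with shopt options such as nullglob, but the default is as
shown.)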

To grep through files in a subdirectory you would need to provide a
wildcard for those files.

  $ grep xyzzy */*.c

The grep -r option tells grep to recurse down any directories it is
given.  But it would be very unusual to ever find a directory named
something.c that matches *.c.  So using grep -r with a file wildcard
such as *.c almost never does what you want, because -r only has an
effect on the directories among the arguments.

  $ grep -r xyzzy *.c  # <-- Almost always wrong.

Grep will recurse down directories, but those directories must be
given as arguments.  It is typical to use '.' for the current directory there.
(In very, very recent versions, just last week as I recall, grep has
been updated to assume dot if none is specified.  But it will be a
couple of years before that version has propagated.)

  $ grep -r xyzzy .
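The three approaches above can be compared side by side on a tiny,
hypothetical tree.  This sketch assumes a grep that supports -r (GNU
and BSD grep do; it is not part of POSIX grep) and uses -l so that only
the names of matching files are printed.

```shell
#!/bin/sh
# Sketch: compare a plain wildcard, a one-level wildcard, and grep -r
# on a small hypothetical tree. -l prints only the names of matching
# files so the output stays short.
set -eu

dir=$(mktemp -d)
cd "$dir"
mkdir sub
echo xyzzy > one.c
echo xyzzy > sub/three.c

# *.c expands to just one.c; -r has no directory operand to descend into.
wrong=$(grep -rl xyzzy *.c)
# */*.c reaches exactly one level down, and only that level.
one_level=$(grep -l xyzzy */*.c)
# Naming "." as the starting directory lets -r walk the whole tree.
whole_tree=$(grep -rl xyzzy . | sort)

printf '%s\n---\n' "$wrong" "$one_level" "$whole_tree"

cd /
rm -rf "$dir"
```

Only the last form finds both one.c and sub/three.c.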

Some shells, such as zsh and I am sure others, interpret two stars
together to mean that the wildcard should be expanded down the whole
directory tree.  I don't have zsh available, but from memory I think
it is something like this:  'grep xyzzy **/*.c'
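For comparison, bash 4.0 and later support the same recursive "**"
matching, but only after "shopt -s globstar" has been turned on.  The
sketch below assumes a bash of that vintage is installed and wraps the
bash-specific part in "bash -c" so the rest of the script stays plain sh.

```shell
#!/bin/sh
# Sketch: recursive globbing with ** (assumes bash >= 4.0 is on PATH).
# In zsh the same 'grep -l xyzzy **/*.c' works without setting anything.
set -eu

dir=$(mktemp -d)
mkdir -p "$dir/sub/deeper"
echo xyzzy > "$dir/one.c"
echo xyzzy > "$dir/sub/deeper/four.c"

# With globstar enabled, **/ matches zero or more directory levels,
# so **/*.c picks up one.c as well as sub/deeper/four.c.
found=$(cd "$dir" && bash -c 'shopt -s globstar; grep -l xyzzy **/*.c' | sort)
printf '%s\n' "$found"

rm -rf "$dir"
```

Note that this is still the shell doing the expansion; grep just
receives a longer list of file names.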

The "Unix" spirit would be to find the files you want and grep through
them on the command line.

  $ grep xyzzy $(find . -name '*.c')

I said in the Unix spirit.  It isn't usually done that way, although
it works very well.  Being able to understand it is a fundamental
building block to being able to use the software tools effectively.
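A minimal sketch of that command-substitution style on a hypothetical
tree follows.  It relies on the shell splitting find's output into
words, so it assumes file names without spaces or wildcard characters;
it is meant to show the building-block idea, not to be bulletproof.

```shell
#!/bin/sh
# Sketch: find picks the files, the shell passes them to grep.
# The tree below is hypothetical. This relies on word splitting of the
# $(...) output, so it is only safe for names without spaces or glob chars.
set -eu

dir=$(mktemp -d)
cd "$dir"
mkdir -p sub/deeper
echo xyzzy > one.c
echo xyzzy > sub/deeper/four.c
echo nothing > sub/two.c

# The unquoted $(...) is intentional: each whitespace-separated word
# of find's output becomes one argument to grep.
hits=$(grep -l xyzzy $(find . -name '*.c') | sort)
printf '%s\n' "$hits"

cd /
rm -rf "$dir"
```

The file two level deep is found and the non-matching sub/two.c is not.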

But traditionally that can overflow the kernel's maximum argument space.
It fails with the error "Argument list too long" if the file names
take up more than the available kernel buffer space.  Recent Linux
kernels (2.6.23 and later) no longer use a small fixed buffer; the
limit now scales with the stack size limit and is much larger.  But a
fixed limit still exists in most other traditional Unix kernels.
Therefore it is normal to have find run grep itself.  The POSIX
standard, portable way is the following, which is also maximally
efficient.  It launches the minimum number of grep processes, each
with the maximum number of arguments, and it handles filenames with
whitespace in them.

  $ find . -name '*.c' -exec grep xyzzy {} +
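To watch the batching and the whitespace handling happen, the sketch
below uses "sh -c 'echo $#'" as a hypothetical stand-in for grep that
reports how many file names it received per run.  It compares "{} +"
with the older "{} \;" form and then does the real search on a tree
that includes a file name containing a space.

```shell
#!/bin/sh
# Sketch: how many times does find start the command, and does a name
# containing a space survive intact? 'sh -c' stands in for grep here
# and just reports how many file arguments it was given per run.
set -eu

dir=$(mktemp -d)
cd "$dir"
echo xyzzy > one.c
echo xyzzy > two.c
echo xyzzy > 'my file.c'

# {} + packs as many names as fit into each command invocation.
batched=$(find . -name '*.c' -exec sh -c 'echo "$#"' sh {} +)
# {} \; starts a separate command for every single file.
per_file=$(find . -name '*.c' -exec sh -c 'echo "$#"' sh {} \; | tr '\n' ' ')
# The real search: the space in 'my file.c' is passed through untouched.
hits=$(find . -name '*.c' -exec grep -l xyzzy {} + | sort)

printf 'batched: %s\nper file: %s\n%s\n' "$batched" "$per_file" "$hits"

cd /
rm -rf "$dir"
```

With three small files "{} +" starts one process with all three names,
while "{} \;" starts three processes with one name each.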

> John Cowan wrote:
>         find -name '*.c' | xargs grep 'mystring'

That is the really traditional way to do it.  That is the way we
always did it "back in the day".  But as you noted, it doesn't handle
whitespace.  I never put whitespace in filenames, but some people do,
and then it breaks.
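Here is a small sketch of exactly how it breaks.  It uses "find ."
with an explicit starting directory (GNU find also accepts it without
one, as in the quoted example) and a single hypothetical file whose
name contains a space.

```shell
#!/bin/sh
# Sketch: plain find | xargs splits 'my file.c' into two bogus names.
set -eu

dir=$(mktemp -d)
cd "$dir"
echo xyzzy > 'my file.c'

# xargs splits its input on blanks and newlines, so grep is handed
# "./my" and "file.c", neither of which exists. grep complains on
# stderr (discarded here) and exits non-zero, hence the '|| true'.
broken=$(find . -name '*.c' | xargs grep -l xyzzy 2>/dev/null || true)
printf 'found: [%s]\n' "$broken"

cd /
rm -rf "$dir"
```

The file really does contain xyzzy, yet nothing is reported.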

>         find -name '*.c' -print0 | xargs -0 grep 'mystring'

That handles whitespace just fine and was more efficient than find's
"{} \;" construct.  But it isn't preferred now that find has the
"{} +" construct.  The introduction of "{} +" made find|xargs obsolete
almost overnight.  Now instead of xargs I generally recommend using
find with "{} +", since that way is POSIX standard.
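A quick sketch confirming that the NUL-separated pipeline and the
"{} +" form agree on a tree with a space in a file name.  It assumes a
find with -print0 and an xargs with -0, which GNU and BSD systems both
provide; the files are hypothetical.

```shell
#!/bin/sh
# Sketch: NUL-separated names survive spaces; compare with find's own {} +.
set -eu

dir=$(mktemp -d)
cd "$dir"
echo xyzzy > 'my file.c'
echo xyzzy > one.c

# -print0 ends each name with a NUL byte and xargs -0 splits only on NUL,
# so 'my file.c' reaches grep as a single argument.
via_xargs=$(find . -name '*.c' -print0 | xargs -0 grep -l xyzzy | sort)
# The {} + form gives the same result without a second program.
via_exec=$(find . -name '*.c' -exec grep -l xyzzy {} + | sort)

printf '%s\n' "$via_xargs"
[ "$via_xargs" = "$via_exec" ] && echo "same result"

cd /
rm -rf "$dir"
```

Both find the two files; the "{} +" version simply needs one less
process and no special flags.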

Of course everything you said was absolutely correct.  I hated to rain
on it, and am trying not to, but wanted to get the information about
"{} +" out.  Try it for 30 days and you will never go back! :-)

> > Even trying the various bizarre combinations of finds and execs that
> > I have seen on the web I can't seem to do this very simple task. If
> > Microsoft can have managed to provide find in files and various other
> > similar things for over 20 years isn't it time I could do the same on
> > a linux system without the need to spend 10 hours searching endless
> > uninformative websites all lying about what works?
> 
> "If the blind lead the blind, both shall fall into a pit." (Matt. 15:14)

Love it!  :-)

Bob


