bug-gnu-utils

Re: win32 recursive glob broken


From: Bob Proulx
Subject: Re: win32 recursive glob broken
Date: Sat, 24 Jan 2004 12:15:51 -0700
User-agent: Mutt/1.3.28i

John Calcote wrote:
> Thanks so much for the response. 

Happy to help.  But please keep the mailing list in the CC.  I am but
one person and although I may have a response I don't always have the
right answer.  It was my fault that I did not include a
Mail-Followup-To: field in my first response.  Sorry about that.

> It sounds like what you're saying is that the -r option actually
> changes the target file set to a target directory set. Is this correct?

No.  It means that if and only if an argument is a directory then,
instead of printing an error message, grep recurses down into that
directory and repeats the operation on all files there.  This is
applied recursively to every directory below the argument that
matched.  If an argument is not a directory then grep proceeds
normally, the same as if the option had not been given.
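
A concrete check of that rule, sketched with a throwaway tree and an
invented pattern:

```shell
# Build a tiny tree: one matching file below a subdirectory,
# one non-matching file at the top level.
mkdir -p /tmp/rdemo/sub
printf 'needle here\n' > /tmp/rdemo/sub/hay.txt
printf 'no match\n'    > /tmp/rdemo/flat.txt

# Directory argument: -r recurses into it and finds the match.
grep -rl needle /tmp/rdemo

# Plain file argument: -r changes nothing; grep reads the file
# normally and simply finds no match.
grep -rl needle /tmp/rdemo/flat.txt || echo 'no match in flat.txt'
```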

> If so, it seems counter to the traditional meaning of a recursive option
> on  a command line utility.

Traditional for the UNIX operating system where grep originated?  Or
traditional command behavior for the DOS platform?  (DOS is a platform
without a doubt.  But I hesitate to call it an operating system.)

> Now I understand that most Unix utilities simply don't have a
> recursive option (at least traditionally), but rather, they rely on
> find and xargs to provide that functionality for them,

Correct.  The philosophy is to be modular and to link components
together to form a greater whole.  Each command should do one thing
and do it well.  Use pipes to link commands and a programmable shell
language to control them.  This is software reuse on a large scale.
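
That modular approach is exactly how recursion is usually expressed on
UNIX: find walks the tree, grep matches.  A minimal sketch (the paths
and pattern are invented for the example):

```shell
# Throwaway tree: one file that matches, one that does not.
mkdir -p /tmp/fx/sub
printf '#include <stdio.h>\n' > /tmp/fx/a.c
printf 'int x;\n'             > /tmp/fx/sub/b.c

# find selects *.c files at every depth; xargs hands the list to
# grep.  -print0/-0 keep names with spaces or newlines intact.
find /tmp/fx -name '*.c' -print0 | xargs -0 grep -l include
```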

> but when you do add a recursive option to a utility, you expect it
> do the obvious. For example:
>  
>    grep include *.c
>  
> I've always thought that the meaning of this line was to search for all
> instances of the word 'include' in all files that match the filespec
> '*.c' in the current working directory.

Please test your command above using the 'echo' command.  Here is an
example.  Please tell me how a command can tell the difference between
using the file glob wildcard (*.c) and explicitly listing out the
files?

Set up the test case:

  mkdir /tmp/t
  cd /tmp/t
  touch a.c b.c c.c d.c

How are these two following commands different?

  echo grep hello a.c b.c c.c d.c
  echo grep hello *.c

The answer is that they are not.  The command can't tell that you used
a wildcard.  Commands don't expand wildcards.  The shell does that.
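
The comparison can even be made mechanical (this repeats the /tmp/t
setup from above so the snippet stands alone):

```shell
mkdir -p /tmp/t && cd /tmp/t
touch a.c b.c c.c d.c

# Print each would-be argument on its own line, once with the glob
# and once with the names spelled out.
printf '%s\n' grep hello *.c             > expanded.txt
printf '%s\n' grep hello a.c b.c c.c d.c > literal.txt

# Identical argument lists: the command could never tell them apart.
cmp -s expanded.txt literal.txt && echo identical
```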

> By adding a -r command line option:
>  
>    grep -r include *.c

Try this with 'echo'.  What do you see?  Are any of those .c files
directories?

  echo grep -r include *.c

> I would expect the command to search all files matching the '*.c'
> filespec in the cwd, AS WELL AS all files matching '*.c' in the the
> entire directory hierarchy beneath the cwd. This seems obvious to me.

That is not obvious to me and that is not what I would expect.  I
expect the file glob *.c to be expanded by the shell into a list of
literal file names and handed to the command as arguments.  This
provides a uniform wildcard interface for all commands.  Commands do
not need to include wildcard processing code because the shell does it
for all commands uniformly.  But I expect this is because I "grew up"
on a UNIX machine and this is how shells and commands have behaved
there.
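
For what it is worth, the behavior you describe is easy to get on a
UNIX/GNU system by pairing find with grep.  A sketch with invented
paths:

```shell
# Throwaway tree with *.c files at two depths.
mkdir -p /tmp/tree/deep
printf '#include <a.h>\n' > /tmp/tree/top.c
printf '#include <b.h>\n' > /tmp/tree/deep/low.c

# "grep include in every *.c, recursively" spelled the UNIX way:
# find walks the whole hierarchy, grep only matches.
find /tmp/tree -name '*.c' -exec grep -l include {} +
```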

I accept that you must have "grown up" on a DOS machine and expect
programs to behave like the old CP/M console command processor it was
modeled after.  There the command itself would see the literal "*.c"
characters and parse the wildcards itself, reading the directory,
processing file pattern matches, etc.  One advantage there is that
commands like "mv *.c *.cc" can work because the program can see the
wildcards and knows exactly what the user typed.  That is fine
for DOS machines and I would probably expect that if I were on a DOS
machine.  But you are asking about GNU grep which is not a DOS
command.  GNU grep is a GNU command.  Shouldn't it behave as on a GNU
system?

If I were porting a UNIX grep or GNU grep to windows it would be for
one of two reasons.  Depending upon the reason I would do things
completely differently.

If I were porting to DOS because I liked the DOS platform (after all,
why else would someone run there) then I would do my best to make it
behave like other DOS commands.  I would change the options to
/options and match the behavior of other DOS commands.  This would
break all compatibility with traditional UNIX scripts and behavior,
of course.  But it would make it compatible with DOS scripts and
behavior.  It is a choice.  But then on DOS I am not sure I would
bother porting grep at all rather than simply using the DOS 'find'
command instead.

If I personally were porting grep to DOS it would be to get the
UNIX/GNU environment, or as close as I could get to it, on a DOS
machine.  In which case I would preserve the UNIX environment as much
as possible and keep the options the same and preserve standards
conformance so that scripts would have a chance at running unchanged
there and so on.  This is the more typical case and why porters
usually keep the options UNIX/GNU like and not DOS like.  They are
trying their best to keep the UNIX/GNU environment intact.

> I'm not saying I know how it should be, but with my limited experience
> with other versions of grep on other platforms, this seems obvious to
> me.

Which other platform?  OS/2?  The Mac?  BeOS?  Perhaps those other
versions of grep you mention (I don't know the details of what you
are suggesting there) have changed the behavior to make it more
native?  That is fine.  But surely you are not talking about grep on a
UNIX or GNU platform.  GNU, BSD, HP-UX, Solaris, IBM AIX, etc. all
behave the same in this regard.

> What standard Unix concept am I missing here?

On UNIX/GNU systems the shell processes wildcards and meta characters
and does most of the argument handling for all programs uniformly.
The program gets the expanded argument list and then performs its own
operations.  Small and modular programs are joined together using
pipes to create more complex programs.  On DOS, programs that do
everything are common because the platform beneath them does so
little that there is no choice.
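
A tiny illustration of that modularity (the file name and contents
are invented for the example): generate, filter, and count are three
separate tools joined by pipes.

```shell
printf 'alpha\nbeta\nalpha\n' > /tmp/pipe-demo.txt

# cat generates, grep filters, wc counts -- each does one job.
cat /tmp/pipe-demo.txt | grep alpha | wc -l
```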

Single large monolithic programs which try to do everything are
frowned upon because they violate those principles: keep it small,
keep it focused on doing one thing well, and design it to fit with
other programs so that more complex behavior can be created.  There
are many classic examples which violate those principles.  They are
usually troubled and become examples of bad design.

Really this design tenet is an interaction between the command shell
and the programs it runs.  The operating system kernel does not
mandate it.  Anyone is free to write a different shell which does not
expand wildcards and does not do any argument processing.  They could
also write a new set of coreutils (cp, mv, rm, etc., please name them
uniquely) which would do argument processing themselves.  That set of
shell and commands could be rolled out together and a different
paradigm used by those who chose that shell and those commands.  That
would be fine.  But I would not prefer it myself.

Bob



