bug-findutils

Re: Filename Expansion: Find Utility Pattern Matching vs Bash Shell Patt


From: Stephane Chazelas
Subject: Re: Filename Expansion: Find Utility Pattern Matching vs Bash Shell Pattern Matching
Date: Wed, 17 Jun 2015 08:15:15 +0100
User-agent: Mutt/1.5.21 (2010-09-15)

2015-06-16 12:02:33 -0700, Michael Convey:
> For filename expansion, the 'find' utility's '-name' option seems to
> function similarly, but not exactly the same as the bash shell's builtin
> pattern matching.
> 
> Here are the relevant sections of the GNU reference manual:
> 
>    - Bash shell pattern matching:
>    http://www.gnu.org/software/bash/manual/bashref.html#Pattern-Matching
>    - 'find' utility pattern matching:
>    
> http://www.gnu.org/software/findutils/manual/html_mono/find.html#Shell-Pattern-Matching
> 
> This is very confusing on its own. To add to this confusion, section 2.1.4
> of 'find' utility's man page (referenced above) is entitled "Shell Pattern
> Matching", which implies that 'find' is using the shell's builtin pattern
> matching functionality. However, this does not appear to be the case
> because according to the 'find' man page (http://goo.gl/ngQTKx), under
> '-name pattern', it says the following:
> 
> "The filename matching is performed with the use of the fnmatch(3) library
> function. Don't forget to enclose the pattern in quotes in order to protect
> it from expansion by the shell."
> 
> From this, it sounds like it is not the shell that is performing the
> pattern matching, but the find utility using the fnmatch library.
> 
> Here are my questions:
> 
>    1. Is the bash shell's default filename expansion and pattern matching
>    (extglob shell option disabled) different from that of the find utility
>    using the -name option?
>    2. If so, what are those differences?
>    3. Does bash also use the fnmatch library or some other mechanism for
>    filename expansion and pattern matching?
[...]

FYI, that question was also cross-posted to
http://unix.stackexchange.com/questions/210036/filename-expansion-find-utility-pattern-matching-vs-bash-shell-pattern-matching
where I answered:



In the shell, you need to distinguish filename generation/expansion (aka
globbing), where a pattern expands to a list of files, from pattern matching.
Globbing uses pattern matching internally, but it is first and foremost an
operator that generates a list of files based on a pattern.

*/*.txt is a pattern which matches a sequence of 0 or more characters followed
by / followed by a sequence of zero or more characters followed by .txt. When
used as a shell pattern as in:

case $file in
  */*.txt) echo match
esac

It will match on file=.foo/bar/baz.txt.

However */*.txt as a glob is something related but more complex.

In expanding */*.txt into a list of files, the shell will open the current
directory, list its content, find the non-hidden files of type directory (or
symlink to directory) that match *, sort that list, open each of those, list
their content and find the non-hidden ones that match *.txt.

It will never expand to .foo/bar/baz.txt even though that path matches the
pattern, because that's not how globbing works. On the other hand, the file
paths generated by a glob will all match that pattern.
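
The difference can be reproduced in a scratch directory. This is only a
sketch: the directory layout (.foo/bar/baz.txt and dir/a.txt) and the variable
names are made up for the illustration, and it assumes a shell with default
globbing options (no dotglob or similar).

```shell
# Hypothetical scratch layout, created only for this illustration.
tmp=$(mktemp -d) || exit 1
cd "$tmp" || exit 1
mkdir -p .foo/bar dir
touch .foo/bar/baz.txt dir/a.txt

# Globbing: the shell lists directories and skips hidden entries,
# so .foo is never opened.
globbed=$(printf '%s\n' */*.txt)

# Pattern matching: a plain string comparison in which * may also
# match a leading dot and any number of / characters.
case .foo/bar/baz.txt in
  */*.txt) matched=yes ;;
  *)       matched=no ;;
esac

echo "glob expands to: $globbed"
echo "case matches:    $matched"

cd / && rm -rf "$tmp"
```

The glob only produces dir/a.txt, while the case statement reports a match for
the hidden, deeper path.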

Similarly, a glob like foo[a/b]baz* will find all the files whose names start
with b]baz in the foo[a directory.
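
A small sketch of that case, assuming bash or a POSIX sh (other shells, zsh
for instance, may reject the pattern instead). The file names fooabaz1 and
foobbaz1 are hypothetical decoys added to show that [a/b] is not treated as a
bracket expression here.

```shell
tmp=$(mktemp -d) || exit 1
cd "$tmp" || exit 1
mkdir 'foo[a'
touch 'foo[a/b]baz1' fooabaz1 foobbaz1

# Left unquoted on purpose: this is the glob from the text. The / splits it
# into the literal directory part "foo[a" and the pattern "b]baz*".
bracket_glob=$(printf '%s\n' foo[a/b]baz*)
echo "$bracket_glob"

cd / && rm -rf "$tmp"
```

If [a/b] were a bracket expression, fooabaz1 and foobbaz1 would be listed;
instead only the file inside the foo[a directory is.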

So, we've already seen that for globbing, but not for pattern matching, / is
special (globs are split on / and each part is treated separately) and
dot-files are treated specially.

Shell globbing and pattern matching are part of the shell syntax. They are
intertwined with quoting and other forms of expansion.

$ bash -c 'case "]" in [x"]"]) echo true; esac'
true

Quoting that ] removes its special meaning (that of closing the preceding [).

It can even get quite confusing when you mix everything:

$ ls
*  \*  \a  x

$ p='\*' ksh -xc 'ls $p'
+ ls '\*' '\a'
\*  \a

OK, \* expands to all the files whose name starts with \.

$ p='\*' bash -xc 'ls $p'
+ ls '\*'
\*

It's not all the files starting with \. So, somehow, \ must have escaped the *,
but then again it's not matching * either...

For find, it's a lot simpler. find descends the directory tree starting at each
of the file arguments it receives and then performs the tests as instructed for
each file it encounters.

For -type f, that's true if the file is a regular file, false otherwise. For
-name <some-pattern>, that's true if the name of the file currently being
considered matches the pattern, false otherwise. There's no concept of hidden
files, / handling or shell quoting here; that's just matching a string (the
name of the file) against a pattern.

So for instance, -name '*foo[a/b]ar' (which passes -name and *foo[a/b]ar as
arguments to find) will match foobar and .fooaar. It will never match foo/ar,
but only because -name matches on the file name, which cannot contain a /; it
would with -path instead.
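
To see -name and -path side by side, here is a sketch using a hypothetical
scratch tree containing foobar, the hidden file .fooaar, and a file named ar
inside a directory foo. It assumes a find whose -path test lets a bracket
expression match /, as GNU find does; the variable names are illustrative.

```shell
tmp=$(mktemp -d) || exit 1
cd "$tmp" || exit 1
mkdir foo
touch foobar .fooaar foo/ar

# -name compares the pattern with the last path component only.
by_name=$(find . -name '*foo[a/b]ar' | LC_ALL=C sort)

# -path compares the pattern with the whole path, so [a/b] may match the /.
by_path=$(find . -path '*foo[a/b]ar' | LC_ALL=C sort)

echo "-name:"; echo "$by_name"
echo "-path:"; echo "$by_path"

cd / && rm -rf "$tmp"
```

The hidden file is matched in both cases, since find has no notion of hidden
files; ./foo/ar only shows up with -path.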

Now, there is one form of quoting/escaping recognised by find here, and that's
only with backslash. It allows escaping the pattern operators. For the shell,
that is done as part of the usual shell quoting (\ is one of the shell's
quoting mechanisms). For find (fnmatch()), it is part of the pattern syntax.

For instance -name '\**' would match on files whose name starts with *. -name
'*[\^x]*' would match on files whose name contains ^ or x...
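
A quick sketch of the first of those, with hypothetical file names. The single
quotes keep the shell from touching the pattern, so find itself receives the
backslash and interprets it.

```shell
tmp=$(mktemp -d) || exit 1
cd "$tmp" || exit 1
touch '*star' 'mid*dle' plain

# \* is a literal asterisk for find; the second * is still a wildcard.
escaped=$(find . -type f -name '\**' | LC_ALL=C sort)

# Without the backslash, both asterisks are wildcards.
unescaped=$(find . -type f -name '**' | LC_ALL=C sort)

echo "escaped:";   echo "$escaped"
echo "unescaped:"; echo "$unescaped"

cd / && rm -rf "$tmp"
```

Only the file whose name begins with an asterisk is reported for the escaped
pattern, while the unescaped one matches every file.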

Now, as for the different operators recognised by find, fnmatch(), bash and
various other shells, they should all agree at least on a common subset: *, ?
and [...].

Whether a particular shell or find implementation uses the system's fnmatch()
function or its own is up to the implementation. GNU find does use it, at least
on GNU systems. Shells are very unlikely to use it, as that would make things
complicated for them and not be worth the effort.

bash certainly doesn't. Modern shells like ksh, bash and zsh also have
extensions over *, ? and [...], and a number of options and special parameters
(GLOBIGNORE/FIGNORE) to affect their globbing behaviour.

Also note that besides fnmatch(), which implements shell pattern matching,
there's also the glob() function, which implements something similar to shell
globbing.

Now, there can be subtle differences between the pattern matching operators in
those various implementations.

For instance, for GNU fnmatch(), ?, * or [!x] would not match a byte or
sequence of bytes that doesn't form a valid character, while bash (and most
other shells) would. For instance, on a GNU system, find . -name '*' may fail
to match files whose name contains invalid characters, while bash -c 'echo *'
will list them (as long as they don't start with .).

We've already mentioned the confusion that quoting can cause.

-- 
Stephane



