coreutils

From: Stefan Vargyas
Subject: Re: stat: added features: `--files0-from=FILE', `--digest-type=WORD' and `--quoting-style=WORD'
Date: Thu, 22 May 2014 07:05:17 -0700 (PDT)

> Date: Thu, 22 May 2014 12:28:22 +0100
> From: Pádraig Brady <address@hidden>
> Subject: Re: stat: added features: `--files0-from=FILE', `--digest-type=WORD'

>   join -j2 <(stat -c '%s  %n' /bin/ls /bin/cp | sort) <(sha1sum /bin/cp /bin/ls | sort)

>   tr '\n' '\1' |
>   sort |
>   uniq -u ...

Your remarks are correct iff stat and sha1sum *are* able to produce
consistently joinable output. However, when employing such usage patterns in
*generally usable scripts*, one has to guard against inconsistencies (leading
to bugs!) that occur when file names contain SPACE, TAB, NL and other such
characters.

A solution would be to use TAB as the only field separator, ensuring that it
cannot appear anywhere else in the output. Then one might invoke join with
"-t $'\t'". Under this condition, it should be clearer why stat needs the
'--quoting-style=escape' and '--digest-type=sha1' options and the '%S' format
specifier.
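A minimal bash sketch of the TAB-only join described above, assuming GNU stat (its '--printf' interprets '\t' and '\n') and GNU sha1sum. It also assumes that no file name contains TAB, NL or backslash: sha1sum escapes names containing NL or backslash, so such names would not pair up. The function name join_size_digest is hypothetical; each output line is name, size and SHA-1 digest, separated by TABs.

```shell
#!/usr/bin/env bash
# Hypothetical helper: pair each file's size (stat) with its SHA-1 digest
# (sha1sum) by joining on the file name, using TAB as the only separator.
# Assumption: no file name contains TAB, NL or backslash.
join_size_digest() {
  local tab=$'\t'
  # Left input:  name<TAB>size, sorted on the join key (field 1).
  # Right input: sha1sum prints "digest  name"; a SHA-1 hex digest is 40
  # chars, so the name starts at offset 42, after the two spaces. Rewrite
  # each line as name<TAB>digest, again sorted on field 1.
  LC_ALL=C join -t "$tab" \
    <(stat --printf '%n\t%s\n' -- "$@" | LC_ALL=C sort -t "$tab" -k1,1) \
    <(sha1sum -- "$@" | while IFS= read -r line; do
        printf '%s\t%s\n' "${line:42}" "${line:0:40}"
      done | LC_ALL=C sort -t "$tab" -k1,1)
}

# Usage: join_size_digest /bin/ls /bin/cp
```

Both sides are sorted and joined in the C locale so that join's ordering requirement holds regardless of the user's locale. Escaping the names first, which is what '--quoting-style=escape' would provide, is what would remove the "no TAB or NL" assumption.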

> There is no advantage of supporting this option in stat
> as that is only useful when a command needs to process all
> file names in a _single invocation_, like when sorting or accumulating etc.
> For stat one can efficiently:
> 
>   find ... -print0 | xargs -r0 stat ...
> 
> or
> 
>   find ... -exec stat {} +

One meaningful reason for a single invocation is efficiency. The input to stat
can be huge (and in the scenario I first described, it often is!), and that
potentially large amount of data then flows through the multiple pipelines and
FIFOs of your scenario above.

> Note also that sort has the --zero-terminated option, as do newer versions of
> join and uniq.

The fanciful '-0|--null' option refers to both the input and the output of
sort; the existing '-z|--zero-terminated' option refers only to sort's output.

> This could be useful, however there is already the %N option for quoted file
> name.
> 
> $ stat -c %N /bin/ls
> ‘/bin/ls’
> $ LANG=C src/stat -c %N /bin/ls
> '/bin/ls'

Recall the consistency requirement from above. For symlinks, %N produces output
like the following:

  $ touch /tmp/foo
  $ ln -sv /tmp/foo /tmp/bar
  `/tmp/bar' -> `/tmp/foo'
  $ stat -c %N /tmp/bar
  `/tmp/bar' -> `/tmp/foo'
  $

Also, for symlinks, the digest programs do follow the links, i.e. they compute
the digest of the content of the file the symlink points to:

  $ sha1sum /tmp/foo /tmp/bar
  da39a3ee5e6b4b0d3255bfef95601890afd80709  /tmp/foo
  da39a3ee5e6b4b0d3255bfef95601890afd80709  /tmp/bar

The semantics of %S in the proposed patches is different, however: the new stat
produces the digest of the *content* of the file itself. For a symlink, that
content is the link target, obtained via 'areadlink_with_size':

  $ stat2 -c '%S  %n' /tmp/foo /tmp/bar
  da39a3ee5e6b4b0d3255bfef95601890afd80709  /tmp/foo
  469150566bd728fc90b4adf6495202fd70ec3537  /tmp/bar
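The %S semantics can be approximated in plain bash for comparison. This is a sketch under stated assumptions, not the patch: the link target is read with GNU 'readlink -n' (no trailing newline) instead of gnulib's 'areadlink_with_size', only regular files and symlinks are handled, and the function name own_content_sha1 is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper approximating the described %S semantics: digest the
# file's *own* content without following symlinks. For a symlink, that
# content is the link target string.
own_content_sha1() {
  local f digest
  for f in "$@"; do
    if [ -L "$f" ]; then
      # -n: omit readlink's trailing newline, so only the target bytes are hashed
      digest=$(readlink -n -- "$f" | sha1sum)
    elif [ -f "$f" ]; then
      digest=$(sha1sum < "$f")
    else
      printf 'own_content_sha1: %s: not a regular file or symlink\n' "$f" >&2
      continue
    fi
    # sha1sum reading stdin prints "digest  -"; keep only the 40 hex digits
    printf '%s  %s\n' "${digest:0:40}" "$f"
  done
}

# Usage: own_content_sha1 /tmp/foo /tmp/bar
```

Unlike 'sha1sum /tmp/foo /tmp/bar', which hashes the target's data for both names, this gives the symlink a digest of its own, so a changed link target shows up even when the target files are identical.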

Note that the STAT_* files of my initial usage scenario have an intrinsic value
of their own, beyond providing the means to verify that ISO files were made, or
DVDs burned, correctly. These files keep a fairly faithful record of the
content of the file system itself.

With many thanks for your thorough response,

Stefan Vargyas.
