coreutils

Re: grep/sed and some strange patterns/inputs


From: Assaf Gordon
Subject: Re: grep/sed and some strange patterns/inputs
Date: Wed, 27 Jul 2016 00:54:56 -0400

Hello,

> On Jul 26, 2016, at 23:09, Christoph Anton Mitterer <address@hidden> wrote:
> 
> I've always had the impression that ^ and $ were the begin/end anchors
> of the current pattern, and, since e.g. grep/sed normally work in terms
> of lines, the start/end of lines.
[...]
> What I found a bit strange is that e.g.:
> printf '' | sed 's/^/foo/'
> 
> doesn't produce foo and that e.g.
> printf '' | grep '^'
> doesn't match.
> 
> Why? Or better said, which part of POSIX mandates this? Or is it simply
> "no stdin, nothing happens"?

Exactly!

The command "printf ''" writes nothing at all, so piping it into sed is
equivalent to redirecting sed's input from /dev/null:
sed's very first read of STDIN reports end-of-file, so it never executes any
command.

printf with *any* output (with or without a trailing newline) causes 'sed' and
'grep' to read some characters, and then to execute commands or match patterns
on that input.
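The effect is already visible from the output alone, without any tracing tools.
A minimal sketch, assuming only a POSIX shell with sed, wc and tr; the variable
names are just for illustration. Because $(...) strips trailing newlines, the
last line also counts the bytes sed writes for empty input, to show it is truly
nothing (tr removes the padding some wc versions add):

```shell
# Capture what sed writes for three kinds of input.
empty=$(printf '' | sed 's/^/foo/')        # zero bytes of input
devnull=$(sed 's/^/foo/' < /dev/null)      # the same thing, via /dev/null
one=$(printf 'a' | sed 's/^/foo/')         # one byte, no newline

printf 'empty:   [%s]\n' "$empty"     # empty:   []
printf 'devnull: [%s]\n' "$devnull"   # devnull: []
printf 'one:     [%s]\n' "$one"       # one:     [fooa]

# Byte count of sed's output for empty input.
printf '' | sed 's/^/foo/' | wc -c | tr -d ' '   # 0
```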

This can also be demonstrated in detail using 'strace' on GNU/Linux machines.
The commands below run 'sed' with both printf and /dev/null, and 'strace'
reports every 'read' system call.
The first 'read(3,...)' can be ignored: it is the dynamic loader reading the ELF
header of a shared library.

The second 'read(0,...)' is the interesting one:
its first argument, 0, is the file descriptor of STDIN.
sed asks to read up to 4096 bytes from STDIN, and the return value (after the
equals sign) is zero.
A return value of zero indicates end-of-file, meaning there is no input at all,
so sed does not execute any commands.
Notice that printf with an empty string and /dev/null result in exactly the
same behavior:

  $ strace -e read sed 's/^/foo/' < /dev/null
  read(3, "\177ELF\2\1\1\3\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0P\34\2\0\0\0\0\0"..., 832) = 832
  read(0, "", 4096)                       = 0
  +++ exited with 0 +++

  $ printf '' | strace -e read sed 's/^/foo/'
  read(3, "\177ELF\2\1\1\3\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0P\34\2\0\0\0\0\0"..., 832) = 832
  read(0, "", 4096)                       = 0
  +++ exited with 0 +++


However, if even one byte is available on STDIN,
read() returns it, and sed executes the command on that input:

  $ printf 'a' | strace -e read sed 's/^/foo/'
  read(3, "\177ELF\2\1\1\3\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0P\34\2\0\0\0\0\0"..., 832) = 832
  read(0, "a", 4096)                      = 1
  read(0, "", 4096)                       = 0
  read(0, "", 4096)                       = 0
  fooa+++ exited with 0 +++
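A rough way to picture this is a loop that runs the script once per line that
read() delivers: if the very first read hits end-of-file, the loop body never
runs. The sketch below is only an illustrative model, not sed's actual
implementation; 'prefix_lines' is a hypothetical shell function that emulates
sed 's/^/PREFIX/'. Unlike GNU sed (which, as the trace above shows, wrote "fooa"
with no newline), it always ends each output line with a newline.

```shell
# Hypothetical helper, for illustration only: emulate sed 's/^/PREFIX/'
# by running the "command" once for every line that read manages to fetch.
prefix_lines() {
    # read returns non-zero at end-of-file. The `|| [ -n "$line" ]` part
    # keeps a final line that lacks a trailing newline (e.g. printf 'a').
    while IFS= read -r line || [ -n "$line" ]; do
        printf '%s%s\n' "$1" "$line"
    done
}

printf ''   | prefix_lines foo    # no output: the loop body never runs
prefix_lines foo < /dev/null      # no output, for the same reason
printf 'a'  | prefix_lines foo    # fooa
printf '\n' | prefix_lines foo    # foo (one empty line is still a line)
```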


One possible source of confusion is 'echo' vs 'printf': echo automatically
appends a newline by default.
Thus, the command:
   echo '' | sed 's/^/foo/'
does print "foo", because there is some input (one byte: a newline),
whereas this command prints nothing, since there is no input at all:
   printf '' | sed 's/^/foo/'
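The difference in input size can be checked directly. A small sketch, assuming
a POSIX shell; tr only removes the padding that some wc implementations add:

```shell
# Bytes each command writes.
echo ''   | wc -c | tr -d ' '    # 1 (just the newline)
printf '' | wc -c | tr -d ' '    # 0

# One empty line is enough for sed to run s/^/foo/ once.
echo ''     | sed 's/^/foo/'     # foo
printf '\n' | sed 's/^/foo/'     # foo (printf equivalent of echo '')
printf ''   | sed 's/^/foo/'     # (no output)
```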


'grep' follows the same principle, and can be examined using:
   printf '' | strace -e read grep -q '.' && echo match || echo no-match
   strace -e read grep -q '.' </dev/null  && echo match || echo no-match
   printf 'a' | strace -e read grep -q '.' && echo match || echo no-match
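The same experiment with the '^' pattern from the original question, without
strace, might look like this. 'match_caret' is a hypothetical helper name; it
relies only on grep -q exiting with status 0 when some line matched and 1 when
none did:

```shell
# Hypothetical helper: report whether grep -q '^' matched anything on stdin.
match_caret() {
    if grep -q '^'; then echo match; else echo no-match; fi
}

printf ''   | match_caret    # no-match: no input, hence no lines for '^'
printf '\n' | match_caret    # match: one empty line is still a line
printf 'a'  | match_caret    # match: the unterminated last line is still read
```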


Others can perhaps elaborate on the POSIX standard.
From a cursory look, the wording for 'grep' and 'sed' seems to tie their output
to having input, whereas 'wc' has mandatory default output regardless of input
(http://pubs.opengroup.org/onlinepubs/9699919799/utilities/wc.html#tag_20_154_10).
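The contrast with 'wc' is easy to observe: for the same empty input, wc still
reports a count, consistent with the wc page linked above, while sed writes
nothing. A minimal sketch, assuming a POSIX shell (tr strips wc's padding):

```shell
# wc -l reports a count even when there is nothing to count ...
wc_out=$(printf '' | wc -l | tr -d ' ')
# ... whereas sed writes nothing at all for the same empty input.
sed_out=$(printf '' | sed 's/^/foo/')

printf 'wc -l: [%s]\n' "$wc_out"    # wc -l: [0]
printf 'sed:   [%s]\n' "$sed_out"   # sed:   []
```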



Hope this helps,
  - assaf


P.S.
A minor nitpick: coreutils is a separate project from grep or sed.
grep questions should be sent to address@hidden, and
sed questions should be sent to address@hidden.





