Re: [gawk-stable] bug: fatal error when getline from directory


From: Paolo
Subject: Re: [gawk-stable] bug: fatal error when getline from directory
Date: Sun, 4 Jan 2009 11:11:01 +0100
User-agent: Mutt/1.3.28i

On Sat, Jan 03, 2009 at 10:14:25PM -0700, Eric Blake wrote:
> > Different awks do different things when handed a directory. Brian
> > Kernighan's awk treats a read of a directory as EOF; I think that's wrong.
> 
> POSIX 2008 states that awk is only required to operate on text files, and
> a directory does not qualify as a text file:
> http://www.opengroup.org/onlinepubs/9699919799/utilities/awk.html
> 
> Therefore, you can define gawk to do whatever you would like, as a
> compliant client will never ask gawk to read a directory.

I disagree: the definition of 'text file' is rather broad/vague [1], and
gawk actually seems to conform to IEEE Std 1003.1-2001, in that a 'line of
characters' is any sequence of characters not containing the line separator,
which is '\n' by default but can be any character, '\0' included, e.g.:

$ echo -e -n '1\n1\x002\x003\x004\x00'| gawk 'BEGIN{RS="\0"}{ print "-"$0"-"}'
-1
1-
-2-
-3-
-4-

*awk also operates on stdin, whose type is undefined, and gawk's getline can
open '/inet/' special files, which are generally 'binary files'.
The example above also shows that Aharon's point about filenames containing
'\n' isn't specific to an embedded readdir(3) 'mode': you may have to deal
with that issue in any case, e.g.:

$ a=`echo -e '/tmp/1\n1'`
$ touch "$a"
$ ls -1 /tmp
1?1
...
$ gawk 'BEGIN{while ("ls /tmp"|getline) print "-"$0"-"}' 
-1-
-1-
...
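
For what it's worth, here is a minimal sketch of one way around that, reusing
the RS="\0" trick from the first example. It assumes gawk and a find that
supports -mindepth, -maxdepth and -print0 (extensions, not POSIX); the
function name list_names and the scratch directory are only for illustration.

```shell
# Sketch only: hand directory entries to gawk as NUL-terminated records, so
# that a newline inside a file name stays within a single record.
# Assumes gawk and a find with -mindepth/-maxdepth/-print0.
list_names() {
    find "$1" -mindepth 1 -maxdepth 1 -print0 |
        gawk 'BEGIN { RS = "\0" } { n++ } END { print n + 0 }'
}

d=$(mktemp -d)                      # scratch directory (example data)
touch "$d/$(printf '1\n1')" "$d/2"  # first name contains a newline
list_names "$d"                     # prints the number of records gawk saw
rm -rf "$d"
```

With NUL as the separator, the name containing '\n' is counted once instead
of being split in two as in the ls pipeline above.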

I think you *cannot* define gawk to do whatever you would like when getline
is given a dir, for the sake of predictability and consistency: if we don't
allow a dir to be treated like a line-oriented file, then getline should
return -1 as for any other error condition, since a script author might need
to catch and handle that case *from within* the script (I, for one, do
expect this).
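
To make the point concrete, this is roughly what I mean by handling it from
within the script. It is only a sketch: try_read is a made-up helper name,
and the path is just an example of something that cannot be opened.

```shell
# Sketch only: react to getline's return value inside the awk script.
# try_read is a hypothetical helper; plain awk is enough here.
try_read() {
    awk -v path="$1" 'BEGIN {
        rc = (getline line < path)   # 1 = got a record, 0 = EOF, -1 = error
        if (rc < 0)
            print "error"
        else if (rc == 0)
            print "empty"
        else
            print "ok"
    }'
}

try_read /nonexistent/no-such-file   # the open fails, so getline returns -1
```

What the same call reports when handed a directory is exactly what is under
discussion here, so I wouldn't rely on it across awks.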

On second thought, though, I'd rather have getline always return -1 on
a dir, and possibly implement readdir(3) as a special mode, e.g. '/dir/',
the same as for the other special files, so that if you expect 'a' to be a
dir you'd say 'getline f < "/dir/tmp/a"', which of course would return -1 if
'a' is not a dir, and set ERRNO accordingly.


-- 
paolo

[1] http://www.opengroup.org/onlinepubs/009695399/
" 3.392 Text File

A file that contains characters organized into one or more lines. The lines do 
not contain NUL characters and none can exceed {LINE_MAX} bytes in length, 
including the <newline>. Although IEEE Std 1003.1-2001 does not distinguish 
between text files and binary files (see the ISO C standard), many utilities 
only produce predictable or meaningful output when operating on text files. The 
standard utilities that have such restrictions always specify "text files" in 
their STDIN or INPUT FILES sections."


> 
> I recently changed GNU m4 to outright reject all attempts to open
> directories as input files, printing an error message and exiting with
> non-zero status after processing all other input files.  I just don't see
> any other choice that is both sane and portable in how to handle directory
> reads in any way that you could easily document, considering that systems
> vary in whether you are even allowed to read from a file descriptor
> visiting a directory.
> 
> http://lists.gnu.org/archive/html/m4-patches/2008-09/msg00004.html
> http://git.savannah.gnu.org/gitweb/?p=m4.git;a=commitdiff;h=4a5040d
> 
> - --
> Don't work too hard, make some time for fun as well!
> 
> Eric Blake             address@hidden
> 
> 




