bug-gawk

Re: [bug-gawk] When RS is null, POSIX states \n should be in FS, gawk only does that if FS is single char


From: arnold
Subject: Re: [bug-gawk] When RS is null, POSIX states \n should be in FS, gawk only does that if FS is single char
Date: Mon, 15 Apr 2019 07:52:51 -0600
User-agent: Heirloom mailx 12.5 7/5/10

Hi.

Thanks for the report. This smells like a change in the POSIX
standard, but I will have to research previous standards and also
what the original AWK book says.

Thanks,

Arnold

Ed Morton <address@hidden> wrote:

> I just came across this where setting RS to null causes FS to include
> `\n` if FS is a single char but not otherwise:
>
>     $ printf '1:2\n3\n' | awk -F':' -v RS= '{for (i=1; i<=NF; i++) print i"/"NF, "<"$i">"}'
>     1/3 <1>
>     2/3 <2>
>     3/3 <3>
>
>     $ printf '1::2\n3\n' | awk -F'::' -v RS= '{for (i=1; i<=NF; i++) print i"/"NF, "<"$i">"}'
>     1/2 <1>
>     2/2 <2
>     3>
>
> with this gawk version:
>
>     $ awk --version
>     GNU Awk 4.2.1, API: 2.0 (GNU MPFR 4.0.2, GNU MP 6.1.2)
>     Copyright (C) 1989, 1991-2018 Free Software Foundation.
>
> and that makes sense given the gawk documentation 
> (https://www.gnu.org/software/gawk/manual/gawk.html#Multiple-Line) which 
> says (red/underline mine):
>
>     When RS is set to the empty string _and FS is set to a single
>     character_, the newline character always acts as a field separator.
>     This is in addition to whatever field separations result from FS.
>
> but the POSIX spec (http://pubs.opengroup.org/onlinepubs/9699919799/) says:
>
>     *RS*
>         The first character of the string value of *RS* shall be the
>         input record separator; a <newline> by default. If *RS* contains
>         more than one character, the results are unspecified. If *RS* is
>         null, then records are separated by sequences consisting of a
>         <newline> plus one or more blank lines, leading or trailing
>         blank lines shall not result in empty records at the beginning
>         or end of the input, and a <newline> shall always be a field
>         separator, no matter what the value of *FS* is.
>
> gawk behaves the way I described with or without the `--posix` flag. 
> Shouldn't it add `\n` as a separator when RS is null regardless of the 
> value of FS like POSIX says? FWIW OSX/BSD awk on MacOS behaves the same 
> way that gawk does, idk about other awks.
>
>     Ed.
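
Until the question above is settled, one possible workaround is to add the newline to a multi-character FS explicitly, as a regex alternative. This is only a sketch and does not come from the gawk manual or the report; the function name `split_fields` is hypothetical, and it assumes the awk in use processes the `\n` escape in a `-v` assignment, as POSIX describes.

```shell
# Hypothetical helper: split paragraph-mode records on "::" OR newline.
# FS is longer than one character, so awk treats it as a regular
# expression; the "\n" alternative makes newline a field separator
# without relying on the RS="" special case.
split_fields() {
    awk -v 'FS=::|\n' -v RS= '{
        for (i = 1; i <= NF; i++)
            print i "/" NF, "<" $i ">"
    }'
}

# Same input as the second example in the report.
printf '1::2\n3\n' | split_fields
```

With that input this should report three fields (`1/3 <1>`, `2/3 <2>`, `3/3 <3>`), matching what the single-character FS case already produces.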


