sed-devel

Re: gnu sed's 'l' command behavior with -z (and without)


From: Jim Meyering
Subject: Re: gnu sed's 'l' command behavior with -z (and without)
Date: Sun, 7 Aug 2016 08:40:47 -0700

On Sat, Aug 6, 2016 at 12:02 PM, Assaf Gordon <address@hidden> wrote:
> Hello,
>
> (starting a new thread from previous discussion: 
> http://lists.gnu.org/archive/html/sed-devel/2016-08/msg00000.html )
>
> regarding this:
>
>> On Aug 1, 2016, at 13:41, Jim Meyering <address@hidden> wrote:
>>
>> On Sat, Jul 30, 2016 at 11:46 PM, Assaf Gordon <address@hidden> wrote:
>>>  sed: adjust line-terminator of F/l/= commands when -z is used
>>
>> In the second patch, this change
>>
>>       if (width+olen >= line_len && line_len > 0) {
>> -          ck_fwrite("\\\n", 1, 2, fp);
>> +          ck_fwrite("\\", 1, 1, fp);
>> +          ck_fwrite(&buffer_delimiter, 1, 1, fp);
>>
>> appears to change from emitting backslash-NL-continued lines to
>> backslash-NUL with -z. When using -z, do you still want to emit that
>> backslash?
>> Note that this is in code to honor sed's --line-length=N (-l) option,
>> which one can argue is not relevant with -z.
>
> I think we should output 'backslash-NUL' in such cases, unless we decide to 
> make 'l' command output with '-z' mode ignore line-length limitation and 
> never fold.

My first reaction was that with -z (implying machine-readable and no
line-length limitation), there should be no line splitting. Hence my
"Note ...". But perhaps it is too invasive to make the 'l'
command ignore its numeric operand when used with -z.

After reading all of this (thanks!), I agree that backslash-NUL does
make more sense, if you choose to split lines even with -z.

You're welcome to make the call. I'll be happy with your patch or with
one that does no folding with -z, albeit leaning 60:40 in favor of
your patch.
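
For anyone who wants to see what a given build does here, a minimal
sketch follows. It assumes GNU sed is installed as 'sed'; the helper
name l_fold_z is made up for illustration. It dumps the raw bytes that
'l' emits when folding an 8-byte record at width 5 under -z, so the
byte following each folding backslash is visible.

```shell
# Hypothetical helper: run 'l N' on one NUL-terminated record.
# Assumes GNU sed is available as 'sed'.
l_fold_z() {
    # $1 = record contents, $2 = line-wrap length for 'l'
    printf '%s\0' "$1" | sed -nz "l $2"
}

# Dump the bytes so NUL and newline terminators can be told apart.
l_fold_z aaaaaaaa 5 | od -An -c
```

With the patch under discussion the folded line should end in
backslash-NUL; a build without it would presumably still show
backslash-newline.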

> Without backslash-NUL for folded lines, the output will be inconsistent 
> compared to regular newline output.
> For example, the following will not be equivalent:
>
>     printf '%s\0' aaaaaaaa bbbbbbbb | ./sed/sed -nz 'N;l5' | tr '\000' '\n' | sed 's/\\000/\n/g'
>     printf '%s\n' aaaaaaaa bbbbbbbb | ./sed/sed -n 'N;l5'
>
> and vice versa:
>
>     printf '%s\n' aaaaaaaa bbbbbbbb | ./sed/sed -n 'N;l5' | tr '\n' '\000' | sed 's/\\n/\\000/g'
>     printf '%s\0' aaaaaaaa bbbbbbbb | ./sed/sed -nz 'N;l5'

> As a side note,
>
> It seems gnu sed's 'l' command output differs from FreeBSD/MacOS's sed in 
> regards to embedded newlines.
> Reading the POSIX standard, it's not clear to me which is correct (or perhaps 
> both are correct). POSIX does not say that embedded newline should be 
> converted to '\n'.
>
> http://pubs.opengroup.org/onlinepubs/9699919799/utilities/sed.html:
>
> "(The letter ell.) Write the pattern space to standard output in a visually 
> unambiguous form. The characters listed in XBD Escape Sequences and 
> Associated Actions ( '\\', '\a', '\b', '\f', '\r', '\t', '\v' ) shall be 
> written as the corresponding escape sequence; the '\n' in that table is not 
> applicable. Non-printable characters not in that table shall be written as 
> one three-digit octal number (with a preceding <backslash>) for each byte in 
> the character (most significant byte first)."
>
>
> In practical terms, it means gnu sed prints '$<NEWLINE>' at the end of the 
> printed pattern,
> while freebsd sed prints '$<NEWLINE>' at the end of every printed line.
>
> The following will demonstrate:
>
>      $ printf "%s\n" aXa | freebsd-sed -n 'y/X/\n/;l'
>      a$
>      a$
>
>      $ printf "%s\n" aXa | gnu-sed -n 'y/X/\n/;l'
>      a\na$
>
>      $ printf "%s\n" aaa bbb | freebsd-sed -n 'N;l'
>      aaa$
>      bbb$
>
>      $ printf "%s\n" aaa bbb | gnu-sed -n 'N;l'
>      aaa\nbbb$

I prefer GNU sed's approach.
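
One reason, sketched below under the assumption that GNU sed is
installed as 'sed': with the embedded newline escaped, a single pattern
space holding a newline stays distinguishable from two separate lines.
The variable names are only for illustration.

```shell
# Assumes GNU sed is available as 'sed'.
# One pattern space holding "a<newline>a" (N joins the two input lines).
joined=$(printf '%s\n' a a | sed -n 'N;l')

# Two separate pattern spaces, each holding "a".
separate=$(printf '%s\n' a a | sed -n 'l')

printf 'joined:\n%s\n\nseparate:\n%s\n' "$joined" "$separate"
```

Going by the FreeBSD output quoted above, both cases would presumably
print as two 'a$' lines there.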

> Adding line-folding complicates matters:
>
>     $ printf "%s\n" aXaaa | COLUMNS=3 freebsd-sed -n 'y/X/\n/;l'
>     a$
>     aa\
>     a$
>
>     $ printf "%s\n" aXaaa | gnu-sed -l3 -n 'y/X/\n/;l'
>     a\
>     \n\
>     aa\
>     a$
>
> (gnu-sed ignores COLUMNS envvar, but provides '-l N' extension or 'lN' 
> command-extension).

I approve of ignoring envvars :-)
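
For reference, a small sketch of the two GNU ways to set the fold
width, reusing the input from the quoted example. It assumes GNU sed is
installed as 'sed'; the variable names are illustrative only.

```shell
# Assumes GNU sed is available as 'sed'.
# Fold width given via the -l option.
via_option=$(printf '%s\n' aXaaa | sed -l 3 -n 'y/X/\n/;l')

# Fold width given as an operand to the 'l' command.
via_command=$(printf '%s\n' aXaaa | sed -n 'y/X/\n/;l 3')

# COLUMNS set in the environment, no explicit width.
via_env=$(printf '%s\n' aXaaa | COLUMNS=3 sed -n 'y/X/\n/;l')

if [ "$via_option" = "$via_command" ]; then
    echo "-l 3 and 'l 3' agree"
fi
printf '%s\n' "$via_env"
```

The first two should match the folded gnu-sed output quoted above; the
third should come out unfolded, since COLUMNS is ignored.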

> In freebsd-sed, there are only two options:
> either 'backslash-<newline>' is printed, indicating line-folding,
> or 'dollar-<newline>' is printed, indicating end-of-line.
>
> gnu-sed adds a third option: 'backslash-<n>' indicates an embedded newline in 
> the pattern.
>
> That's another reason I'd like to keep printing 'backslash-NUL' with -z:
> It makes the output consistent:
> Either 'backslash-DELIMITER' or 'dollar-DELIMITER' or 
> 'backslash-ESCAPE-DELIMITER' (meaning '\n' or '\000') - regardless of what 
> delimiter it is.
>
> regards,
>  - assaf
>
> P.S.
> This is obviously bike-shedding, as the '-z' option has been added in 
> feb-2012 (commit a08590648) and it doesn't seem anyone ever complained about 
> -z with 'l'.

Thanks for taking the time to write all of this.


