bug#26574: v4.4: POSIX violation with respect to output of a trailing ne


From: Eric Blake
Subject: bug#26574: v4.4: POSIX violation with respect to output of a trailing newline, even with --posix
Date: Thu, 20 Apr 2017 11:46:15 -0500
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Thunderbird/52.0

On 04/20/2017 11:36 AM, Michael Klement wrote:
> Thanks for the detailed feedback, Eric.
> 
> The POSIX spec. is, unfortunately, vague on this topic:
> 
> The definition of a line (which you quote) is complemented with the 
> definition of an incomplete line 
> <http://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap03.html#tag_03_195>:
> 
>> A sequence of one or more non- <newline> characters at the end of the file.
> 
> 
> So while the standard is aware of this possibility and gives it a name that 
> suggests it is a kind of line with something missing, there is precious 
> little behavior prescribed with respect to such incomplete lines.
> 

You're welcome to submit a bug report to get POSIX to more clearly word
its intentions that a file with an incomplete line is NOT a text file
(http://austingroupbugs.net/main_page.php), but everyone on the Austin
Group (myself included) has already agreed that the intention is there
(even if the wording could be improved): Omitting a trailing newline
causes sed to enter into the realm of undefined behavior - and this is
BECAUSE there are existing sed implementations that behave differently
when a trailing newline is omitted.  Some do not do anything with an
incomplete line (those implementations behave as though the file were
truncated at the last newline).
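
To make the distinction concrete, here is a minimal Python sketch that
separates the complete lines of an input from a trailing incomplete
line.  The function name split_incomplete is hypothetical, and the
sketch does not model any particular sed implementation; it only shows
which bytes an implementation that truncates at the last newline would
never process.

```python
from __future__ import annotations


def split_incomplete(data: bytes) -> tuple[list[bytes], bytes]:
    """Return (complete lines, trailing incomplete line) for raw input bytes.

    Hypothetical helper for illustration only.
    """
    # Everything up to and including the last newline is made of complete lines.
    cut = data.rfind(b"\n") + 1
    complete = [line + b"\n" for line in data[:cut].split(b"\n")[:-1]]
    # Whatever follows the last newline is the incomplete line (may be empty).
    return complete, data[cut:]


complete, tail = split_incomplete(b"one\ntwo")
print(complete)  # the complete lines
print(tail)      # the incomplete line an implementation may ignore
```

An implementation that processes the incomplete line would act on both
return values; one that truncates would act only on the first.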

> So we have:
> 
> sed's "input files shall be text files."
> a text file contains "characters organized into zero or more lines"
> 
> Beyond the "zero or more lines", the only restrictions placed on what 
> constitutes a text file 
> <http://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap03.html#tag_03_403>
>  are:
> " The lines do not contain NUL characters and none can exceed {LINE_MAX} 
> bytes in length, including the <newline> character. "
> 
> If you interpret the word "lines" in the phrase "zero or more lines" to mean 
> complete lines only (which is reasonable), then indeed any file that ends in 
> an incomplete line is not a text file.
> 
> I really wish the spec. were more explicit about incomplete lines.

As I said, you're welcome to propose a bug report with suggested wording
improvements.
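
For reference, the text file definition under the "complete lines only"
reading can be sketched in Python as below.  The name is_posix_text is
hypothetical, and the LINE_MAX value of 2048 is an assumption standing
in for the system's real limit (which `getconf LINE_MAX` reports).

```python
LINE_MAX = 2048  # assumed placeholder; the real limit is system-dependent


def is_posix_text(data: bytes, line_max: int = LINE_MAX) -> bool:
    """Check raw bytes against the text file definition discussed above.

    Hypothetical helper; reads "lines" as complete lines only.
    """
    # Lines must not contain NUL characters.
    if b"\0" in data:
        return False
    # Non-empty input must end in a newline, else it ends in an incomplete line.
    if data and not data.endswith(b"\n"):
        return False
    # No line may exceed line_max bytes, counting its newline.
    return all(len(line) + 1 <= line_max for line in data.split(b"\n")[:-1])


print(is_posix_text(b"one\ntwo\n"))
print(is_posix_text(b"one\ntwo"))
```

Under this reading an empty file qualifies (zero lines), while the same
content minus its final newline does not.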

> 
>>   If anything, the only
>> change I would make is have 'sed --posix' error out on non-text input,
>> to call attention to the user's attempt to feed non-posix-compliant data
>> to sed.
> 
> 
> That is definitely an option, but perhaps intuitive understanding and 
> historical practice / other implementations could be considered instead:
> 
> Intuitively, a file containing text with an incomplete line is obviously 
> still a text file

Not per the POSIX definition of a text file.

It is still a file, but no longer a text file.

It wouldn't be the first time intuition has been wrong.

> wc is an interesting case, which doesn't count an incomplete line as a line 
> (the spec 
> <http://pubs.opengroup.org/onlinepubs/9699919799/utilities/wc.html> is 
> actually unambiguous there and mandates counting the newlines),

Indeed, wc is a good example of how the POSIX writers specifically went
out of their way to describe the behavior of programs that MUST be
consistent when presented with a non-text file.  For all other programs
(including sed) that require text file inputs, there is an escape
clause: the behavior is intentionally unspecified if the trailing
newline is not present.
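
As a rough illustration of the wc rule, counting newline characters
rather than lines can be sketched in Python like this (the function
name count_newlines is hypothetical; this is not wc's actual code):

```python
def count_newlines(data: bytes) -> int:
    """Count newline characters, as the wc spec prescribes for line counts.

    Hypothetical helper for illustration only.
    """
    # An incomplete final line has no newline, so it adds nothing to the count.
    return data.count(b"\n")


print(count_newlines(b"one\ntwo\n"))
print(count_newlines(b"one\ntwo"))
```

Because the count is defined over newline characters, the result is
well defined for any input, text file or not.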

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3266
Virtualization:  qemu.org | libvirt.org

