bug-sed

From: Michael Klement
Subject: bug#26574: v4.4: POSIX violation with respect to output of a trailing newline, even with --posix
Date: Thu, 20 Apr 2017 15:32:13 -0400

Thanks for digging into this; it illustrates the point well.

Just for the record:

Here's what I get on FreeBSD 10.1.2 and on macOS 10.12.4:

$ printf 'a' | sed '' | od -tx1
0000000    61  0a                                                        
0000002

macOS typically ships an older version of the BSD implementation. (BSD sed
doesn't support --version, but the man pages on FreeBSD and macOS are dated
June 20, 2014 and May 10, 2005, respectively.)
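
To check which of the behaviours discussed in this thread a given sed exhibits (adding a newline, preserving the incomplete line, or dropping it), the test above can be wrapped in a small function. This is a minimal sketch: the name `sed_trailing_newline_behavior` is hypothetical, and the example implementations in the comments are taken from the reports in this thread, not independently verified.

```sh
# Hypothetical helper, not part of any sed distribution: reports how a sed
# command treats an incomplete last line, using the input 'a' with no
# trailing newline. Defaults to the first sed found in $PATH.
sed_trailing_newline_behavior() {
  sed_cmd=${1:-sed}
  # Dump the output as hex; -An drops od's offset column and tr removes the
  # spaces and newlines od inserts, leaving e.g. "610a".
  hex=$(printf 'a' | "$sed_cmd" '' | od -An -tx1 | tr -d ' \n')
  case $hex in
    610a) echo "adds newline" ;;              # e.g. NetBSD 7.0, Heirloom (per this thread)
    61)   echo "preserves incomplete line" ;; # e.g. GNU sed
    '')   echo "drops incomplete line" ;;     # e.g. SunOS 5.11 /usr/bin/sed
    *)    echo "unexpected output: $hex" ;;
  esac
}

# Check the sed on $PATH; pass another command name (e.g. gsed) to compare.
sed_trailing_newline_behavior
```

Because the command name is expanded as a single word, pass a plain command name rather than a command with options.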

Another (minor) point of interest:

On macOS 10.12.4 (but not on FreeBSD 10.1.2), sed chokes on bytes that aren't
valid UTF-8 when regex-based functionality is used:

$ printf '\xfc\n' | sed  -n '/./p'
sed: RE error: illegal byte sequence
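
A commonly used workaround, assuming the error comes from sed decoding its input as UTF-8 under the current locale, is to run sed in the C locale, where every byte counts as one character. This is a sketch that has not been checked on 10.12.4 specifically; note that it also changes what `.` matches in genuine multibyte UTF-8 text (single bytes instead of whole characters).

```sh
# Assumption: the "illegal byte sequence" error stems from the UTF-8 locale.
# LC_ALL=C applies only to this one sed invocation.
# \374 is octal for 0xfc; POSIX printf guarantees octal escapes, not \x.
printf '\374\n' | LC_ALL=C sed -n '/./p' | od -tx1
```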




> On Apr 20, 2017, at 2:32 PM, Assaf Gordon <address@hidden> wrote:
> 
> Hello,
> 
> On Thu, Apr 20, 2017 at 11:46:15AM -0500, Eric Blake wrote:
>> On 04/20/2017 11:36 AM, Michael Klement wrote:
>>> Thanks for the detailed feedback, Eric.
>>> 
>>> The POSIX spec. is, unfortunately, vague on this topic:
>>> 
>>> The definition of a line (which you quote) is complemented with the 
>>> definition of an incomplete line 
>>> <http://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap03.html#tag_03_195>:
>>> 
>>>> A sequence of one or more non-<newline> characters at the end of the file.
>>> 
>>> 
>>> So while the standard is aware of this possibility and gives it a name that
>>> suggests a kind of line with something missing, it prescribes precious
>>> little behavior for such incomplete lines.
>>> 
>> 
>> You're welcome to submit a bug report to get POSIX to more clearly word
>> its intentions that a file with an incomplete line is NOT a text file
>> (http://austingroupbugs.net/main_page.php), but everyone on the Austin
>> Group (myself included) has already agreed that the intention is there
>> (even if the wording could be improved): Omitting a trailing newline
>> causes sed to enter into the realm of undefined behavior - and this is
>> BECAUSE there are existing sed implementations that behave differently
>> when a trailing newline is omitted.  Some do not do anything with an
>> incomplete line (sed behaves as though the file were truncated at the
>> last newline).
>> 
> 
> For completeness, here's the behaviour of several implementations:
> 
> sed implementations that do not add a newline (like GNU sed):
>  FreeBSD 10
>  OpenBSD 5.9
>  BusyBox 1.22
>  ToyBox 7.2
>  AIX 7
> 
> sed implementations that do add a newline:
>  NetBSD 7.0
>  Heirloom
> 
> SunOS 5.11's sed prints nothing for a last line without a trailing newline:
>  $ printf 'a' | sed '' | od -tx1
>  0000000
>  $ printf 'a\n' | sed '' | od -tx1
>  0000000 61 0a
>  0000002
>  $ uname -a
>  SunOS unstable11s 5.11 11.2 sun4u sparc SUNW,SPARC-Enterprise
>  $ which sed
>  /usr/bin/sed
> 
> 
> The behaviour when processing a file whose last line lacks a trailing newline
> also differs across other programs, languages, and implementations:
> 
>  $ printf a | perl -npe '' | od -tx1
>  0000000 61
>  0000001
> 
>  $ printf a | perl -lnpe '' | od -tx1
>  0000000 61 0a
>  0000002
> 
>  $ printf a | awk '{print}' | od -tx1
>  0000000 61 0a
>  0000002
> 
>  $ printf 'a' | sh -c 'while read A ; do echo $A ; done' | od -tx1
>  0000000
> 
>  $ printf 'a' \
>     | python3 -c 'import sys; [print(x,end="") for x in sys.stdin]' \
>     | od -tx1
>  0000000 61
>  0000001
> 
>  $ printf a | uniq-gnu | od -t x1
>  0000000 61 0a
>  0000002
> 
>  $ printf a | uniq-freebsd-11 | od -t x1
>  0000000    61
>  0000001
> 
>  $ printf a | cut-gnu -f1 | od -tx1
>  0000000 61 0a
>  0000002
> 
>  $ printf a | cut-freebsd-11 -f1 | od -tx1
>  0000000    61
>  0000001
> 
>  $ printf a | sort | od -t x1
>  0000000 61 0a
>  0000002
> 
> 
> And this reinforces what Eric wrote: there is simply no
> 'one correct' (or agreed-upon) way to handle files whose last line lacks a
> trailing newline.
> 
> 
> regards,
> - assaf
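
Given that sed's treatment of an incomplete last line is undefined and varies across the implementations listed above, one defensive option is to make sure sed only ever sees a text file. The following is a minimal sketch; `has_incomplete_line` and `sed_portable` are hypothetical helper names, and the approach deliberately emits a trailing newline in every case rather than trying to reproduce the input's missing one.

```sh
# Hypothetical helper: succeeds if the file is non-empty and its last byte is
# not a newline, i.e. it ends in what POSIX calls an incomplete line.
has_incomplete_line() {
  # Command substitution strips trailing newlines, so the result is non-empty
  # only when the last byte is something other than a newline.
  [ -n "$(tail -c 1 -- "$1")" ]
}

# Hypothetical wrapper: runs sed on a file, first appending the missing
# newline if needed, so every sed implementation sees a complete last line.
# Usage: sed_portable FILE [sed arguments...]
sed_portable() {
  in_file=$1
  shift
  if has_incomplete_line "$in_file"; then
    { cat -- "$in_file"; echo; } | sed "$@"
  else
    sed "$@" < "$in_file"
  fi
}
```

For example, `sed_portable notes.txt 's/foo/bar/'` (with `notes.txt` as a placeholder file) should give the same result whether or not the file ends with a newline; the trade-off is that the output always does.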

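The `while read` loop in the quoted examples prints nothing because `read` returns a non-zero status when it reaches end of input before a newline, even though common shells such as bash and dash have already stored the partial line in the variable. A widely used idiom is to also run the loop body when the variable is non-empty. Minimal sketch; `print_lines` is a hypothetical name, and unlike the original loop it uses `IFS= read -r` and `printf` so blanks and backslashes pass through unchanged.

```sh
# Hypothetical function name. Unlike the loop in the thread, it uses
# IFS= (keep leading/trailing blanks), -r (keep backslashes literal) and
# printf instead of an unquoted echo.
print_lines() {
  # read returns non-zero at end of input even if it stored a partial line,
  # so also run the body when $line is non-empty.
  while IFS= read -r line || [ -n "$line" ]; do
    printf '%s\n' "$line"
  done
}

printf 'a' | print_lines | od -tx1
```

Like awk in the examples above, this emits a newline after the incomplete line rather than preserving its absence.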

