bug#19240: cut 8.22 adds newline


From: John Kendall
Subject: bug#19240: cut 8.22 adds newline
Date: Thu, 4 Dec 2014 18:41:48 +0000

Bob Proulx wrote:
> Eric Blake wrote:
>> I'll leave it to other contributors to weigh in on whether omitting
>> the final newline on output when it was missing on input is worth
>> the complexity of a change.
> 
>> Pádraig Brady wrote:
>>> If we were just implementing now, I'd not output the extra '\n',
>>> but changing at this stage needs to be carefully considered,
>>> and with all the textutils, not just cut(1).
>> 
>> I tend to go the opposite - producing text output, even on non-text
>> input, is more likely to be useful when piping files to other utilities
>> that don't handle non-text files as gracefully as the coreutils.  But I
>> definitely agree that it is not something we change lightly.
> 
> I have these thoughts and comments to make.
> 
> 1. I don't "like" input file lines that don't have trailing newlines.
> It raises the question of whether the input is actually valid input.
> It feels to me like any line missing a newline is incomplete.  There
> is likely to have been an error in the creation of it.  Handling it
> silently feels like ignoring the error.  But raising an actual error
> by exit code or by emitting a warning or error message feels too heavy
> handed.  I would lean toward assuming that any incomplete input line
> is actually terminated by a newline as the lesser of the evils.
> 
> 2. The suggestion for handling *fields* that do not end with a
> trailing newline differently from those that do doesn't make any sense
> to me at all.  What is a field?  Is the newline part of the field?  I
> think not.  Consider this.
> 
>  $ printf "one two" | awk '{print$1}'
>  one
> 
>  $ printf "one two" | awk '{print$2}'
>  two
> 
>  $ printf "one two\n" | awk '{print$1}'
>  one
> 
>  $ printf "one two\n" | awk '{print$2}'
>  two
> 
> The newline is not part of field two.  Otherwise printing it would
> result in the second having two newlines output.
> 
>  $ printf "one two" | cut -d' ' -f1
>  one
> 
>  $ printf "one two" | cut -d' ' -f2
>  two
> 
>  $ printf "one two\n" | cut -d' ' -f1
>  one
> 
>  $ printf "one two\n" | cut -d' ' -f2
>  two
> 
> Same thing for cut.  The newline is not part of any of the fields.
> The newline terminates the input line.  The newline is not associated
> with any of the delimited fields contained in an input line.
> 
> For byte or character operations in the utils such as head -c those
> are binary operations and should be interpreted strictly according to
> the bytes.  But not for cut -c which is column based.
> 
> John Kendall wrote:
>> # Solaris cut
>> $ printf "1\n12\n123\n1234\n12345\n123456" | cut -c1-4
>> 1
>> 12
>> 123
>> 1234
>> 1234
>> 1234$
> 
> That is tickling non-portable behavior.  I had a friend run some tests
> on HP-UX and IBM AIX and the results there were different from
> Solaris.  Seems Solaris is already the unusual case.
> 
> When looking, count the "1234" lines carefully.  Because HP-UX and
> older AIX don't process the line without a trailing newline at all.
> It is omitted there.  Newer AIX appears to handle it like GNU.
> 
>  # uname -srm
>  HP-UX B.10.20 9000/785
>  # printf "1\n12\n123\n1234\n12345\n123456" | cut -c1-4
>  1
>  12
>  123
>  1234
>  1234
>  #
> 
>  # uname -srm
>  HP-UX B.11.31 ia64
>  # printf "1\n12\n123\n1234\n12345\n123456" | cut -c1-4
>  1
>  12
>  123
>  1234
>  1234
>  #
> 
>  # uname -s ; oslevel
>  AIX
>  4.3.3.0
>  # printf "1\n12\n123\n1234\n12345\n123456" | cut -c1-4
>  1
>  12
>  123
>  1234
>  1234
>  #
> 
>  # uname -s ; oslevel
>  AIX
>  7.1.0.0
>  # printf "1\n12\n123\n1234\n12345\n123456" | cut -c1-4
>  1
>  12
>  123
>  1234
>  1234
>  1234
>  #
> 
>  # head -1 /etc/motd ; uname -m
>  Compaq Tru64 UNIX V5.0A
>  alpha
>  # printf "1\n12\n123\n1234\n12345\n123456" | cut -c1-4
>  1
>  12
>  123
>  1234
>  1234
>  #
> 
>  # uname -s
>  Darwin
>  # printf "1\n12\n123\n1234\n12345\n123456" | cut -c1-4
>  1
>  12
>  123
>  1234
>  1234
>  1234
>  #
> 
> Using input lines without a trailing newline is already a minefield of
> portability problems.  It depends upon details of the implementation.
> 
> I think what Solaris cut must be doing is processing the emission of
> characters across the line character by character.  When it hits the
> input newline it knows it is done and emits a newline itself and
> starts again on a new line.  When it hits EOF on the input it probably
> just stops doing anything and exits itself without printing anything
> more and therefore not emitting a newline.  Likely just an accident of
> implementation.
> 
> This is what makes "lines" without a newline such an unportable thing
> to count upon.  It causes it to depend upon an implementation detail.
> Different implementations might do different things.  And in fact
> different ones do actually do different things.  This probably isn't
> too widespread of an issue or it would have come up more often.  And
> more specific to the Solaris code port there would be similar problems
> differently if trying to use other legacy Unix platforms.  Best to
> avoid the construct entirely for robust operation.
> 
>> I came upon this while porting scripts from Solaris 10 to Centos 7.
> 
> Can you share with us the specific construct that caused this to
> arise?  I have done a lot of script porting to and from HP-UX systems
> and am curious as to the issue.
> 

The construct in question is just for formatting the output
of a script that compares disc files to what's in a database.

 echo "$FILE ===========================\c"| cut -c1-30
 echo " matches =========="


The output on Solaris might look something like this (in a
monospaced font on a terminal, all the "matches" line up):

getDFL_info ================== matches ==========
transWestim_msg ============== matches ==========
selfBillDepotStoHan ========== matches ==========
addSale_invoice ============== matches ==========
buildInvoice ================= matches ==========
addInvoice =================== matches ==========
chgUnit ====================== matches ==========
updSale_invoice ============== matches ==========

The GNU output is:

getDFL_info ==================
 matches ==========
transWestim_msg ==============
 matches ==========
selfBillDepotStoHan ==========
 matches ==========
addSale_invoice ==============
 matches ==========
buildInvoice =================
 matches ==========
addInvoice ===================
 matches ==========
chgUnit ======================
 matches ==========
updSale_invoice ==============
 matches ==========

This can be rewritten, of course.  (There is one corner case that
Solaris's cut handled nicely, for which I have not been able to come
up with a quick fix.)
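
A minimal sketch of one possible rewrite, assuming the only goal is
a label truncated to 30 columns with no newline after it.  The
function name pad_label is made up for illustration.  It uses the
precision in printf's %s conversion in place of cut, so the result
no longer depends on how cut treats an unterminated last line or on
echo understanding "\c".  It does not try to cover the corner case
mentioned above.

```shell
#!/bin/sh
# pad_label (hypothetical helper): print "NAME ====..." truncated to
# 30 columns, with no trailing newline.
pad_label() {
    # %.30s prints at most 30 bytes of the argument (some shells
    # count characters instead), and printf adds no newline itself.
    printf '%.30s' "$1 ==========================="
}

for FILE in getDFL_info chgUnit; do
    pad_label "$FILE"
    echo " matches =========="
done
```

For plain ASCII names this should give the same 30 columns as
cut -c1-30; with multibyte names the byte-based precision may
truncate differently.  Another option that keeps the original
pipeline is to append "| tr -d '\n'" after the cut, which gives the
same result whether or not cut adds the newline.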

John

> Bob





