coreutils
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: nl: -d option with a single character


From: Pádraig Brady
Subject: Re: nl: -d option with a single character
Date: Mon, 14 Dec 2020 16:53:52 +0000
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:84.0) Gecko/20100101 Thunderbird/84.0

On 13/12/2020 23:23, KOBAYASHI Takashi wrote:
Hello,

I found a bug of -d in nl when a single character is specified. The patches
are also attached. I think there are two ways to fix this.

The current behavior:
$ echo -e 'a\n@:@:\nb\n@@\nc' | nl -d '@'
      1 a
      2 @:@:
      3 b
      4 @@
      5 c

POSIX expectations(Attach: nl-d-posix.patch):
$ echo -e 'a\n@:@:\nb\n@@\nc' | nl -d '@'
      1 a

      1 b
      2 @@
      3 c

POSIX(1003.1-2017)
-d  delim
Specify the delimiter characters that indicate the start of a logical
page section. These can be changed from the default characters "\:" to two
user-specified characters. If only one character is entered, the second
character shall remain the default character ':'.

Also, Texinfo is written in the same behavior as POSIX.
‘-d CD’
‘--section-delimiter=CD’
     Set the section delimiter characters to CD; default is ‘\:’.  If
     only C is given, the second remains ‘:’.  (Remember to protect ‘\’
     or other metacharacters from shell expansion with quotes or extra
     backslashes.)


I don't like this complex POSIX specification because it doesn't allow us
to specify a single character delimiter.
Does anyone know why POSIX was specified the way it was? I would like to
know the background.
I believe that the section delimiter should be generated from the string
given to the option, regardless of the length of the characters. And by the
string given two characters, it is possible to make the second character
":".

The following is more clear(Attach: nl-d-incompatible.patch):
$ echo -e 'a\n@:@:\nb\n@@\nc' | nl -d '@'
      1 a
      2 @:@:
      3 b

      1 c
$ echo -e 'a\n@:@:\nb\n@@\nc' | nl -d '@:'
      1 a

      1 b
      2 @@
      3 c

Nice find!

I agree that supporting matching a single char would be preferable
to the POSIX mandated behavior of assuming a second ':' character.
However no-one has specifically asked for the single char support,
and I don't think it's worth breaking compat with POSIX and other
implementations. So we should not apply your second "incompatible"
patch I think.

BTW, we also have an undocumented behavior that strings longer
than two characters can be provided.
We should probably document that GNU extension.

As for the implementation, this code is already a bit "copy and pastey",
so I'd prefer not to copy the logic again.

I've attached two patches to address this.
The first changes nothing, but refactors things to simplify the fix.
The second (in your name) applies the specific fix and adds a test.

I'll apply these later if you're Ok with it.

cheers,
Pádraig

Attachment: nl-delim-refactor.patch
Description: Text Data

Attachment: nl-delim-fix.patch
Description: Text Data


reply via email to

[Prev in Thread] Current Thread [Next in Thread]