bug-sed

bug#40242: n as delimiter alias


From: Oğuz
Subject: bug#40242: n as delimiter alias
Date: Tue, 31 Mar 2020 10:00:02 +0300

Thanks for the reply. This might not be a bug after all: I sent a similar mail
(https://www.mail-archive.com/address@hidden/msg05881.html)
to the Austin Group mailing list asking what the expected behavior is in this
case, and I was told
(https://www.mail-archive.com/address@hidden/msg05891.html)
that both behaviors (yielding "n" or an empty line) are correct, and that the
standard should *probably* be amended to state explicitly that this is
unspecified. Apparently
(https://www.mail-archive.com/address@hidden/msg05893.html)
some other Unix systems adopted the same practice as GNU sed (or vice versa; I
don't know which one is older).

Regards

On Tuesday, 31 March 2020, Assaf Gordon <address@hidden> wrote:

> tags 40242 confirmed
> stop
>
> Hello,
>
> On 2020-03-25 11:30 p.m., Oğuz wrote:
>
>> While '\t' matches a literal 't' when 't' is the delimiter, '\n' does not
>> match 'n' when 'n' is the delimiter. See:
>>
>> $ echo t | sed 'st\ttt' | xxd
>> 00000000: 0a                                       .
>> $
>> $ echo n | sed 'sn\nnn' | xxd
>> 00000000: 6e0a
>>
>> Is this a bug, or is there sound logic behind this?
>>
>
> Thank you for finding this interesting edge case.
>
> I think it is a (very old) bug. I'm not sure about its origin,
> perhaps Jim or Paolo can comment.
>
> First, let's start with what's expected (slightly modifying your examples):
>
> In the canonical usage, "\t" becomes a TAB, so "t" is not replaced:
>
>    $ printf t | sed 's/\t//' | od -a -An
>       t
>
> Using a different delimiter ("q" instead of "/") works the same way:
>
>    $ printf t | sed 'sq\tqq' | od -a -An
>       t
>
> The sed manual says (in section "3.3 The s command"):
>       "
>       The / characters may be uniformly replaced by any other single
>       character within any given s command.
>
>       The / character (or whatever other character is used in its
>       stead) can appear in the regexp or replacement only if it is
>       preceded by a \ character.
>       "
>
> This is why "\t" represents a literal "t" (not a TAB)
> *if* the substitute command's delimiter is also "t":
>
>       $ printf t | sed 'st\ttt' | od -a -An
>       [no output, as expected]
>
> And similarly for other characters:
>
>       $ printf x | sed 'sx\xxx' | od -a -An
>       $ printf a | sed 'sa\aaa' | od -a -An
>       $ printf z | sed 'sz\zzz' | od -a -An
>       [no output, as expected]
>
> ---
>
> Second, the "\n" case behaves differently: regardless of which
> delimiter is used, it is always treated as a newline, never as a
> literal "n", even if the delimiter is "n":
>
> These are correct, as expected:
>     $ printf n | sed 's/\n//' | od -a -An
>        n
>     $ printf n | sed 'sx\nxx' | od -a -An
>        n
>
> Here, we'd expect "\n" to be treated as a literal "n" character,
> not as a newline, but it is not (as you've found):
>
>     $ printf n | sed 'sn\nnn' | od -a -An
>        n
>
> ----
>
> In the code, the "match_slash" function [1] is used to find
> the delimiters of the "s" command (typically slashes).
> Special handling happens if a backslash is found [2],
> and on lines 557-8 there's this conditional:
>
>               else if (ch == 'n' && regex)
>                 ch = '\n';
>
> This forces any "\n" to be a newline, regardless of whether the
> delimiter itself is "n".
>
> [1] https://git.savannah.gnu.org/cgit/sed.git/tree/sed/compile.c#n531
> [2] https://git.savannah.gnu.org/cgit/sed.git/tree/sed/compile.c#n552
>
> In older sed versions, these two lines were protected by
> "#ifndef REG_PERL" [3], so perhaps it had something to do with regex
> variants. But the origin of these lines predates the git history.
> Jim/Paolo, any ideas what this relates to?
>
> [3] https://git.savannah.gnu.org/cgit/sed.git/tree/sed/compile.c?id=41a169a9a14b5bdc736313eb411f02bcbe1c046d#n551
>
> ---
>
> Interestingly, removing these two lines does not cause
> any test failures, so this might be easy to fix without causing
> any regressions.
>
>
> For now I'm leaving this item open until we decide how to deal with it.
>
> regards,
>  - assaf
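One way to make "\n" behave like "\t" when the delimiter is "n" would be to reorder the two checks quoted above. As a minimal sketch (hypothetical code, not GNU sed's actual match_slash, which reads through its own input and buffer routines), the C function below scans the regex part of an "s" command and compares an escaped character against the delimiter *before* translating "\n" to a newline. The name scan_regex_part and its interface are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper, not GNU sed's match_slash(): copy the regex part
 * of an "s" command from SRC (the text just after the opening delimiter)
 * into OUT, stopping at the first unescaped DELIM.
 *
 * The escaped delimiter is tested *before* "\n", so with DELIM == 'n'
 * the sequence "\n" yields a literal 'n', just as "\t" yields 't' when
 * DELIM == 't'.  Other escapes are copied through unchanged.
 *
 * Returns the number of bytes of SRC consumed (including the closing
 * delimiter), or -1 if the command is unterminated or OUT is too small.
 */
long
scan_regex_part (const char *src, char delim, char *out, size_t outsz)
{
  size_t i = 0, o = 0;

  /* POSIX does not allow backslash or newline as the delimiter.  */
  assert (delim != '\\' && delim != '\n');

  while (src[i] != '\0')
    {
      char ch = src[i++];

      if (ch == delim)
        {
          if (o >= outsz)
            return -1;
          out[o] = '\0';          /* terminate the regex text */
          return (long) i;
        }
      if (ch == '\n')
        return -1;                /* unescaped newline: unterminated command */

      if (ch == '\\' && src[i] != '\0')
        {
          char next = src[i++];

          if (next == delim)
            ch = delim;           /* "\<delim>" is the delimiter itself */
          else if (next == 'n')
            ch = '\n';            /* otherwise "\n" means newline */
          else
            {
              /* Keep any other escape for the regex compiler.  */
              if (o + 1 >= outsz)
                return -1;
              out[o++] = '\\';
              ch = next;
            }
        }

      if (o + 1 >= outsz)         /* leave room for the final NUL */
        return -1;
      out[o++] = ch;
    }
  return -1;                      /* no closing delimiter found */
}
```

Under these assumptions, scanning "\nnn" (the text after "sn" in 'sn\nnn') with delimiter 'n' yields the one-character pattern "n" and consumes 3 bytes, while "\nxx" with delimiter 'x' still yields a newline. Whether sed should change at all remains open, since the Austin Group reply mentioned at the top suggests both behaviors are acceptable.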

-- 
Oğuz

