coreutils
Re: cut with multibyte support for delimiter


From: Sebastian Kisela
Subject: Re: cut with multibyte support for delimiter
Date: Fri, 22 Sep 2017 15:05:15 +0200

Hi Assaf!

> One thing I noticed is that the tests fail on my computer.
> I see things like:
> ====
>   cut-multibyte.pl: test mbd-newline-24: stdout mismatch, comparing
>         mbd-newline-24.2 (expected) and mbd-newline-24.O (actual)
>   *** mbd-newline-24.2    Mon Sep 18 20:02:46 2017
>   --- mbd-newline-24.O    Mon Sep 18 20:02:46 2017
>   ***************
>   *** 1 ****
>   ! aꝤb
>   --- 1 ----
>   ! a$Ꝥb
> ====
> It is the extra dollar sign before the multibyte character which hints
> to me it is related to the interaction between Perl
> (which converts \xNN sequences) and the shell command line
> (where you've used the $'\xNN' syntax).
>
> The test was:
>    ['mbd-newline-24', "-d'\n'", '-f1,2', "--ou=\$'\xEA\x9D\xA4'",
>         {IN=>"a\nb\n"}, {OUT=>"a\xEA\x9D\xA4b\n"}],
>
>
Thanks for testing! I was able to reproduce it, and it should be fine with the
-d'\x{NN}' form mentioned below, which I used.
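
For reference, here is a minimal sketch of a portable way to produce the same
delimiter bytes without relying on $'\xNN' quoting. It assumes (this is not
confirmed in this thread) that the stray dollar sign appears because the
/bin/sh running the test does not implement $'...' and so passes the `$`
through literally. The variable name `delim` is only illustrative.

```shell
# Build the 3-byte UTF-8 sequence EA 9D A4 (U+A764, the delimiter used in
# mbd-newline-24) with POSIX printf octal escapes: \352 \235 \244.
# This avoids the $'\xNN' quoting form, which not every sh implements.
delim=$(printf '\352\235\244')

# Dump the bytes in hex to confirm that no literal '$' was included.
printf '%s' "$delim" | od -An -tx1
```

The resulting value could then be passed to cut as
--output-delimiter="$delim" in an ordinary double-quoted argument.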


> Also,
> I'm not sure if coreutils currently allows the newer $'\xNN' construct
> in tests - this might be too new to be supported everywhere (comments,
> anyone? I'll also try to look for them in other tests).
>
> In any case, Perl itself can easily generate UTF-8 characters and send
> them as-is to the program being tested, I think that will suffice.
>

>
> Planning ahead, since this is going to be a large addition,
> we'll need to ask you for copyright assignment for your code contributions.
>
> You can read more about it here:
>   https://www.gnu.org/licenses/why-assign.en.html
>   https://www.fsf.org/licensing/assigning.html
>
> To begin the process, please fill the information here:
>   https://git.savannah.gnu.org/cgit/gnulib.git/plain/doc/Copyright/request-assign.future
>
> and send it to address@hidden .
>

Thanks, I will.
Attached is the patch with the fixed tests. (I should probably add even more
tests anyway.)

Sebastián.

Attachment: cut-multibyte-delimiter.tar.gz
Description: GNU Zip compressed data
