bug-coreutils

bug#9420: cut: --output-delimiter ignored in combination with -c


From: Jim Meyering
Subject: bug#9420: cut: --output-delimiter ignored in combination with -c
Date: Thu, 01 Sep 2011 23:19:26 +0200

Jim Meyering wrote:
> Pádraig Brady wrote:
>> On 09/01/2011 07:33 PM, Philipp Thomas wrote:
>>>
>>> Cut from older coreutils (at least until 7.1) honoured --output-delimiter in
>>> combination with -c.  Newer coreutils don't, i.e. with the older cut you get
>>>
>>> $ echo 12 | cut --output-delimiter=X -c1,2
>>> 1X2
>>>
>>> And with the newer ones
>>>
>>> $ echo 12 | cut --output-delimiter=X -c1,2
>>> 12
>>>
>>> Is this a regression or was this a deliberate change that wasn't documented?
>>
>> Looks like a regression introduced with the i18n patch,
>> so I'm closing this here.
>>
>> $ echo 12 | cut --output-delimiter=X -c1,2
>> 12
>> $ echo 12 | LANG=C cut --output-delimiter=X -c1,2
>> 1X2
>
> Wondering how that could happen, given our test suite,
> I realized that we take care to set LC_ALL=C for most tests.
> At least for cut, I'm changing that.  Now we'll run each test with
> LC_ALL=C, and again (when possible) with e.g., LC_ALL=fr_FR.UTF-8.
>
>>From ea8295673ebe81b8b0a64bc35a497a44ea419934 Mon Sep 17 00:00:00 2001
> From: Jim Meyering <address@hidden>
> Date: Thu, 1 Sep 2011 21:30:10 +0200
> Subject: [PATCH] tests: exercise distro-added multibyte code paths in cut
>
> * tests/misc/cut: Repeat each test using a multibyte locale,
> if the configure-time test found one.
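
For anyone who wants to check a single command by hand, the idea of running each test once under LC_ALL=C and once under a multibyte locale can be sketched in shell. This is only an illustration: compare_locales is a hypothetical helper, not part of the coreutils test suite, and the multibyte locale you pass must be installed on the system.

```shell
# Hypothetical helper, not part of the coreutils test suite.
# Usage: compare_locales MB_LOCALE INPUT COMMAND [ARGS...]
compare_locales() {
  mb_locale=$1
  input=$2
  shift 2
  # Feed the same input to the command under each locale; capture stderr too.
  out_c=$(printf '%s\n' "$input" | env LC_ALL=C "$@" 2>&1)
  out_mb=$(printf '%s\n' "$input" | env LC_ALL="$mb_locale" "$@" 2>&1)
  if [ "$out_c" = "$out_mb" ]; then
    printf 'same: %s\n' "$out_c"
  else
    printf 'differ: C=%s %s=%s\n' "$out_c" "$mb_locale" "$out_mb"
  fi
}
```

For example, "compare_locales fr_FR.UTF-8 12 cut --output-delimiter=X -c1,2" would be expected to print a "differ:" line on an affected build, going by the outputs quoted above, and a "same:" line with an unpatched cut.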

Ahem.
Running that against the cut from F15's coreutils-8.10-2.fc15.x86_64,
I get numerous segfaults, one for each of these, as well as for
each of the from-stdin variants:

  :|/usr/bin/cut --output-d=: -b4567890-
  :|/usr/bin/cut --output-d=: -c4567890-
  :|/usr/bin/cut --output-d=: -f4567890-

Each does this:

  zsh: segmentation fault (core dumped)  /usr/bin/cut --output-d=: -c4567890-
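
A rough way to check all three variants in one go is sketched below. It is an assumption-laden illustration: check_big_range is a hypothetical helper, and it relies on the usual shell convention that an exit status above 128 means the process was killed by signal (status - 128), so a segfault shows up as signal 11.

```shell
# Hypothetical helper: run the three big-unbounded-range invocations
# against the given cut binary and report how each one exited.
check_big_range() {
  cut_prog=$1
  for opt in -b -c -f; do
    # Empty stdin, as in the ":|cut ..." commands above.
    : | "$cut_prog" --output-d=: "${opt}4567890-" >/dev/null 2>&1
    status=$?
    if [ "$status" -eq 0 ]; then
      printf '%s4567890-: ok\n' "$opt"
    elif [ "$status" -gt 128 ]; then
      printf '%s4567890-: killed by signal %d\n' "$opt" "$((status - 128))"
    else
      printf '%s4567890-: exit %d\n' "$opt" "$status"
    fi
  done
}
```

Invoke it as "check_big_range /usr/bin/cut"; a cut without the bug should yield three "ok" lines.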

The new tests exposed another minor bug in the MB patch series.
The patched cut gives this diagnostic:

    cut: invalid byte, character or field list

while the upstream version gives a more precise one:

    cut: invalid decreasing range

That too causes test failures.
The "inval1" test provides one example:

    cut: test inval1: stderr mismatch, comparing inval1.E (actual) and inval1.3 (expected)
    *** inval1.E    Thu Sep  1 23:01:36 2011
    --- inval1.3    Thu Sep  1 23:01:36 2011
    ***************
    *** 1,2 ****
    ! cut: invalid byte, character or field list
      Try `cut --help' for more information.
    --- 1,2 ----
    ! cut: invalid decreasing range
      Try `cut --help' for more information.
    inval1.r...

Notice that the offending diagnostic there mentions "character".
That's an addition from the multi-byte patch series.
To make the test suite accommodate that new diagnostic,
I had to make an additional change:

diff --git a/tests/misc/cut b/tests/misc/cut
index 7c1450b..7ed4134 100755
--- a/tests/misc/cut
+++ b/tests/misc/cut
@@ -170,6 +170,19 @@ if ($mb_locale ne 'C')
       {
         my @new_t = @$t;
         my $test_name = shift @new_t;
+
+        # Depending on whether cut is multi-byte-patched,
+        # it emits different diagnostics:
+        #   non-MB: invalid byte or field list
+        #   MB:     invalid byte, character or field list
+        # Adjust the expected error output accordingly.
+        if (grep {ref $_ eq 'HASH' && exists $_->{ERR} && $_->{ERR} eq $inval}
+            (@new_t))
+          {
+            my $sub = {ERR_SUBST => 's/, character//'};
+            push @new_t, $sub;
+            push @$t, $sub;
+          }
         push @new, ["$test_name-mb", @new_t, {ENV => "LC_ALL=$mb_locale"}];
       }
     push @Tests, @new;
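
The effect of that 's/, character//' substitution can be seen in isolation with sed. This sketch assumes the test framework applies the ERR_SUBST expression to the stderr text before comparing; normalize_diag is a hypothetical name used only here.

```shell
# Hypothetical helper: strip the ", character" that the MB-patched cut
# adds, so both diagnostics compare equal.
normalize_diag() {
  sed 's/, character//'
}

# MB-patched diagnostic is rewritten to the upstream wording.
echo 'cut: invalid byte, character or field list' | normalize_diag

# Upstream diagnostic passes through unchanged.
echo 'cut: invalid byte or field list' | normalize_diag
```

Both commands print "cut: invalid byte or field list", which is why one expected-output string can serve either build.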

Here's the amended patch:

From 553cd6b5b39ecbaa4fe807099e754373eff9ea1e Mon Sep 17 00:00:00 2001
From: Jim Meyering <address@hidden>
Date: Thu, 1 Sep 2011 21:30:10 +0200
Subject: [PATCH] tests: exercise distro-added multibyte code paths in cut

* tests/misc/cut: Repeat each test using a multibyte locale,
if the configure-time test found such a locale.
Adjust the tests so that they also accept a slightly
different diagnostic that is specific to the MB-patched cut.
---
 tests/misc/cut |   33 +++++++++++++++++++++++++++++++++
 1 files changed, 33 insertions(+), 0 deletions(-)

diff --git a/tests/misc/cut b/tests/misc/cut
index c905ba9..7ed4134 100755
--- a/tests/misc/cut
+++ b/tests/misc/cut
@@ -23,6 +23,10 @@ use strict;
 # Turn off localization of executable's output.
 @ENV{qw(LANGUAGE LANG LC_ALL)} = ('C') x 3;

+my $mb_locale = $ENV{LOCALE_FR_UTF8};
+! defined $mb_locale || $mb_locale eq 'none'
+  and $mb_locale = 'C';
+
 my $prog = 'cut';
 my $try = "Try \`$prog --help' for more information.\n";
 my $from_1 = "$prog: fields and positions are numbered from 1\n$try";
@@ -156,6 +160,35 @@ my @Tests =
   ['big-unbounded-f', '--output-d=:', '-f1234567890-', {IN=>''}, {OUT=>''}],
  );

+if ($mb_locale ne 'C')
+  {
+    # Duplicate each test vector, appending "-mb" to the test name and
+    # inserting {ENV => "LC_ALL=$mb_locale"} in the copy, so that we
+    # provide coverage for the distro-added multi-byte code paths.
+    my @new;
+    foreach my $t (@Tests)
+      {
+        my @new_t = @$t;
+        my $test_name = shift @new_t;
+
+        # Depending on whether cut is multi-byte-patched,
+        # it emits different diagnostics:
+        #   non-MB: invalid byte or field list
+        #   MB:     invalid byte, character or field list
+        # Adjust the expected error output accordingly.
+        if (grep {ref $_ eq 'HASH' && exists $_->{ERR} && $_->{ERR} eq $inval}
+            (@new_t))
+          {
+            my $sub = {ERR_SUBST => 's/, character//'};
+            push @new_t, $sub;
+            push @$t, $sub;
+          }
+        push @new, ["$test_name-mb", @new_t, {ENV => "LC_ALL=$mb_locale"}];
+      }
+    push @Tests, @new;
+  }
+
+
 @Tests = triple_test \@Tests;

 my $save_temps = $ENV{DEBUG};
--
1.7.7.rc0.362.g5a14
