bug-coreutils

Re: Possible bug in ``cut -c --output-delimiter''


From: Jim Meyering
Subject: Re: Possible bug in ``cut -c --output-delimiter''
Date: Wed, 02 Jun 2004 23:24:40 +0200

David Krider <address@hidden> wrote:
> On Wed, 2004-05-19 at 21:57, David Krider wrote:
>
>> > cat dump.txt|cut -c 1-69|sort -u|cut --output-delimiter=\| -c 1-17,19-32,34-52,54-
>>  10165301        |M CPC         |TABAR, FR CORVETTE |4105128
>>  4656102       RI|HRYSLER       |TABAR, FR 01 PT    |656102
>>  4694750         |HRYSLER       |TABAR, FR NS       |VX-2513-H
>>  52088126        |HRYSLER       |OIL FR TJ          |2088119AB
>>  52088127        |HRYSLER       |OIL FR TJ          |2088119AB
>>  52088128        |HRYSLER       |OIL FR TJ          |2088119AB
>>  52088129        |HRYSLER       |OIL FR TJ          |2088119AB
>>  F65A-5B326-CA   |ORD           |-BAR, FR           |MTB-001
>
> I joined the list only to ask about this situation. No one responded, so
> I'll try one more time. Is the changed behavior in cut (with an output
> delimiter) a bug or a feature?

It's a bug.  Thanks for the report.
I've just checked in the fix below.

2004-06-02  Jim Meyering  <address@hidden>

        Fix a bug in how the --output-delimiter=D option works with
        abutting byte or character ranges.  Reported by David Krider in
        http://lists.gnu.org/archive/html/bug-coreutils/2004-05/msg00132.html
        * src/cut.c (print_kth): Remove special case for open-ended range.
        (set_fields): Record the range start index for an interval even
        when it abuts another interval on its low side.
        Also record the range start index of the longest right-open-interval.
        * tests/cut/Test.pm: Add tests of --output-delimiter=S with
        abutting and overlapping byte ranges.
        * doc/coreutils.texi (cut invocation): Clarify what
        --output-delimiter=STR does with byte/character ranges.

Index: cut.c
===================================================================
RCS file: /fetish/cu/src/cut.c,v
retrieving revision 1.111
diff -u -p -r1.111 cut.c
--- cut.c       17 May 2004 13:16:53 -0000      1.111
+++ cut.c       2 Jun 2004 11:40:27 -0000
@@ -266,14 +266,8 @@ is_range_start_index (size_t i)
 static bool
 print_kth (size_t k, bool *range_start)
 {
-  if (0 < eol_range_start && eol_range_start <= k)
-    {
-      if (range_start)
-       *range_start = (k == eol_range_start);
-      return true;
-    }
-
-  if (k <= max_range_endpoint && is_printable_field (k))
+  if ((0 < eol_range_start && eol_range_start <= k)
+      || (k <= max_range_endpoint && is_printable_field (k)))
     {
       if (range_start)
        *range_start = is_range_start_index (k);
@@ -473,25 +467,35 @@ set_fields (const char *fieldstr)
 
   if (output_delimiter_specified)
     {
-      /* Record the range-start indices.  */
-      for (i = 0; i < n_rp; i++)
+      /* Record the range-start indices, i.e., record each start
+        index that is not part of any other (lo..hi] range.  */
+      for (i = 0; i <= n_rp; i++)
        {
          size_t j;
-         for (j = rp[i].lo; j <= rp[i].hi; j++)
+         size_t rsi = (i < n_rp ? rp[i].lo : eol_range_start);
+
+         for (j = 0; j < n_rp; j++)
            {
-             if (0 < j && is_printable_field (j)
-                 && !is_printable_field (j - 1))
+             if (rp[j].lo < rsi && rsi <= rp[j].hi)
                {
-                 /* Record the fact that `j' is a range-start index.  */
-                 void *ent_from_table = hash_insert (range_start_ht,
-                                                     (void*) j);
-                 if (ent_from_table == NULL)
-                   {
-                     /* Insertion failed due to lack of memory.  */
-                     xalloc_die ();
-                   }
-                 assert ((size_t) ent_from_table == j);
+                 rsi = 0;
+                 break;
+               }
+           }
+
+         if (eol_range_start && eol_range_start < rsi)
+           rsi = 0;
+
+         if (rsi)
+           {
+             /* Record the fact that `rsi' is a range-start index.  */
+             void *ent_from_table = hash_insert (range_start_ht, (void*) rsi);
+             if (ent_from_table == NULL)
+               {
+                 /* Insertion failed due to lack of memory.  */
+                 xalloc_die ();
                }
+             assert ((size_t) ent_from_table == rsi);
            }
        }
     }
Index: Test.pm
===================================================================
RCS file: /fetish/cu/tests/cut/Test.pm,v
retrieving revision 1.13
diff -u -p -r1.13 Test.pm
--- Test.pm     23 Jul 2003 07:01:19 -0000      1.13
+++ Test.pm     2 Jun 2004 11:41:04 -0000
@@ -85,6 +85,13 @@ my @tv = (
 ['out-delim5', '-c2-3,4- --output-d=:', "abcdefg\n", "bc:defg\n",      0],
 # This test would fail for cut from coreutils-5.0.1 and earlier.
 ['out-delim6', '-c2,1-3 --output-d=:', "abc\n", "abc\n",       0],
+#
+['od-abut',    '-b1-2,3-4 --output-d=:', "abcd\n", "ab:cd\n",  0],
+['od-overlap', '-b1-2,2   --output-d=:', "abc\n",  "ab\n",     0],
+['od-overlap2', '-b1-2,2- --output-d=:', "abc\n",  "abc\n",    0],
+['od-overlap3', '-b1-3,2- --output-d=:', "abcd\n",  "abcd\n",  0],
+['od-overlap4', '-b1-3,2-3 --output-d=:', "abcd\n",  "abc\n",  0],
+['od-overlap5', '-b1-3,1-4 --output-d=:', "abcde\n",  "abcd\n",        0],
 
 );
 
Index: coreutils.texi
===================================================================
RCS file: /fetish/cu/doc/coreutils.texi,v
retrieving revision 1.184
diff -u -p -r1.184 coreutils.texi
--- coreutils.texi      2 Jun 2004 08:35:02 -0000       1.184
+++ coreutils.texi      2 Jun 2004 21:22:47 -0000
@@ -4428,7 +4428,8 @@ With @option{-f}, output fields are sepa
 The default with @option{-f} is to use the input delimiter.
 When using @option{-b} or @option{-c} to select ranges of byte or
 character offsets (as opposed to ranges of fields),
-output @var{output_delim_string} between ranges of selected bytes.
+output @var{output_delim_string} between non-overlapping
+ranges of selected bytes.
 
 
 @end table
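
For readers who want to see the range-start rule from the set_fields change in isolation, here is a minimal standalone sketch in C. It is not the code from cut.c: struct byte_range, is_selected, is_range_start, and cut_bytes_sketch are hypothetical names, and an open-ended range such as 2- is represented here by hi == SIZE_MAX instead of cut's separate eol_range_start variable. Under those assumptions, a selected position k gets the delimiter in front of it only when k is the low end of some range and does not fall inside the (lo..hi] part of any range.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical type: a 1-based, inclusive byte range.
   hi == SIZE_MAX stands for an open-ended range such as "2-".  */
struct byte_range { size_t lo, hi; };

/* True if position k is covered by at least one range.  */
static bool
is_selected (const struct byte_range *rp, size_t n_rp, size_t k)
{
  for (size_t i = 0; i < n_rp; i++)
    if (rp[i].lo <= k && k <= rp[i].hi)
      return true;
  return false;
}

/* True if k is the low end of some range and is not part of the
   (lo..hi] portion of any range.  */
static bool
is_range_start (const struct byte_range *rp, size_t n_rp, size_t k)
{
  bool is_lo = false;
  for (size_t i = 0; i < n_rp; i++)
    {
      if (rp[i].lo < k && k <= rp[i].hi)
        return false;           /* k lies inside another range */
      if (rp[i].lo == k)
        is_lo = true;
    }
  return is_lo;
}

/* Copy the selected bytes of LINE (no trailing newline) into OUT,
   writing DELIM before each range start except the first byte printed.
   Output that does not fit in OUT_SIZE bytes is silently dropped.  */
static char *
cut_bytes_sketch (const struct byte_range *rp, size_t n_rp,
                  const char *line, const char *delim,
                  char *out, size_t out_size)
{
  size_t n = 0;
  size_t dlen = strlen (delim);
  bool printed = false;

  assert (out_size > 0);
  for (size_t k = 1; line[k - 1] != '\0'; k++)
    {
      if (!is_selected (rp, n_rp, k))
        continue;
      if (printed && is_range_start (rp, n_rp, k) && n + dlen < out_size)
        {
          memcpy (out + n, delim, dlen);
          n += dlen;
        }
      if (n + 1 < out_size)
        out[n++] = line[k - 1];
      printed = true;
    }
  out[n] = '\0';
  return out;
}
```

With the ranges {1,2},{3,4}, the input "abcd" and the delimiter ":", this sketch produces "ab:cd", matching the od-abut test. With {1,2},{2,SIZE_MAX} and the input "abc" it produces "abc", matching od-overlap2, because position 2 lies inside the first range and so does not count as a range start.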



