Re: Possible bug in ``cut -c --output-delimiter''
From: Jim Meyering
Subject: Re: Possible bug in ``cut -c --output-delimiter''
Date: Wed, 02 Jun 2004 23:24:40 +0200
David Krider <address@hidden> wrote:
> On Wed, 2004-05-19 at 21:57, David Krider wrote:
>
>> > cat dump.txt|cut -c 1-69|sort -u|cut --output-delimiter=\| -c
>> 1-17,19-32,34-52,54-
>> 10165301 |M CPC |TABAR, FR CORVETTE |4105128
>> 4656102 RI|HRYSLER |TABAR, FR 01 PT |656102
>> 4694750 |HRYSLER |TABAR, FR NS |VX-2513-H
>> 52088126 |HRYSLER |OIL FR TJ |2088119AB
>> 52088127 |HRYSLER |OIL FR TJ |2088119AB
>> 52088128 |HRYSLER |OIL FR TJ |2088119AB
>> 52088129 |HRYSLER |OIL FR TJ |2088119AB
>> F65A-5B326-CA |ORD |-BAR, FR |MTB-001
>
> I joined the list only to ask about this situation. No one responded, so
> I'll try one more time. Is the changed behavior in cut (with an output
> delimiter) a bug or a feature?
It's a bug. Thanks for the report.
I've just checked in the fix below.

2004-06-02  Jim Meyering  <address@hidden>

        Fix a bug in how the --output-delimiter=D option works with
        abutting byte or character ranges.  Reported by David Krider in
        http://lists.gnu.org/archive/html/bug-coreutils/2004-05/msg00132.html
        * src/cut.c (print_kth): Remove special case for open-ended range.
        (set_fields): Record the range start index for an interval even
        when it abuts another interval on its low side.
        Also record the range start index of the longest right-open-interval.
        * tests/cut/Test.pm: Add tests of --output-delimiter=S with
        abutting and overlapping byte ranges.
        * doc/coreutils.texi (cut invocation): Clarify what
        --output-delimiter=STR does with byte/character ranges.

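To make the new rule concrete, here is a small stand-alone sketch of it. This is not the code from cut.c: the names `struct range', `selected', `range_start' and `cut_model' are made up for illustration, the real implementation keeps range starts in a hash table, and this model handles a single line without its trailing newline. The idea it tries to capture is that a byte index counts as a range start only when it begins some range and does not fall inside any other range's half-open interval (lo..hi].

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* A closed byte range lo..hi, 1-based, as in `cut -b lo-hi'.
   (Hypothetical type, for illustration only.)  */
struct range { size_t lo; size_t hi; };

/* True if byte K is selected by a closed range or by the open-ended
   range starting at EOL_START (0 means "no open-ended range").  */
bool
selected (const struct range *rp, size_t n_rp, size_t eol_start, size_t k)
{
  if (0 < eol_start && eol_start <= k)
    return true;
  for (size_t i = 0; i < n_rp; i++)
    if (rp[i].lo <= k && k <= rp[i].hi)
      return true;
  return false;
}

/* True if K begins some range and lies in no other range's
   half-open interval (lo..hi].  */
bool
range_start (const struct range *rp, size_t n_rp, size_t eol_start, size_t k)
{
  bool begins = (0 < eol_start && k == eol_start);
  for (size_t i = 0; i < n_rp; i++)
    {
      if (rp[i].lo < k && k <= rp[i].hi)
        return false;
      if (rp[i].lo == k)
        begins = true;
    }
  /* A start beyond the open-ended range's start is swallowed by it.  */
  if (0 < eol_start && eol_start < k)
    return false;
  return begins;
}

/* Copy the selected bytes of LINE to OUT (capacity OUT_SIZE), writing
   DELIM before each range start except before the first byte written.
   Return the number of bytes written, excluding the terminating NUL.
   This is only a sketch: running out of space trips an assertion.  */
size_t
cut_model (const struct range *rp, size_t n_rp, size_t eol_start,
           const char *delim, const char *line, char *out, size_t out_size)
{
  size_t n = 0;
  bool wrote_any = false;
  size_t delim_len = strlen (delim);

  assert (0 < out_size);
  for (size_t k = 1; line[k - 1] != '\0'; k++)
    {
      if (!selected (rp, n_rp, eol_start, k))
        continue;
      if (wrote_any && range_start (rp, n_rp, eol_start, k))
        {
          assert (n + delim_len < out_size);
          memcpy (out + n, delim, delim_len);
          n += delim_len;
        }
      assert (n + 1 < out_size);
      out[n++] = line[k - 1];
      wrote_any = true;
    }
  out[n] = '\0';
  return n;
}
```

Under these assumptions, calling `cut_model' with the ranges {1,2},{3,4}, no open-ended range, delimiter ":" and the line "abcd" should give "ab:cd", as in the od-abut test below, while the ranges {1,2},{2,2} on "abc" give "ab" with no delimiter, as in od-overlap.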
Index: cut.c
===================================================================
RCS file: /fetish/cu/src/cut.c,v
retrieving revision 1.111
diff -u -p -r1.111 cut.c
--- cut.c 17 May 2004 13:16:53 -0000 1.111
+++ cut.c 2 Jun 2004 11:40:27 -0000
@@ -266,14 +266,8 @@ is_range_start_index (size_t i)
 static bool
 print_kth (size_t k, bool *range_start)
 {
-  if (0 < eol_range_start && eol_range_start <= k)
-    {
-      if (range_start)
-        *range_start = (k == eol_range_start);
-      return true;
-    }
-
-  if (k <= max_range_endpoint && is_printable_field (k))
+  if ((0 < eol_range_start && eol_range_start <= k)
+      || (k <= max_range_endpoint && is_printable_field (k)))
     {
       if (range_start)
         *range_start = is_range_start_index (k);
@@ -473,25 +467,35 @@ set_fields (const char *fieldstr)
   if (output_delimiter_specified)
     {
-      /* Record the range-start indices.  */
-      for (i = 0; i < n_rp; i++)
+      /* Record the range-start indices, i.e., record each start
+         index that is not part of any other (lo..hi] range.  */
+      for (i = 0; i <= n_rp; i++)
         {
           size_t j;
-          for (j = rp[i].lo; j <= rp[i].hi; j++)
+          size_t rsi = (i < n_rp ? rp[i].lo : eol_range_start);
+
+          for (j = 0; j < n_rp; j++)
             {
-              if (0 < j && is_printable_field (j)
-                  && !is_printable_field (j - 1))
+              if (rp[j].lo < rsi && rsi <= rp[j].hi)
                 {
-                  /* Record the fact that `j' is a range-start index.  */
-                  void *ent_from_table = hash_insert (range_start_ht,
-                                                      (void*) j);
-                  if (ent_from_table == NULL)
-                    {
-                      /* Insertion failed due to lack of memory.  */
-                      xalloc_die ();
-                    }
-                  assert ((size_t) ent_from_table == j);
+                  rsi = 0;
+                  break;
+                }
+            }
+
+          if (eol_range_start && eol_range_start < rsi)
+            rsi = 0;
+
+          if (rsi)
+            {
+              /* Record the fact that `rsi' is a range-start index.  */
+              void *ent_from_table = hash_insert (range_start_ht, (void*) rsi);
+              if (ent_from_table == NULL)
+                {
+                  /* Insertion failed due to lack of memory.  */
+                  xalloc_die ();
                 }
+              assert ((size_t) ent_from_table == rsi);
             }
         }
     }
Index: Test.pm
===================================================================
RCS file: /fetish/cu/tests/cut/Test.pm,v
retrieving revision 1.13
diff -u -p -r1.13 Test.pm
--- Test.pm 23 Jul 2003 07:01:19 -0000 1.13
+++ Test.pm 2 Jun 2004 11:41:04 -0000
@@ -85,6 +85,13 @@ my @tv = (
['out-delim5', '-c2-3,4- --output-d=:', "abcdefg\n", "bc:defg\n", 0],
# This test would fail for cut from coreutils-5.0.1 and earlier.
['out-delim6', '-c2,1-3 --output-d=:', "abc\n", "abc\n", 0],
+#
+['od-abut', '-b1-2,3-4 --output-d=:', "abcd\n", "ab:cd\n", 0],
+['od-overlap', '-b1-2,2 --output-d=:', "abc\n", "ab\n", 0],
+['od-overlap2', '-b1-2,2- --output-d=:', "abc\n", "abc\n", 0],
+['od-overlap3', '-b1-3,2- --output-d=:', "abcd\n", "abcd\n", 0],
+['od-overlap4', '-b1-3,2-3 --output-d=:', "abcd\n", "abc\n", 0],
+['od-overlap5', '-b1-3,1-4 --output-d=:', "abcde\n", "abcd\n", 0],
);
Index: coreutils.texi
===================================================================
RCS file: /fetish/cu/doc/coreutils.texi,v
retrieving revision 1.184
diff -u -p -r1.184 coreutils.texi
--- coreutils.texi 2 Jun 2004 08:35:02 -0000 1.184
+++ coreutils.texi 2 Jun 2004 21:22:47 -0000
@@ -4428,7 +4428,8 @@ With @option{-f}, output fields are sepa
The default with @option{-f} is to use the input delimiter.
When using @option{-b} or @option{-c} to select ranges of byte or
character offsets (as opposed to ranges of fields),
-output @var{output_delim_string} between ranges of selected bytes.
+output @var{output_delim_string} between non-overlapping
+ranges of selected bytes.
@end table