Re: [PATCH] csplit: new option --suppress-matched


From: Pádraig Brady
Subject: Re: [PATCH] csplit: new option --suppress-matched
Date: Sat, 30 Mar 2013 05:08:53 +0000
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:17.0) Gecko/20130110 Thunderbird/17.0.2

On 03/28/2013 10:10 PM, Assaf Gordon wrote:
> Hello,
> 
> 
> Assaf Gordon wrote, On 03/07/2013 05:39 PM:
>>
>> Attached is a new option for csplit, suppress-matched, as has been mentioned
>> a few times before (e.g.
>> http://lists.gnu.org/archive/html/coreutils/2013-02/msg00170.html ).
>>
> 
> Attached updated version (works with both regexp and int patterns).
> Also updated tests.
> 
> Comments are welcomed,

Thanks for continuing with this.
The awkward cases here are integer boundaries and offsets.

First, consider overlapping boundaries.

# Without -m we have:
$ seq 4 | csplit - -qk 2 3; grep '' xx??; rm xx??
xx00:1
xx01:2
xx02:3
xx02:4

# But with -m, boundary lines 2 and 3 are both suppressed
# (xx01 is left empty, so grep prints nothing for it).
$ seq 4 | csplit - -m -qk 2 3; grep '' xx??; rm xx??
xx00:1
xx02:4

Now for the second pattern "3", one might argue that line 2
is not its boundary line and so should be output.
But this isn't the case, as the patterns are not treated independently:
the boundary line of each pattern starts the next split,
and it is that line which gets suppressed.
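
To make the integer case concrete, here is a rough awk model of the
behaviour shown above. It is only a sketch, not csplit's code: the helper
name model_int_split and its argument order are made up for illustration,
and it ignores error handling such as boundaries past the end of input.

```shell
# Hypothetical model, NOT csplit itself: split stdin at integer line
# boundaries and label each line with its piece name, as `grep ''` would.
# Usage: model_int_split SUPPRESS BOUNDARY...   (SUPPRESS is 0 or 1)
model_int_split() {
  suppress=$1; shift
  awk -v suppress="$suppress" -v bounds="$*" '
    BEGIN {
      n = split(bounds, b, " ")
      for (i = 1; i <= n; i++) is_bound[b[i]] = 1
      piece = 0
    }
    {
      if (NR in is_bound) {
        piece++                 # every boundary line starts a new piece
        if (suppress) next      # ... and is dropped when suppressing
      }
      printf "xx%02d:%s\n", piece, $0
    }'
}

printf '%s\n' 1 2 3 4 | model_int_split 0 2 3
printf '%s\n' 1 2 3 4 | model_int_split 1 2 3
```

With suppression on, lines 2 and 3 both vanish because each one is the
boundary line of some pattern, regardless of which pattern "owns" it.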

The same sort of consideration applies to offsets.

$ grep -H '' in
in:1
in:
in:2
in:3
in:4
in:
in:5

# So -m without offsets works well:
$ csplit in -m -qk '/^$/' '{*}'; grep '' xx??; rm xx??
xx00:1
xx01:2
xx01:3
xx01:4
xx02:5

# With an offset added, we currently suppress the offset line
# (the boundary), rather than the line that matched the pattern.
$ csplit in -m -qk '/^$/'1 '{*}'; grep '' xx??; rm xx??
xx00:1
xx00:
xx01:3
xx01:4
xx01:

This could be confusing, but at least it's consistent.
More accurately, what we're suppressing is the boundary line,
not the matched line.

So to be less confusing and more accurate,
this option should probably be named and described as:

--suppress-boundary
  Suppress the boundary line from the start of the second and subsequent splits.
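
The boundary-line reading of the regexp examples above can be sketched
with a small awk model. Again this is an illustration only, not csplit's
implementation: model_re_split is a hypothetical helper, it handles only a
single repeated pattern with a non-negative offset (like '/RE/+N' '{*}'),
and it assumes the search for the next match resumes on the line after
each boundary.

```shell
# Hypothetical model, NOT csplit itself: repeatedly split stdin at
# (first line matching RE) + OFFSET, labelling lines like `grep ''`.
# Usage: model_re_split SUPPRESS RE OFFSET   (SUPPRESS is 0 or 1)
model_re_split() {
  awk -v suppress="$1" -v re="$2" -v off="$3" '
    BEGIN { piece = 0; boundary = 0; search_from = 1 }
    {
      # Look for a match only while no boundary is pending.
      if (boundary == 0 && NR >= search_from && $0 ~ re)
        boundary = NR + off
      if (NR == boundary) {
        piece++                 # the boundary line starts a new piece
        boundary = 0
        search_from = NR + 1    # assumed: resume search after the boundary
        if (suppress) next      # drop the boundary, not the matched line
      }
      printf "xx%02d:%s\n", piece, $0
    }'
}

printf '1\n\n2\n3\n4\n\n5\n' | model_re_split 1 '^$' 0
printf '1\n\n2\n3\n4\n\n5\n' | model_re_split 1 '^$' 1
```

With offset 0 the boundary and the matched line coincide, so the blank
lines disappear. With offset 1 the blank lines are kept at the end of each
piece and the following line is the one suppressed.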

Nice work on the tests BTW.
I hope to apply this with the adjusted naming over the weekend.

thanks,
Pádraig.


