bug-coreutils

bug#14224: Feature request for the `cut`: record delimiter


From: Pádraig Brady
Subject: bug#14224: Feature request for the `cut`: record delimiter
Date: Thu, 18 Apr 2013 09:18:30 -0700
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:17.0) Gecko/20130110 Thunderbird/17.0.2

On 04/18/2013 08:41 AM, George Brink wrote:
> On Wed, Apr 17, 2013 at 9:13 PM, Pádraig Brady <address@hidden> wrote:
> 
>> On 04/17/2013 02:26 PM, George Brink wrote:
>>> Hello,
>>>
>>> I have a task of extracting several "fields" from a text file. The
>>> standard `cut` tool could be the perfect tool for the job, but...
>>> In my file the '\n' character is a legal symbol inside fields, and
>>> therefore the text file uses another symbol as the record separator.
>>> And `cut` has a hard-coded '\n' as the record separator (I just checked
>>> the source from the coreutils-8.21 package).
>>
>> The patch would be simple but not without compatibility cost.
>> I.e., scripts using this would immediately become incompatible
>> with any systems without this feature.
>>
>> So you'd like something like tac -s, --separator
>> However cut -s is taken, so we'd have to avoid the short -s at least.
>> Also tac -s takes a string rather than a character, so
>> that gives some extra credence (and complexity) to that option there.
>>
>> Also related would be to support the -z, --zero-terminated option.
>> join, sort and uniq all have this option to use NUL as the record
>> separator; however, they're all closely related, sort-dependent
>> utilities and we're trying to unify options between them.
>>
>> If it is just a character you want to separate on,
>> then you can always use tr to convert before processing,
>> albeit with associated data copying overhead.
>>
>> SEP=^
>> tr "$SEP"'\n' '\n'"$SEP" | cut ... | tr "$SEP"'\n' '\n'"$SEP"
>>
>> So given that cut is not special here among the text filters,
>> and there is a workaround available, I'm 60:40 against
>> adding this feature.
>>
>> thanks,
>> Pádraig.
>>
> 
> Pádraig,
>
> Thank you for alternative suggestions.
> Actually I just found yet another way to solve my problem:
> perl -0002 -F"\001" -an -e "print((join \"\001\", @F[0..2,14..46]),
> \"\002\");" data.dat >new_data.dat
> It works fine, but I am a little concerned about the speed. I have over three
> hundred such files, from 3Mb to 30Mb each. And this process should be
> run every day... I thought that by using cut (which just looks for
> delimiters) I can gain a few minutes on the whole process.
>
> Originally I thought of adding "-r, --record-delimiter=DELIM" and
> "--output-record-delimiter=DELIM" options to cut.
> Then the example above could be done with
> cut -d☺ -r☻ --output-delimiter=☺ --output-record-delimiter=☻ -f1-3,15-47
> data.dat >new_data.dat
> I think it is feasible and would be more convenient (and hopefully faster)
> than using a whole perl or two calls to tr.

Yes, those are the tradeoffs.
awk is often suggested too as an alternative to cut.
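
For illustration, a minimal awk sketch of the field selection discussed
above. It assumes \001 as the field delimiter and \002 as the record
delimiter, as in the perl one-liner quoted earlier. The function name
select_fields is hypothetical, and the field ranges 1-3 and 15-47 are
hard-coded. Like the perl version (and unlike cut), it emits empty fields
when a record has fewer than three fields.

```shell
#!/bin/sh
# Sketch only: select fields 1-3 and 15-47 from records terminated by \002,
# with fields separated by \001. select_fields is a hypothetical name.
select_fields() {
    awk 'BEGIN { RS = ORS = "\002"; FS = OFS = "\001" }
    {
        # Fields 1-3 are always emitted (empty if missing).
        out = $1 OFS $2 OFS $3
        # Fields 15-47 are emitted only as far as the record has them.
        for (i = 15; i <= 47 && i <= NF; i++)
            out = out OFS $i
        print out
    }'
}
```

It could be run as: select_fields <data.dat >new_data.dat
Newlines inside fields pass through untouched because RS is not '\n'.
Whether this is any faster than perl would need to be measured.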

> Bob,
> I understand your desire to keep feature discussions off the
> bug-related mailing list, but here is an extract from the README:
>> Mail suggestions and bug reports for these programs to
>> the address on the last line of --help output.
> And guess what, `cut --help` has the bug-coreutils email on the last
> line! The coreutils email is not mentioned in the README at all, while
> bug-coreutils is mentioned several times in different contexts.
> I apologize for using this mailing list inappropriately, but I did not know
> about any other mailing lists.

No worries.  I saw no issue with your mails.
In future, cut --help will just point at the
following URL, which is hopefully easier to follow:
http://www.gnu.org/software/coreutils/

thanks,
Pádraig.




