Re: Feature Request / Discussion: `cut` to several files


From: Reuti
Subject: Re: Feature Request / Discussion: `cut` to several files
Date: Wed, 25 Jan 2017 17:35:23 +0100

Hi,

> On 25.01.2017 at 16:10, Erik Brinkman <address@hidden> wrote:
> 
> I had completely overlooked tee. The original use case was to split a csv
> by column, and with a four-column csv the pipeline gets pretty verbose:
> 
> source_command | tee >(cut -d, -f1 >file1) | tee >(cut -d, -f2 > file2) |
> tee >(cut -d, -f3 > file3) > >(cut -d, -f4 > file4)

I assume the number of columns in the csv is fixed, i.e. 4 in your case, so
`split` can distribute them to individual files:

$ source_command | tr ',' '\n' | split -n r/4

Looks like a `vsplit`, the opposite of `paste`.
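
For example, with a plain 4-column csv (no quoted fields containing commas),
split's default output files xaa..xad would each end up holding one column.
A minimal sketch:

$ printf 'a,b,c,d\ne,f,g,h\n' | tr ',' '\n' | split -n r/4
$ cat xaa
a
e
$ cat xad
d
h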

-- Reuti


> However, I haven't needed to do something like this that frequently, and it
> seems like the added complexity to cut is probably not worth it. Thanks for
> the suggestion.
> 
> Erik
> 
> On Wed, Jan 25, 2017 at 4:42 AM Pádraig Brady <address@hidden> wrote:
> 
>> On 25/01/17 01:13, Erik Brinkman wrote:
>>> It'd be nice if cut allowed writing to several files. I'm not sure what
>>> the appropriate syntax for something like this would be, but I could see
>>> a command looking something like:
>>> 
>>> cut -f 1,2 filename1 -f 3-5 filename2
>>> 
>>> or maybe
>>> 
>>> cut -f 1,2:filename1:3-5:filename2
>>> 
>>> I don't think the first syntax is POSIX, and it's definitely not
>>> backwards compatible. The second might work, but is pretty ugly. I
>>> couldn't find anything related to this in the archive or in the rejected
>>> feature requests. Some alternatives with downsides:
>>> 
>>>   - Save the buffer and use cut repeatedly on that. The downside is it
>>>   requires the buffer to be saved.
>>>   - I managed to throw together an awk script that could be tailored to
>>>   do similar things. This writes column 1 to file 1, etc for all of the
>>>   listed files:
>>> 
>>>   awk 'BEGIN { NUM = ARGC; if ( ARGC > 2 ) ARGC = 2 } { for ( I = 2; I <
>>>   NUM; ++I) { print $(I - 1) > ARGV[I] } }' input_file column_1 column_2
>>> 
>>>   The nice part is that this works with subprocesses without saving
>>>   entire intermediate buffers:
>>> 
>>>   paste <(seq 1 10) <(seq 11 20) | awk 'BEGIN { NUM = ARGC; if ( ARGC > 2 )
>>>   ARGC = 2 } { for ( I = 2; I < NUM; ++I) { print $(I - 1) > ARGV[I] } }' -
>>>   >(paste -sd+ | bc) >(paste -sd+ | bc)
>>> 
>>>   However, this is ugly, pretty manual, and doesn't support ranges very
>>>   easily.
>>> 
>>> It seems plausible that the cut source could be modified to store a field /
>>> character list for each file, open all of them, and write characters /
>>> fields out on the fly as it normally does with stdout. I'm happy to
>>> implement this myself and send a patch, but I'm uncertain whether the
>>> coreutils team views this as an appropriate addition, and if so, what a
>>> proper syntax would look like. Since this modifies the field spec of cut,
>>> it could potentially have ramifications for other field specifications in
>>> coreutils, although I can't think of any that relate to writing, so it may
>>> not matter.
>> 
>> Would tee suffice for this use case?
>> 
>>  source_command | tee >(cut -f1,2 >file1) > >(cut -d' ' -f3,5 >file2)
>> 
>> thanks,
>> Pádraig
>> 
> 
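
P.S. For reference, the awk one-liner quoted above relies on truncating ARGC
so that awk reads records only from the first operand; the remaining operands
are used purely as output file names. A commented sketch of the same idea
(input_file, col1 and col2 are placeholder names):

awk '
  # Remember how many operands there were, then truncate ARGC to 2 so that
  # awk reads records only from the first operand (input_file); the other
  # operands stay in ARGV but are never opened as input.
  BEGIN { num = ARGC; if (ARGC > 2) ARGC = 2 }
  # For every input record, write field i-1 to the file named by ARGV[i],
  # i.e. column 1 goes to col1, column 2 to col2, and so on.
  { for (i = 2; i < num; ++i) print $(i - 1) > ARGV[i] }
' input_file col1 col2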

