Re: Feature Request / Discussion: `cut` to several files
From: Erik Brinkman
Subject: Re: Feature Request / Discussion: `cut` to several files
Date: Wed, 25 Jan 2017 15:10:58 +0000
I had completely overlooked tee. The original use case was to split a CSV
by column, in which case a four-column CSV gets pretty verbose:

source_command | tee >(cut -d, -f1 > file1) | tee >(cut -d, -f2 > file2) |
  tee >(cut -d, -f3 > file3) > >(cut -d, -f4 > file4)

However, I haven't needed to do something like this that frequently, and it
seems like the added complexity to cut is probably not worth it. Thanks for
the suggestion.
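
A somewhat shorter form is possible because tee accepts several output
arguments, so a single tee can feed every cut. This is only a sketch and
assumes bash, since process substitution is not POSIX sh. The sample data
and the outdir variable are made up for illustration and stand in for
source_command and the real file names.

```shell
# Made-up four-column CSV standing in for source_command.
outdir=$(mktemp -d)

# One tee copies its input to three process substitutions and to stdout;
# the last cut reads the stdout copy through the ordinary pipe.
printf 'a,b,c,d\n1,2,3,4\n' |
  tee >(cut -d, -f1 > "$outdir/file1") \
      >(cut -d, -f2 > "$outdir/file2") \
      >(cut -d, -f3 > "$outdir/file3") |
  cut -d, -f4 > "$outdir/file4"
```

Process substitutions run asynchronously, so file1 through file3 may be
completed slightly after the pipeline itself returns.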
Erik
On Wed, Jan 25, 2017 at 4:42 AM Pádraig Brady <address@hidden> wrote:
> On 25/01/17 01:13, Erik Brinkman wrote:
> > It'd be nice if cut allowed writing to several files. I'm not sure what the
> > appropriate syntax for something like this would be, but I could see a
> > command looking something like:
> >
> > cut -f 1,2 filename1 -f 3-5 filename2
> >
> > or maybe
> >
> > cut -f 1,2:filename1:3-5:filename2
> >
> > I don't think the first syntax is POSIX, and it's definitely not backwards
> > compatible. The second might work, but is pretty ugly. I couldn't find
> > anything related to this in the archive or in the rejected feature
> > requests. Some alternatives with downsides:
> >
> > - Save the buffer and use cut repeatedly on that. The downside is it
> > requires the buffer to be saved.
> > - I managed to throw together an awk script that could be tailored to do
> > similar things. This writes column 1 to file 1, etc. for all of the listed
> > files:
> >
> > awk 'BEGIN { NUM = ARGC; if ( ARGC > 2 ) ARGC = 2 }
> >      { for ( I = 2; I < NUM; ++I) { print $(I - 1) > ARGV[I] } }' \
> >     input_file column_1 column_2
> >
> > The nice part is that this works with subprocesses without saving
> > entire intermediate buffers:
> > paste <(seq 1 10) <(seq 11 20) |
> >   awk 'BEGIN { NUM = ARGC; if ( ARGC > 2 ) ARGC = 2 }
> >        { for ( I = 2; I < NUM; ++I) { print $(I - 1) > ARGV[I] } }' \
> >     - >(paste -sd+ | bc) >(paste -sd+ | bc)
> >
> > However, this is ugly, pretty manual, and doesn't support ranges very
> > easily.
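
For readability, the awk script quoted above can be laid out with comments
as below. This is only a sketch: the sample data, the outdir variable, and
the file names under it are made up for illustration.

```shell
# Made-up two-column input, whitespace separated.
outdir=$(mktemp -d)
printf '1 11\n2 12\n3 13\n' > "$outdir/input_file"

awk '
  BEGIN {
    # Remember how many arguments were given, then shrink ARGC so that
    # awk reads only ARGV[1] as input. ARGV[2] onward are output names.
    NUM = ARGC
    if (ARGC > 2) ARGC = 2
  }
  {
    # Field I-1 of every record goes to the file named by ARGV[I].
    for (I = 2; I < NUM; ++I)
      print $(I - 1) > ARGV[I]
  }
' "$outdir/input_file" "$outdir/column_1" "$outdir/column_2"
```

Each output file is opened once and stays open until awk exits, so the
input is streamed in a single pass.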
> >
> > It seems plausible that cut source could be modified to store a field /
> > character list for each file, open up all of them, and write characters /
> > fields out on the fly as it normally does with stdout. I'm happy to
> > implement this myself and patch, but I'm uncertain if the coreutils team
> > views this as an appropriate addition, and if so, what a proper syntax
> > would look like. Since this modifies the field spec of cut, it could
> > potentially have ramifications for other field specifications in
> > coreutils, although I can't think of any that relate to writing, so it
> > may not matter.
>
> Would tee suffice for this use case?
>
> source_command | tee >(cut -f1,2 >file1) > >(cut -d' ' -f3,5 >file2)
>
> thanks,
> Pádraig
>