From: Dragan Simic
Subject: Re: [PATCH 4/4] cut: Optionally treat multiple consecutive delimiters as one
Date: Tue, 01 Aug 2023 20:37:43 +0200

On 2023-08-01 16:42, Pádraig Brady wrote:
> On 01/08/2023 10:07, Dragan Simic wrote:
>> Add new command-line option and the required logic that allow multiple
>> consecutive delimiters to be treated as a single delimiter.  Of course,
>> this option is valid only with the cut's field mode.
>>
>> This new feature should make cut much more usable in various real-world
>> applications, some of which are already mentioned in the gotchas.  For
>> example, merging the consecutive delimiters is very useful when cut is
>> used to process the outputs of various commands.
>>
>> Add a whole battery of new cut tests, which cover this new feature, and
>> add more tests for the related already existing features, to make sure
>> no regressions are introduced.  While there, clean up the comments and
>> the whitespace in the cut tests a bit, to make them slightly more
>> readable.
>
> Thanks for the patch.
> I wonder whether a --empty-fields={ignore,suppress} is a more general
> interface.

I wonder whether that would be a more complex approach and, more importantly, a less intuitive one. Quite frankly, I think it's easier to visualize the empty space, or the delimiters as the more general case, becoming "squeezed". I think that visualizing the empty fields is harder, especially when the delimiter is a whitespace character.
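
To make the "squeezed delimiters" view concrete, here is a rough Python sketch. It is not the patch's C code, and the names `cut_fields` and `merge` are made up for this illustration. It also ignores cut's handling of lines that contain no delimiter, and how leading or trailing delimiter runs should be treated is left as an open assumption here.

```python
import re

def cut_fields(line, delim, fields, merge=False):
    """Pick 1-based fields from a line, loosely like `cut -d DELIM -f LIST`.

    With merge=True, every run of consecutive delimiters is first
    collapsed into a single delimiter (hypothetical model of "-m").
    """
    if merge:
        # Collapse runs such as "   " into " " before splitting.
        line = re.sub(re.escape(delim) + "+", delim, line)
    parts = line.split(delim)
    # Out-of-range field numbers are silently skipped in this sketch.
    return delim.join(parts[i - 1] for i in fields if 1 <= i <= len(parts))

# Without merging, the second field of "a  b" is the empty string
# that sits between the two spaces.
print(repr(cut_fields("a  b", " ", [2])))
# With merging, the two spaces act as one delimiter.
print(repr(cut_fields("a  b", " ", [2], merge=True)))
```

In this model the user only has to picture the run of delimiters shrinking to one, rather than reasoning about which empty fields get dropped.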

> This overlaps somewhat with the -w option in FreeBSD's cut,
> which merges runs of whitespace, and which I was also considering adding.

After thinking a bit about it, how about having both "-m", from the patch I submitted, and "-w", which would behave differently from FreeBSD's "-w"? Please allow me to explain.
More specifically, our "-w" would simply "squeeze" all runs of whitespace in the input, without forcing the delimiter to be whitespace. Each "squeezed" run would be replaced in the input with a single whitespace character. That would be either the whitespace character specified as an optional value for the "-w" option or, by default, a space wherever only spaces were "squeezed" and a tab wherever the "squeezed" whitespace contained at least one tab.
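
The proposed "-w" behavior can be sketched in a few lines of Python. This only illustrates the rule described above and is not an implementation plan; the function name `squeeze_whitespace` is hypothetical, and the sketch assumes that "whitespace" means just spaces and tabs.

```python
import re

def squeeze_whitespace(line, replacement=None):
    """Replace each run of spaces/tabs with a single whitespace character.

    replacement: optional character to use for every run (the optional
    value of the proposed "-w").  When omitted, a run made only of
    spaces becomes one space, and a run containing at least one tab
    becomes one tab.
    """
    def pick(match):
        if replacement is not None:
            return replacement
        return "\t" if "\t" in match.group(0) else " "

    # Assumption: only space and tab count as whitespace here.
    return re.sub(r"[ \t]+", pick, line)

print(repr(squeeze_whitespace("a   b")))       # run of spaces only
print(repr(squeeze_whitespace("a \t b")))      # run containing a tab
print(repr(squeeze_whitespace("a  b", "\t")))  # explicit replacement
```

The delimiter itself is untouched, so a following "-d" could still name any character, whitespace or not.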
With both the "-m" and "-w" options in place, we'd end up with quite a versatile cut, which would cover what FreeBSD's cut does and be able to do more. I'd be willing to implement the "-w" option as well.