On 2023-08-01 16:42, Pádraig Brady wrote:
On 01/08/2023 10:07, Dragan Simic wrote:
Add a new command-line option and the required logic that allow multiple
consecutive delimiters to be treated as a single delimiter. Of course,
this option is valid only in cut's field mode.
This new feature should make cut much more usable in various real-world
applications, some of which are already mentioned in the gotchas. For
example, merging the consecutive delimiters is very useful when cut is
used to process the outputs of various commands.
Add a whole battery of new cut tests, which cover this new feature, and
add more tests for the related already existing features, to make sure
no regressions are introduced.
While there, clean up the comments and the whitespace in the cut tests
a bit, to make them slightly more readable.
Thanks for the patch.
I wonder whether a --empty-fields={ignore,suppress} is a more general
interface.
Wouldn't that be a more complex approach and, more importantly, a less
intuitive one? Quite frankly, I think it's easier to visualize the
empty space, or the delimiters in the more general case, becoming
"squeezed". I think that visualizing the empty fields is harder,
especially when the delimiter is a whitespace character.
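
To make the "squeezed delimiters" view concrete, here is a minimal
sketch in Python rather than in cut's C code. The function cut_fields
and its merge parameter are hypothetical names used only for
illustration; they are not taken from the patch, and the handling of
lines without any delimiter is a simplification.

```python
import re

def cut_fields(line, delim, fields, merge=False):
    """Pick 1-based fields from line, roughly like cut -d DELIM -f LIST.

    merge=True treats each run of consecutive delimiters as a single
    delimiter, which is the behaviour described for "-m" in this thread.
    Illustrative model only, not the patch's implementation.
    """
    if delim not in line:
        # Simplification: a line without the delimiter is passed through.
        return line
    if merge:
        parts = re.split(re.escape(delim) + "+", line)
    else:
        parts = line.split(delim)
    picked = [parts[i - 1] for i in fields if 1 <= i <= len(parts)]
    return delim.join(picked)

line = "a  b   c"
print(repr(cut_fields(line, " ", [2])))              # ''
print(repr(cut_fields(line, " ", [2], merge=True)))  # 'b'
```

Without merging, the second field of "a  b   c" is the empty string
between the first two spaces; with merging, it is "b".
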
This overlaps somewhat with the -w option in FreeBSD's cut,
which merges runs of whitespace, and which I was also considering
adding.
After thinking a bit about it, how about having both "-m", from the
patch I submitted, and "-w", which would behave differently from
FreeBSD's "-w"? Please allow me to explain.
More specifically, our "-w" would simply "squeeze" all the whitespace
in the input without forcing the delimiter to be whitespace. The
"squeezing" would produce a whitespace character in the input, instead
of whatever got "squeezed" there. That would be either the whitespace
character specified as an optional value for the "-w" option or, by
default, a space wherever only spaces were "squeezed" and a tab
wherever the "squeezed" whitespace contained at least one tab.
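
The proposed squeezing rule can be sketched as follows, again in Python
and only as an illustration. The name squeeze_whitespace is
hypothetical, and treating "whitespace" as just spaces and tabs is an
assumption; the proposal above does not pin that down.

```python
import re

def squeeze_whitespace(line, replacement=None):
    """Collapse each run of whitespace into a single character.

    Assumption: "whitespace" means spaces and tabs only.
    If replacement is given, every run becomes that character.
    Otherwise a run of only spaces becomes one space, and a run that
    contains at least one tab becomes one tab.
    """
    def pick(match):
        if replacement is not None:
            return replacement
        return "\t" if "\t" in match.group(0) else " "

    return re.sub(r"[ \t]+", pick, line)

print(repr(squeeze_whitespace("a  b \t c")))       # 'a b\tc'
print(repr(squeeze_whitespace("a  b \t c", " ")))  # 'a b c'
```

The squeezed line would then go through the usual field splitting, with
whatever delimiter was given via "-d".
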
With both the "-m" and "-w" options in place we'd end up with quite a
versatile cut, which would cover what FreeBSD's cut does and be able
to do more. I'd be willing to implement the "-w" option as well.