coreutils
Re: uniq: adding support for "-b, --ignore-leading-blanks"


From: Pádraig Brady
Subject: Re: uniq: adding support for "-b, --ignore-leading-blanks"
Date: Fri, 4 Oct 2024 12:42:53 +0100
User-agent: Mozilla Thunderbird Beta

On 04/10/2024 00:49, Todd Lewis wrote:
Hey, folks,

I've spent a most enjoyable few hours implementing
`--ignore-leading-blanks` in `uniq`, including tests and texinfo
updates. It doesn't break any other features, and it solves the problem
that made me start playing with this in the first place. I was surprised
at how little code was necessary to implement `--ignore-leading-blanks`,
and how minimally invasive it was. `vc-dwim` is happy with the
ChangeLog, and I have the output from `git format-patch --stdout -1` all
ready to send in and stored as a draft email.

But I'm not going to submit it, because it's the wrong solution. By
itself, it just doesn't advance the cause. The workaround indicated in
https://www.pixelbeat.org/docs/coreutils-gotchas.html#uniq
i.e. filtering the input through `tr -s '[:blank:]'` first,
is better, if only because it deals with all blanks equitably, whereas
my `--ignore-leading-blanks` only ignores the leading blanks of the
first unskipped field (normally the beginning of the line). But once you
start dealing with skipped blanks or fields, other blanks start causing
unwelcome surprises.

The trouble with using `tr -s '[:blank:]'` as a pre-filter is that
you've now screwed up the nice columns you probably had, and which you
probably want to preserve, especially if you're using the `--group` option.
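
The trade-off can be illustrated with a minimal Python sketch. It is only a model of the shell pipeline, not the `uniq` or `tr` source: `squeeze_blanks` and `uniq` are hypothetical helper names, and the sample rows are made up. The squeeze step collapses each run of a repeated space or tab to a single character, which is what `tr -s '[:blank:]'` does, and the adjacent-duplicate step stands in for plain `uniq`.

```python
import re
from itertools import groupby


def squeeze_blanks(line):
    # Model of `tr -s '[:blank:]'`: collapse each run of one repeated
    # blank character (space or tab) down to a single occurrence.
    return re.sub(r"([ \t])\1+", r"\1", line)


def uniq(lines):
    # Model of plain `uniq`: keep one line per run of adjacent equal lines.
    return [line for line, _ in groupby(lines)]


# Made-up sample input with column-aligned data.
rows = [
    "alice   10",
    "alice 10",
    "bob     7",
]

# Without the pre-filter the two "alice" lines differ only in blanks,
# so nothing is merged.
print(uniq(rows))

# With the pre-filter they merge, but the column alignment is gone.
print(uniq([squeeze_blanks(r) for r in rows]))
```

The second result contains `bob 7` rather than `bob     7`: the duplicates are detected, but only by rewriting every line that is output.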

Fun though it's been, the right answer, as stated in
https://www.gnu.org/software/coreutils/rejected_requests.html#uniq, is
`uniq --key` and friends
(https://lists.gnu.org/archive/html/bug-coreutils/2006-06/msg00211.html).
That however is a much bigger lift.

That discussion from 2006 talks about code sharing between `uniq` and
`sort`, which sounds great, but at first blush `uniq` seems to take a more
nuanced approach to character encodings than does `sort`. To be fair, I
haven't dug into the `sort` code yet to the degree necessary to justify
such sweeping statements. My feeling is that if `--key` is implemented
right, then the other features on that 2006 wish list should follow
fairly naturally.
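
As a rough illustration of why a key-based comparison is attractive, here is a small Python sketch. It assumes nothing about how `uniq --key` would actually be specified or implemented: `uniq_by_key` and `second_field` are hypothetical names, and the key function only loosely resembles what `sort -b -k2,2` would select. The point is that lines are compared by key while the original line is emitted untouched.

```python
from itertools import groupby


def uniq_by_key(lines, key):
    # Compare adjacent lines by key(line), but emit the first ORIGINAL
    # line of each run, so blanks and columns are preserved on output.
    return [next(group) for _, group in groupby(lines, key=key)]


def second_field(line):
    # Hypothetical key: the second blank-separated field, with
    # surrounding blanks ignored; empty string if there is no such field.
    fields = line.split()
    return fields[1] if len(fields) > 1 else ""


# Made-up sample input with column-aligned data.
rows = [
    "alice   10",
    "carol   10",
    "bob      7",
]

print(uniq_by_key(rows, second_field))
```

Here the first two rows share the key `10`, so only the first is kept, and both surviving lines retain their original spacing.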

If there's still interest in having sort-like `--key` capabilities in
`uniq`, I may begin studying this more seriously. Before I do, though,
I'd like to gauge the likelihood of such work being accepted – assuming
a clean, working implementation of course.

Thank you for the considered discussion on this.
How best to implement such functionality is nuanced,
and often requires consideration of utilities as a group as you've noted.

There has been a previous attempt at this actually:
https://lists.gnu.org/archive/html/coreutils/2013-02/msg00064.html
https://lists.gnu.org/archive/html/coreutils/2013-02/msg00087.html
https://lists.gnu.org/archive/html/coreutils/2015-07/msg00050.html
Unfortunately we hadn't the review cycles to merge this at the time,
but it would be good to revive that implementation if proceeding with this.

cheers,
Pádraig
