Hey, folks,
I've spent a most enjoyable few hours implementing
`--ignore-leading-blanks` in `uniq`, including tests and texinfo
updates. It doesn't break any other features, and it solves the problem
that made me start playing with this in the first place. I was surprised
at how little code was necessary to implement `--ignore-leading-blanks`,
and how minimally invasive it was. `vc-dwim` is happy with the
ChangeLog, and I have the output from `git format-patch --stdout -1` all
ready to send in and stored as a draft email.
But I'm not going to submit it, because it's the wrong solution. By
itself, it just doesn't advance the cause. The workaround indicated in
https://www.pixelbeat.org/docs/coreutils-gotchas.html#uniq
i.e. filtering the input through `tr -s '[:blank:]'` first,
is better, if only because it deals with all blanks equitably, whereas
my `--ignore-leading-blanks` only ignores the leading blanks of the
first unskipped field (normally the beginning of the line). But once you
start dealing with skipped blanks or fields, other blanks start causing
unwelcome surprises.
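
To make the skipped-field case concrete, here is a minimal sketch of that workaround. The sample data is made up for illustration, and `-f1` is just one way of provoking the problem: after skipping one field, what remains of each line still starts with a run of blanks of differing width.

```shell
# Hypothetical sample data: the blank run before field 2 differs in width.
input='1   apple
22  apple'

# Skipping one field leaves "   apple" and "  apple", which compare unequal,
# so uniq keeps both lines.
plain=$(printf '%s\n' "$input" | uniq -f1)

# Squeezing blank runs first leaves " apple" on both lines, so they match
# and only the first line survives.
squeezed=$(printf '%s\n' "$input" | tr -s '[:blank:]' | uniq -f1)

printf 'without tr:\n%s\nwith tr:\n%s\n' "$plain" "$squeezed"
```

With GNU `uniq` I'd expect the first pipeline to pass both lines through untouched and the second to print only `1 apple`.
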
The trouble with using `tr -s '[:blank:]'` as a pre-filter is that
you've now screwed up the nice columns you probably had, and which you
probably want to preserve, especially if you're using the `--group` option.
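
A small sketch of that cost, again with invented sample data and assuming GNU `uniq` (for `--group`): the grouping comes out right, but the column alignment of the input is gone.

```shell
# Hypothetical column-aligned input: field 2 starts in the same column
# on every line.
input='1   apple
22  apple
333 pear'

# Squeeze the blanks so that -f1 compares " apple" with " apple", then
# let --group separate the groups with an empty line.
grouped=$(printf '%s\n' "$input" | tr -s '[:blank:]' | uniq -f1 --group)

printf '%s\n' "$grouped"
```

The two `apple` lines end up in one group and `pear` in another, but every line now has a single space between its fields, so the second column no longer lines up.
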
Fun though it's been, the right answer, as stated in
https://www.gnu.org/software/coreutils/rejected_requests.html#uniq, is
`uniq --key` and friends
(https://lists.gnu.org/archive/html/bug-coreutils/2006-06/msg00211.html).
That, however, is a much bigger lift.
That discussion from 2006 talks about code sharing between `uniq` and
`sort`, which sounds great, but at first blush `uniq` seems to take a more
nuanced approach to character encodings than does `sort`. To be fair, I
haven't dug into the `sort` code yet to the degree necessary to justify
such sweeping statements. My feeling is that if `--key` is implemented
right, then the other features on that 2006 wish list should follow
fairly naturally.
If there's still interest in having sort-like `--key` capabilities in
`uniq`, I may begin studying this more seriously. Before I do, though,
I'd like to gauge the likelihood of such work being accepted – assuming
a clean, working implementation, of course.