coreutils

uniq: adding support for "-b, --ignore-leading-blanks"


From: Todd Lewis
Subject: uniq: adding support for "-b, --ignore-leading-blanks"
Date: Thu, 3 Oct 2024 19:49:03 -0400
User-agent: Mozilla Thunderbird

Hey, folks,

I've spent a most enjoyable few hours implementing `--ignore-leading-blanks` in `uniq`, including tests and texinfo updates. It doesn't break any other features, and it solves the problem that made me start playing with this in the first place. I was surprised at how little code was necessary to implement it, and how minimally invasive it was. `vc-dwim` is happy with the ChangeLog, and I have the output from `git format-patch --stdout -1` all ready to send in and stored as a draft email.

But I'm not going to submit it, because it's the wrong solution. By itself, it just doesn't advance the cause. The workaround indicated in
https://www.pixelbeat.org/docs/coreutils-gotchas.html#uniq
i.e. filtering the input through `tr -s '[:blank:]'` first,
is better, if only because it deals with all blanks equitably, whereas my `--ignore-leading-blanks` only ignores the leading blanks of the first unskipped field (normally the beginning of the line). But once you start dealing with skipped blanks or fields, other blanks start causing unwelcome surprises.

The trouble with using `tr -s '[:blank:]'` as a pre-filter is that you've now screwed up the nice columns you probably had, and which you probably want to preserve, especially if you're using the `--group` option.
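
A rough sketch of that trade-off, for illustration only: the sample rows are made up, and it assumes the usual `uniq -f` behavior in which the blanks in front of the first unskipped field are part of the comparison.

```shell
# Hypothetical sample data: two aligned rows whose second field is the same.
input=$(printf 'a    x\nbbb  x\n')

# Skipping the first field leaves "    x" and "  x", which differ,
# so uniq keeps both rows.
plain=$(printf '%s\n' "$input" | uniq -f 1)

# Squeezing runs of blanks first makes both remainders " x",
# so uniq collapses the rows, but the column padding is gone.
squeezed=$(printf '%s\n' "$input" | tr -s '[:blank:]' | uniq -f 1)

printf 'without tr:\n%s\n' "$plain"
printf 'with tr -s:\n%s\n' "$squeezed"
```

The pre-filtered run reports the duplicate correctly, but the surviving row comes out as `a x` rather than in its original aligned form.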

Fun though it's been, the right answer, as stated in https://www.gnu.org/software/coreutils/rejected_requests.html#uniq, is `uniq --key` and friends (https://lists.gnu.org/archive/html/bug-coreutils/2006-06/msg00211.html). That, however, is a much bigger lift.

That discussion from 2006 talks about code sharing between `uniq` and `sort`, which sounds great, but at first blush `uniq` seems to take a more nuanced approach to character encodings than does `sort`. To be fair, I haven't dug into the `sort` code yet to the degree necessary to justify such sweeping statements. My feeling is that if `--key` is implemented right, then the other features on that 2006 wish list should follow fairly naturally.

If there's still interest in having sort-like `--key` capabilities in `uniq`, I may begin studying this more seriously. Before I do, though, I'd like to gauge the likelihood of such work being accepted – assuming a clean, working implementation of course.

--
Todd Lewis <utoddl@gmail.com>

