bug-coreutils
bug#14555: Facing Some problem in uniq command


From: Assaf Gordon
Subject: bug#14555: Facing Some problem in uniq command
Date: Tue, 23 Oct 2018 16:41:08 -0600
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Thunderbird/52.9.1

close 14555
stop

(triaging old bugs)

On 05/06/13 09:06 AM, Bob Proulx wrote:
Shahid Hussain wrote:
I appreciate your quick reply. Here is what I am doing: many files in my
product contain data in "name =  value" format. Using a pattern, I extract
only the "value" field from all of the files and redirect the output to a
temporary file, because I do not want any value to be repeated in any file.
I then apply the uniq command to this temporary file (by pipelining sort
[sort |uniq -c tempFile]), but I am unable to get the expected result.

It might be better if in your script you set:

   #!/bin/sh
   LC_ALL=C
   export LC_ALL
   ...
   sort | uniq
   ...

That will force a standard sort order everywhere in your script.
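
As a rough illustration of the effect (the sample values are invented, not
taken from the reporter's data), forcing the C locale makes sort compare raw
bytes, so uppercase letters order before lowercase ones and duplicate lines
end up adjacent for uniq:

```shell
#!/bin/sh
# Force byte-wise comparison for every command in this script.
LC_ALL=C
export LC_ALL

# Hypothetical input: "beta" occurs twice, but not on adjacent lines.
result=$(printf '%s\n' beta alpha Beta beta | sort | uniq)

# In the C locale "B" (0x42) sorts before "a" (0x61),
# so this prints Beta, alpha, beta on separate lines.
printf '%s\n' "$result"
```

In other locales the relative order of "Beta" and "beta" may differ, which is
why setting LC_ALL once at the top of the script keeps results reproducible.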

But as you said, whitespace must also be identical on every line, so this
might be the problem in my case: when I displayed the content of the file
using the cat command, manually copied the same data to another file, and
then tried uniq with sort, it worked fine.

Without knowing more about your data, a quick and dirty hack to clean
up whitespace might be to pass it through awk.

   awk '{print$1}' somefile1 | sort | uniq ...

Since awk splits on whitespace, this prints only the first field;
surrounding whitespace and any additional fields are discarded.
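
A small sketch of the difference, using made-up data in which "foo" appears
three times with differing leading or trailing whitespace (the sample()
helper is hypothetical and only stands in for the real input file):

```shell
#!/bin/sh
LC_ALL=C
export LC_ALL

# Hypothetical data: "foo" three times with different whitespace, plus "bar".
sample() {
    printf 'foo\nfoo \n\tfoo\nbar  \n'
}

# Without cleanup every line is distinct, because uniq compares whole lines.
raw_count=$(sample | sort | uniq | wc -l | tr -d ' ')

# With awk only the first whitespace-delimited field survives.
clean=$(sample | awk '{print $1}' | sort | uniq)
clean_count=$(printf '%s\n' "$clean" | wc -l | tr -d ' ')

echo "distinct lines before cleanup: $raw_count"
echo "distinct lines after cleanup:  $clean_count"
```

Here the raw pipeline reports four distinct lines, while the cleaned one
reports two ("bar" and "foo").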

So that works for me, but it would be even better if the uniq command had an
option to work even when whitespace is not identical :).

No.  The way is not to use an option.  The way is to prepare the data
without whitespace differences.  You have the option of using tools
like awk to split on whitespace while preparing the data.  Preparing
the data to avoid whitespace differences is the right option to use.
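
If the values can contain spaces themselves, printing only the first awk
field would truncate them. One possible way to prepare such data, assuming
the "name =  value" layout described earlier in the thread, is to strip
everything up to the first "=" and then trim surrounding whitespace with sed.
The extract_values function name and the sample lines are hypothetical:

```shell
#!/bin/sh
LC_ALL=C
export LC_ALL

# Hypothetical helper: read "name = value" lines from the given files
# (or from stdin) and print only the trimmed value part.
extract_values() {
    # Keep only lines containing "=", dropping everything up to the first one,
    # then remove leading and trailing whitespace from what remains.
    sed -n 's/^[^=]*=//p' "$@" |
        sed 's/^[[:space:]]*//; s/[[:space:]]*$//'
}

# Made-up input: "x" appears twice with different spacing around it.
result=$(printf 'a =  x\nb = x \nc =y z\nno equals here\n' |
    extract_values | sort | uniq)

# Prints "x" and "y z" on separate lines.
printf '%s\n' "$result"
```

Whitespace inside a value (as in "y z") is left untouched here; only the
differences at the edges, which are what defeat uniq, are removed.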


With no further comments in 5 years, I'm closing this bug.

-assaf







