bug-coreutils

bug#38627: uniq -c gets wrong count with non-ascii strings


From: Roy Smith
Subject: bug#38627: uniq -c gets wrong count with non-ascii strings
Date: Sun, 15 Dec 2019 14:40:14 -0500

With the following input:

> $ cat x
> "ⁿᵘˡˡ"
> "ܥܝܪܐܩ"


Running "uniq -c" reports two copies of the same line, as if the two lines were identical:

> $ uniq -c x
>       2 "ⁿᵘˡˡ"


I've attached a copy of the test file, and here's the octal dump:

> $ od -b x
> 0000000 042 342 201 277 341 265 230 313 241 313 241 042 012 042 334 245
> 0000020 334 235 334 252 334 220 334 251 042 012
> 0000032


I'm getting this on:

> Linux tools-sgebastion-08 4.9.0-8-amd64 #1 SMP Debian 4.9.130-2 (2018-10-27) x86_64 GNU/Linux
> uniq (GNU coreutils) 8.26

My macOS 10.13.6 box gets it right:

> $ uniq -c x
>    1 "ⁿᵘˡˡ"
>    1 "ܥܝܪܐܩ"
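
A quick check that may help narrow this down, offered as a sketch rather than a diagnosis: the snippet below rebuilds the test file from the octal dump above and reruns "uniq -c" with LC_ALL=C, which forces byte-wise comparison. It assumes (the report does not establish this) that the wrong count comes from locale-aware collation treating the two lines as equal. The temporary file and the c_lines variable are illustrative names only.

```shell
# Recreate the two-line test file from the octal dump in the report.
tmp=$(mktemp)
printf '\042\342\201\277\341\265\230\313\241\313\241\042\012' > "$tmp"
printf '\042\334\245\334\235\334\252\334\220\334\251\042\012' >> "$tmp"

# Assumption: the miscount is caused by locale collation.
# The C locale compares lines byte by byte, bypassing collation rules.
c_lines=$(LC_ALL=C uniq -c "$tmp" | wc -l | tr -d ' ')
echo "lines reported under LC_ALL=C: $c_lines"

# For contrast, the same command under whatever locale is currently active.
uniq -c "$tmp"

rm -f "$tmp"
```

If the C-locale run reports both lines while the default-locale run reports only one, the difference lies in how the lines are compared rather than in how the file is read.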



