bug#38627: uniq -c gets wrong count with non-ascii strings
From: Roy Smith
Subject: bug#38627: uniq -c gets wrong count with non-ascii strings
Date: Sun, 15 Dec 2019 14:40:14 -0500
With the following input:
> $ cat x
> "ⁿᵘˡˡ"
> "ܥܝܪܐܩ"
Running "uniq -c" says there are two copies of the same line!
> $ uniq -c x
> 2 "ⁿᵘˡˡ"
I've attached a copy of the test file, and here's the octal dump:
> $ od -b x
> 0000000 042 342 201 277 341 265 230 313 241 313 241 042 012 042 334 245
> 0000020 334 235 334 252 334 220 334 251 042 012
> 0000032
I'm getting this on:
> Linux tools-sgebastion-08 4.9.0-8-amd64 #1 SMP Debian 4.9.130-2 (2018-10-27)
> x86_64 GNU/Linux
> uniq (GNU coreutils) 8.26
My macOS 10.13.6 box gets it right:
> $ uniq -c x
> 1 "ⁿᵘˡˡ"
> 1 "ܥܝܪܐܩ"
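One possible way to narrow this down, sketched here and not part of the run above: rebuild the file from the octal dump and rerun uniq with LC_ALL=C. This assumes the difference comes from the locale's collation rules (the coreutils manual says uniq compares lines according to LC_COLLATE), so the C locale should force a plain byte comparison. The variable names f, size and distinct are only illustrative.

```shell
# Rebuild the 26-byte test file from the octal dump
# (octal escapes in the printf format string are POSIX).
f=$(mktemp)
printf '\042\342\201\277\341\265\230\313\241\313\241\042\012' > "$f"
printf '\042\334\245\334\235\334\252\334\220\334\251\042\012' >> "$f"

# Sanity check: the dump ends at offset 0000032 octal, i.e. 26 bytes.
size=$(wc -c < "$f" | tr -d ' ')

# In the C locale lines are compared as raw bytes; the two lines
# differ byte-for-byte, so each should be reported with a count of 1.
LC_ALL=C uniq -c "$f"

# Number of distinct lines uniq sees under byte-wise comparison.
distinct=$(LC_ALL=C uniq "$f" | wc -l | tr -d ' ')
echo "bytes=$size distinct=$distinct"

rm -f "$f"
```

If the counts come out as 1 and 1 under LC_ALL=C but 2 under the normal login locale, that would point at the locale's collation data, not at uniq's counting.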