uniq Bug
From: Ryan Helinski
Subject: uniq Bug
Date: Tue, 27 Jun 2006 19:36:18 -0400
User-agent: Thunderbird 1.5.0.4 (Windows/20060516)
Hello,
I'm not sure if this has already been reported, but I found a problem
with uniq. If I sat down and looked at the code, I could probably see how
to fix it. It seems to always occur with very large unsorted streams
(files). Below are the commands I ran to reproduce the bug (which I
originally thought was my error). Running uniq on the unsorted stream
gives a different line count than sorting the stream first and then
running uniq:
address@hidden srv]# find ./ -printf "%i\n" -type f > ./srv_inodes.txt
address@hidden srv]# cat srv_inodes.txt | wc -l
65678
address@hidden srv]# cat srv_inodes.txt | uniq | wc -l
65488
address@hidden srv]# less srv_inodes.txt
address@hidden srv]# cat srv_inodes.txt | sort | uniq | wc -l
57046
Note that srv_inodes.txt as generated contains 65678 inode numbers, one
per line. I've attached this file.
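The same discrepancy can be reproduced on a tiny input, which may help
decide whether this is a bug or a limitation. The sketch below assumes
the documented behavior of uniq, which compares only adjacent lines. The
four sample values and the variable names are made up for illustration
and are not taken from srv_inodes.txt:

```shell
# Made-up sample: duplicates exist, but no two equal lines are adjacent.
sample=$(printf '12\n7\n12\n7\n')

# uniq alone collapses only adjacent repeats, so nothing is removed here.
plain=$(printf '%s\n' "$sample" | uniq | wc -l)

# Sorting first places equal lines next to each other, so uniq can drop them.
sorted=$(printf '%s\n' "$sample" | sort | uniq | wc -l)

echo "uniq alone:  $plain"
echo "sort | uniq: $sorted"
```

On this sample, uniq alone keeps all 4 lines while sort | uniq keeps 2,
which is the same kind of gap as 65488 versus 57046 above.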
Let me know the status of this bug (or limitation),
Ryan Helinski
srv_inodes.zip
Description: Zip compressed data