join (2.0.1+) doesn't work properly
From: Christian Ohr
Subject: join (2.0.1+) doesn't work properly
Date: Wed, 12 Dec 2001 17:01:49 +0100
Hi,
Recently I needed to merge some really big text files (several million
lines altogether; the result file was >1GB), joining on the first column
(which is an MD5 base64 hash key) and printing both the unpaired lines
and the concatenated joined lines. The command looked like:
join -a 1 -a 2 -t \t file1 file2 > file3
Since I had about 20 or more files to merge, I reduced their number by
subsequent pairwise joining until there was only one result file left
(would be nice to have something like multi-way merging available
here...).
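The pairwise reduction described above could be scripted roughly as
follows. This is only a sketch: the function name join_all and the use
of mktemp are illustrative assumptions, and it presumes every input is
already sorted on the first field in the C locale.

```sh
# Fold several files into one by repeated pairwise outer joins.
# Usage (hypothetical): join_all result file1 file2 file3 ...
# Assumes each input is sorted on field 1 under LC_ALL=C.
join_all() {
    out=$1; shift
    tab=$(printf '\t')            # a literal tab for join -t
    acc=$(mktemp) || return 1     # running result
    tmp=$(mktemp) || return 1     # scratch file for each step
    cp "$1" "$acc"; shift
    for f in "$@"; do
        # -a 1 -a 2 keeps unpaired lines from both sides
        LC_ALL=C join -a 1 -a 2 -t "$tab" "$acc" "$f" > "$tmp" || return 1
        mv "$tmp" "$acc"
    done
    rm -f "$tmp"
    mv "$acc" "$out"
}
```

Note that -t "$tab" passes a real tab character; an unquoted \t
reaches join as the plain letter t in a POSIX shell. Unpaired lines are
printed as they are, so columns are not aligned across inputs.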
However, joining only worked correctly with textutils 2.0a; later
versions (e.g. 2.0.13, 2.0.16) left some duplicate keys in the output.
Running 'diff' or 'wc -l' on the sorted keys before and after removing
duplicates showed a difference of about 100 lines...
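A check of this kind might look like the sketch below. The helper name
dup_keys is made up for illustration, and forcing LC_ALL=C is an
assumption meant to rule out locale-dependent ordering, not a confirmed
cause of the problem.

```sh
# Print every key (field 1, tab-separated) that occurs more than once.
# LC_ALL=C forces byte-wise ordering; whether the locale matters here
# is an assumption, not something established in this report.
dup_keys() {
    cut -f 1 "$1" | LC_ALL=C sort | LC_ALL=C uniq -d
}

# Hypothetical usage: count the duplicated keys in the merged result.
# dup_keys file3 | wc -l
```

An empty result means every key appears exactly once in the joined file.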
I can reproduce this; however, I haven't identified what goes wrong, nor
have I dived into the sources... maybe I can have a closer look at this
someday...
regards
Christian