RE: there is a bug with UNIX command join

From: Robert Wolf
Subject: RE: there is a bug with UNIX command join
Date: Tue, 24 Jun 2003 10:17:44 -0400

Sorry, I thought Outlook would have attached the files when replying.
Try the commands listed below with the attached files.
There should be four files: j1, j2, s1 and s2.

I apologize in advance, and actually I hope I am wrong. I love Unix, but as a
contractor I have to adapt to the customer's environment. Thanks to you guys
for Cygwin; it makes a PC easily programmable again. I have to use g++ because
VC++ is so far behind in standard-compliant C++ and the STL.

-----Original Message-----
From: address@hidden [mailto:address@hidden]
Sent: June 24, 2003 12:40 AM
To: Robert Wolf
Cc: address@hidden
Subject: Re: there is a bug with UNIX command join

Robert Wolf wrote:
> Thanks for replying so quickly. 

Thanks for submitting your bug report.

> I tried
> $ join -t  \012  -v 2 j1 j2
> $ join -t '\012' -v 2 j1 j2
> $ join -t "\012" -v 2 j1 j2
> All three versions do the same wrong thing: they include the 'eee' line,
> which is the last line of the file j1 and a middle line of j2, when they
> should not include it.

I don't know what is in your data files.  You did not share those with
us.  If you have a small, emphasis on small, test case perhaps you
could share it with the list?  I tried creating an example which would
illustrate your problem but could not reproduce any trouble.
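
For reference, here is a minimal sketch of the kind of small test case
that would help. The file names t1 and t2 and their contents are made up
for illustration and are not your j1 and j2. As in your report, 'eee' is
the last line of the first file and a middle line of the second. Both
files are already sorted, and LC_ALL=C is set so that the tools agree on
the ordering.

```shell
# Hypothetical test files; names and contents are illustrative only.
tmp=$(mktemp -d)
printf 'aaa\nccc\neee\n' > "$tmp/t1"
printf 'aaa\nbbb\neee\nfff\n' > "$tmp/t2"

# Lines found only in the second file (columns 1 and 3 suppressed).
comm_out=$(LC_ALL=C comm -13 "$tmp/t1" "$tmp/t2")

# Unpairable lines from the second file, using the default field separator.
join_out=$(LC_ALL=C join -v 2 "$tmp/t1" "$tmp/t2")

printf 'comm:\n%s\njoin:\n%s\n' "$comm_out" "$join_out"
```

On these files both commands should list 'bbb' and 'fff' and leave out
'eee'. If your data can be trimmed down to something of this size that
still misbehaves, that would be ideal to post.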

I only rarely use join myself and then only in the most basic of
ways.  Therefore I am sorry but I am unable to guess further at what
might be your trouble.

> I also tried
> $ comm -13 j1 j2
> However, it does the same wrong thing as the previous three: it again
> includes the 'eee' line. I also suspect it might have something to do with
> the line being the last line of the first file.

Since two different programs are both behaving the same way, I hazard a
guess that it is probably a misunderstanding of their behavior and not an
actual bug.
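
Both join and comm expect sorted input, and both compare whole lines
byte for byte, so a stray carriage return from a DOS-style file makes two
otherwise identical lines differ. I have not seen your files, so whether
either applies is only an assumption. The sketch below shows how one might
check for both conditions; the file names 'unsorted' and 'dos' are
hypothetical examples.

```shell
# Hypothetical sample files that show each condition.
tmp=$(mktemp -d)
printf 'bbb\naaa\n' > "$tmp/unsorted"
printf 'aaa\r\nbbb\r\n' > "$tmp/dos"

# sort -c exits non-zero when the input is not in sorted order.
if LC_ALL=C sort -c "$tmp/unsorted" 2>/dev/null; then
  sorted=yes
else
  sorted=no
fi

# Count carriage-return characters; anything above zero suggests CRLF endings.
cr_count=$(tr -cd '\r' < "$tmp/dos" | wc -c | tr -d ' ')

printf 'sorted=%s cr_count=%s\n' "$sorted" "$cr_count"
```

Running the same two checks against j1 and j2 would rule these
possibilities in or out.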

> I tried one thing, I created two other test files with much shorter lines
> and tried all four commands.
> $ join -t  \012  -v 2 s1 s2
> $ join -t '\012' -v 2 s1 s2
> $ join -t "\012" -v 2 s1 s2
> $ comm -13 s1 s2
> And guess what? They all worked! I believe the problem is that the lines
> in the files j1 and j2, which are almost 300 characters long, are too long
> for these commands to handle.
> It would be nice if these commands, i.e. join & comm, could handle much
> longer lines, say a default of 4096 characters, and a new option to allow
> a larger line size, say up to 65535 characters. Another question: can these
> programs handle files that are 30 MB in size and have long lines?

The join command should not have small limits such as the one you are
describing, at least not intentionally.  Internally these programs malloc
memory and should be able to handle very long lines.  You may not be
aware, but it is a GNU standards guideline that programs not have
arbitrary limits such as this.  As much as possible the coreutils
follow that guideline.  If there is a line length problem here then
that would be a bug to be fixed in the program.  I am not convinced
yet, however, especially since you report that 'comm' has the same
result.  I also browsed the code and could not locate anything that
looked like a problem here.
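
A line-length limit can be tested directly with generated data. This is
only a sketch: the file names l1 and l2 and the 400-character padding are
arbitrary choices of mine, picked to be longer than the roughly 300
characters you mention.

```shell
# Hypothetical long-line test; names and sizes are illustrative only.
tmp=$(mktemp -d)

# Build a 400-character string of 'x' to append to every key.
pad=$(printf '%400s' '' | tr ' ' 'x')

# 'eee' is the last line of l1 and a middle line of l2.
printf 'aaa%s\neee%s\n' "$pad" "$pad" > "$tmp/l1"
printf 'aaa%s\nbbb%s\neee%s\nfff%s\n' "$pad" "$pad" "$pad" "$pad" > "$tmp/l2"

# Lines found only in the second file.
long_out=$(LC_ALL=C comm -13 "$tmp/l1" "$tmp/l2")

# Print just the first three characters of each result line.
printf '%s\n' "$long_out" | cut -c1-3
```

If line length were not a factor, only the 'bbb' and 'fff' lines would be
reported, the same as with short lines.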

Can you send a small test case that would allow us to recreate your