


From: Robert Mitchell
Subject: [Classpath-inetlib] Problems in gnu.inet.util.LineInputStream and gnu.inet.util.CRLFInputStream
Date: Thu, 31 Mar 2005 17:46:06 -0500

The readLine method for LineInputStream contains the following code:
 
            len = in.available();
            len = (len < MIN_LENGTH) ? MIN_LENGTH : len;
 
Because len ends up as the larger of available() and MIN_LENGTH, a single readLine call asks for everything the wrapped stream reports as available in order to find one line.  This in itself would not be a major problem, except that LineInputStream often wraps a CRLFInputStream, whose algorithm for removing CRs is very inefficient when there are many CRs to remove.  I discovered these problems with a test that used a ByteArrayInputStream containing over 1000 lines, each ending in CRLF.  It took on the order of a minute to read just the first few lines.
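 
To make the sizing concrete, here is a small sketch of the expression quoted above applied to a ByteArrayInputStream.  The class name, the sampleLines helper and the MIN_LENGTH value of 1024 are assumptions for illustration only; the real constant in LineInputStream is not shown in this message.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

final class ReadLengthSketch {
    // Hypothetical value; the actual MIN_LENGTH is not given in this message.
    static final int MIN_LENGTH = 1024;

    // The expression quoted above: the larger of available() and MIN_LENGTH.
    static int originalLength(InputStream in) throws IOException {
        int len = in.available();
        return (len < MIN_LENGTH) ? MIN_LENGTH : len;
    }

    // Builds test data similar to the one described: many lines ending in CRLF.
    static byte[] sampleLines(int count) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < count; i++) {
            sb.append("line ").append(i).append("\r\n");
        }
        return sb.toString().getBytes(StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) throws IOException {
        byte[] data = sampleLines(2000);
        InputStream in = new ByteArrayInputStream(data);
        System.out.println("bytes in stream:  " + data.length);
        System.out.println("requested length: " + originalLength(in));
    }
}
```

ByteArrayInputStream.available() returns the number of bytes remaining, so for an in-memory stream larger than MIN_LENGTH the requested length is the whole remaining stream.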
 
In addition, there is a problem in CRLFInputStream: if the last character read into a buffer is a CR and there are more characters to read, read(byte[], int, int) does not work correctly and drops a character.
 
I suggest two changes.  First, the above lines should be replaced by:
 
            len = in.available();
            len = (len > MIN_LENGTH) ? MIN_LENGTH : len;
 
Second, I am attaching a replacement for CRLFInputStream that corrects both the inefficiency of removeCRLF and the problem with read(byte[], int, int).  The first is fixed by copying each byte only once when removing CRs, instead of copying it again for every CR removed ahead of it.  The second is fixed by passing the correct length to removeCRLF.  To avoid an infinite loop when a CR falls at the end of a buffer, it reads the next character instead of buffering the CR.  To avoid possible problems with mark/reset caused by this extra read, it adds one to the requested readahead limit.  (Note that this does not deal with potential problems in the interaction between the single-byte read(), which can read 2 bytes, and mark/reset.)
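 
For readers without the attachment, the "copy each byte only once" idea can be sketched as below.  This is not the attached CRLFInputStream; the class and method names are hypothetical, and it assumes the goal is to drop each CR that is immediately followed by LF.

```java
final class CrlfCompactSketch {
    static final int CR = 13;
    static final int LF = 10;

    // Hypothetical helper, not the attached code.
    // Compacts buf[off .. off+len) in place, dropping every CR that is
    // immediately followed by LF, and returns the new length.
    // Each byte is copied at most once: 'read' scans forward and 'write'
    // trails behind it, so the cost is linear in len.
    static int removeCrBeforeLf(byte[] buf, int off, int len) {
        int end = off + len;
        int write = off;
        for (int read = off; read < end; read++) {
            byte b = buf[read];
            boolean crBeforeLf = b == CR && read + 1 < end && buf[read + 1] == LF;
            if (!crBeforeLf) {
                buf[write++] = b;
            }
        }
        // A CR in the last position is kept here; a real stream would have
        // to look at the next byte to decide, as described above.
        return write - off;
    }
}
```

An approach that shifts the rest of the buffer down for every CR it finds does work proportional to the buffer size per CR, which would be consistent with the slowdown seen on a large buffer full of CRLF pairs.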
 
After making these changes, the same test took less than a second.
 
Bob Mitchell.

Attachment: CRLFInputStream.java