Re: GNU Parallel Bug Reports recstart finds spurious empty records


From: Martin Frith
Subject: Re: GNU Parallel Bug Reports recstart finds spurious empty records
Date: Thu, 20 Sep 2012 15:50:54 +0900

Hi Ole,

>> There are only 2 records, but it finds lots of extra records of size zero.
>
> See: http://savannah.gnu.org/bugs/?34241

I think there's another issue.  In the code where it says "Find the last recend-recstart in $buf": if $buf holds only a partial record and recend is empty, the search matches the first recstart, at the very start of $buf.  Everything before that match is then treated as complete, so it emits a size-zero record before the first recstart.
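To make the empty-record case concrete, here is a minimal sketch in Python (not parallel's actual Perl code, and not the attached patch). The function name `cut_complete_part` and its `skip_empty` flag are made up for illustration; the guard shown is just one possible way to avoid the empty record.

```python
def cut_complete_part(buf, recstart, recend="", skip_empty=False):
    """Split buf at the last recend+recstart boundary.

    Returns (complete, rest). `complete` is None when nothing can be
    emitted yet and more input is needed.
    """
    boundary = recend + recstart
    i = buf.rfind(boundary)
    if i == -1:
        return None, buf              # no boundary at all: read more input
    cut = i + len(recend)             # recend stays with the finished part
    if skip_empty and cut == 0:
        return None, buf              # boundary only opens the first record
    return buf[:cut], buf[cut:]


partial = ">seq1\nACGT"               # one partial record, recend is empty

# Without the guard, the match at offset 0 yields a size-zero "record".
print(cut_complete_part(partial, ">"))
# With the guard, nothing is emitted until a real boundary shows up.
print(cut_complete_part(partial, ">", skip_empty=True))
```

With two records in the buffer, for example `">a\nAC\n>b\nGG"`, both variants behave the same and cut just before `>b`.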


>> Related to this, parallel seems to become slow when the records are much
>> bigger than the block size.
>
> That is likely: Parallel reads a chunk of size block-size at a time. If
> it cannot find a single record in that chunk, it has to append yet
> another block of the same size. This gives performance of O(n^2).

Here is a patch that does 2 things:

* Fixes the recend-recstart issue.

* Whenever a read yields less than one record (or fewer than N records), it doubles the block-size; otherwise it resets the block-size.  This improves the performance from O(n^2) to O(n) in the record size.
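The effect of the doubling can be sketched in Python as follows. This is an illustration under simplifying assumptions, not the attached patch: recend is taken to be empty, the function name `chunk_stream` is made up, and the cost model (each search rescans the whole buffer) is an assumption that stands in for the real work done per block.

```python
import io


def chunk_stream(stream, recstart, block_size, grow=True):
    """Cut a text stream into chunks of complete records.

    Returns (chunks, scanned), where `scanned` counts buffer characters
    examined, assuming every search rescans the whole buffer.
    """
    chunks, scanned = [], 0
    buf, size = "", block_size
    while True:
        data = stream.read(size)
        if not data:
            break
        buf += data
        scanned += len(buf)
        cut = buf.rfind(recstart)
        if cut > 0:                   # at least one complete record
            chunks.append(buf[:cut])
            buf = buf[cut:]
            size = block_size         # reset the block size
        elif grow:
            size *= 2                 # less than one record: double it
    if buf:
        chunks.append(buf)            # whatever is left at end of input
    return chunks, scanned


# Two records, each about 100 times larger than the block size.
data = ">a\n" + "A" * 1000 + "\n>b\n" + "C" * 1000 + "\n"
fixed = chunk_stream(io.StringIO(data), ">", 10, grow=False)
doubled = chunk_stream(io.StringIO(data), ">", 10, grow=True)
print(len(fixed[0]), fixed[1])
print(len(doubled[0]), doubled[1])
```

Both runs produce the same two chunks, but the fixed-size run examines far more characters, because the buffer is rescanned once per small block instead of once per doubling.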

Another possibility (if regexps are not used) is to do something like this:

local $/ = $recendrecstart;
substr($buf,length $buf,0) = <$in>;
# read more characters up to the next multiple of block-size
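For readers less familiar with Perl's `$/`, the idea above can be sketched in Python. This is only a rough analogue: the helper names `read_through` and `fill_buffer` are hypothetical, the separator is a fixed string (no regexps), and reading one character at a time is chosen for clarity, not speed.

```python
import io


def read_through(stream, sep):
    """Read up to and including the next `sep`, or to end of input.

    Roughly what Perl's <$in> does when $/ is set to `sep`.
    """
    parts, tail = [], ""
    while True:
        ch = stream.read(1)
        if not ch:
            break                     # end of input
        parts.append(ch)
        tail = (tail + ch)[-len(sep):]
        if tail == sep:
            break                     # separator fully read
    return "".join(parts)


def fill_buffer(stream, buf, sep, block_size):
    """Append one separator-terminated piece, then top up the buffer
    to the next multiple of block_size."""
    buf += read_through(stream, sep)
    extra = -len(buf) % block_size
    return buf + stream.read(extra)


stream = io.StringIO("AAAA\n>b\nCCCCCCCC\n>c\nGG")
buf = fill_buffer(stream, ">a\n", "\n>", 8)
print(repr(buf), len(buf))
```

Here the separator plays the role of `$recendrecstart`, so the buffer is guaranteed to contain at least one record boundary after each call, unless the input ends first.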

Perhaps --pipe with -N (and/or -L?) would be faster if it read like this without using blocks at all?

Have a nice day,
Martin
P.S. Thanks for providing this extremely useful tool!

Attachment: parallel.patch
Description: Binary data

