> There are only 2 records, but it finds lots of extra records of size zero.
See: http://savannah.gnu.org/bugs/?34241
I think there is another issue. In the code under the comment "Find the last recend-recstart in $buf": if $buf contains only a partial record and recend is empty, the search matches the first recstart, so it produces a size-zero record before the first recstart.
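To illustrate the size-zero case, here is a minimal Python sketch of the "find the last recend-recstart" step with a guard added. The function name `last_split` and its return convention are hypothetical. This is not the Perl code in parallel, and not necessarily what the patch does; it only models the check that is needed.

```python
def last_split(buf: str, recend: str, recstart: str) -> int:
    """Return the offset just after the last recend that is followed by
    recstart, i.e. where the last complete record in buf ends.

    Returns -1 when buf does not yet hold a complete record.
    (Hypothetical helper for illustration only.)
    """
    pos = buf.rfind(recend + recstart)
    if pos == -1:
        return -1
    split = pos + len(recend)
    # With an empty recend, a match at offset 0 is only the recstart of the
    # first (still partial) record; splitting there emits an empty record.
    if split == 0:
        return -1
    return split


print(last_split(">a\nxx\n>b\nyy", "", ">"))  # 6: ">a\nxx\n" is complete
print(last_split(">a\nxx", "", ">"))          # -1: need more input
```

Without the `split == 0` check, the second call would return 0 and the caller would hand out an empty record.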
> Related to this, parallel seems to become slow when the records are much
> bigger than the block size.
That is likely: parallel reads a chunk of block-size bytes at a time. If
that chunk does not contain a single complete record, it has to append
another block of the same size and search the buffer again. This gives
O(n^2) performance.
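A rough count, assuming each append is followed by a search of the whole buffer: with record size $R$, block size $B$ and $k$ reads needed to hold one record, the bytes searched for that record are

```latex
\sum_{i=1}^{k} iB \;=\; B\,\frac{k(k+1)}{2} \;\approx\; \frac{R^2}{2B},
\qquad k = \left\lceil R/B \right\rceil,\quad R \gg B
```

so for a fixed block size the work grows quadratically in the record size.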
Here is a patch that does two things:
* Fixes the recend-recstart issue described above.
* Whenever it reads fewer than one record (or fewer than N records), it doubles the block size; otherwise it resets the block size. This improves the performance from O(n^2) to O(n).
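A small simulation makes the difference visible. This is a sketch under stated assumptions: the function `bytes_scanned` is hypothetical, it models a reader that rescans the whole buffer after every append, and it ignores the "reset the block size" part of the patch.

```python
def bytes_scanned(record_size: int, block_size: int, doubling: bool) -> int:
    """Total bytes searched before one record of record_size is buffered.

    Assumed model: append one chunk, then search the whole buffer again.
    With doubling=True each following chunk is twice the previous one,
    as in the patch; otherwise every chunk is block_size.
    """
    buffered = 0
    chunk = block_size
    scanned = 0
    while buffered < record_size:
        buffered += chunk
        scanned += buffered      # the whole buffer is searched again
        if doubling:
            chunk *= 2           # grow the next read
    return scanned


print(bytes_scanned(1_000_000, 1_000, doubling=False))  # 500500000
print(bytes_scanned(1_000_000, 1_000, doubling=True))   # 2036000
```

With doubling, the buffer sizes form a geometric series, so the total stays within a small constant factor of the record size.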
Another possibility (when --regexp is not used, so recstart and recend are
plain strings, which is what Perl's $/ requires) is something like this:
    local $/ = $recendrecstart;
    substr($buf,length $buf,0) = <$in>;
    # read more characters up to the next multiple of block-size
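For readers less familiar with Perl: `<$in>` with `$/` set to a string returns everything up to and including that string (or up to EOF). Python has no built-in readline for arbitrary separators, so this minimal sketch emulates it with a hypothetical `SepReader` class holding a pushback buffer, then tops the result up to the next multiple of `block_size` as the comment in the snippet describes. The class, method names and chunk size are assumptions for illustration, not parallel's code.

```python
import io
from typing import TextIO


class SepReader:
    """Hypothetical reader mimicking Perl's <$in> with $/ set to a string."""

    def __init__(self, stream: TextIO, chunk_size: int = 65536) -> None:
        self.stream = stream
        self.chunk_size = chunk_size
        self.pending = ""  # data read from stream but not yet returned

    def read_until(self, sep: str) -> str:
        """Return text up to and including the first sep, or up to EOF."""
        start = 0
        while True:
            idx = self.pending.find(sep, start)
            if idx != -1:
                end = idx + len(sep)
                out, self.pending = self.pending[:end], self.pending[end:]
                return out
            # Only new data (plus a len(sep)-1 overlap) can contain a match.
            start = max(0, len(self.pending) - len(sep) + 1)
            more = self.stream.read(self.chunk_size)
            if not more:
                out, self.pending = self.pending, ""
                return out
            self.pending += more

    def read(self, n: int) -> str:
        """Return up to n characters."""
        while len(self.pending) < n:
            more = self.stream.read(self.chunk_size)
            if not more:
                break
            self.pending += more
        out, self.pending = self.pending[:n], self.pending[n:]
        return out


def read_aligned(reader: SepReader, sep: str, block_size: int) -> str:
    """Read through the next sep, then up to a multiple of block_size."""
    buf = reader.read_until(sep)
    remainder = len(buf) % block_size
    if remainder:
        buf += reader.read(block_size - remainder)
    return buf


r = SepReader(io.StringIO("aaa\n>bbbb\n>cc"), chunk_size=2)
print(repr(read_aligned(r, "\n>", 4)))  # 'aaa\n>bbb'
```

The separator search only looks at newly read data, so it does not rescan what it has already checked.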
Perhaps --pipe with -N (and/or -L?) would be faster if it read like this without using blocks at all?
Have a nice day,
Martin
P.S. Thanks for providing this extremely useful tool!