bug-parallel

GNU Parallel Bug Reports Bug in parallel using --pipe


From: Bill Wyatt
Subject: GNU Parallel Bug Reports Bug in parallel using --pipe
Date: Tue, 6 Sep 2011 20:09:44 -0400

I found a bug in 20110722 that is also present in the latest release.
It involves the use of --pipe, where it appears that parallel does not
at first recognize that stdin has been exhausted.

  mmtobs 1>  parallel --version
  GNU parallel 20110822
    [...]
  
  Dell PowerEdge R900, 8 cpus, 128 GB
  CentOS 5.6
  Linux tdc2 2.6.18-238.12.1.el5 #1 SMP \
    Tue May 31 13:22:04 EDT 2011        \
    x86_64 x86_64 x86_64 GNU/Linux


I have a file of MD5 checksums, in the usual output format of
md5sum(1).  I want to use parallel to split the large file into
pieces to run the checking function of "md5sum -c". But, md5sum is
sometimes outputting an error message that seems to mean it has been
called with no input, even though I supplied the "-r" argument.

The error is obvious if you have, say, a 1300-line file, set -L to a
larger number than that, and also allocate more than one cpu:

  mmtobs 0> wc -l < fire.md5.txt
  1300

  mmtobs 1> < fire.md5.txt parallel -r -j3 -L1500 --pipe md5sum -c >/dev/null
  md5sum: standard input: no properly formatted MD5 checksum lines found
  md5sum: standard input: no properly formatted MD5 checksum lines found

Here's a simple case: 3 lines or fewer, using 3 cpus and one line
per execution of md5sum (and note the output of command 4):

  mmtobs 2> head -3 fire.md5.txt | parallel -r -j3 -L 1 --pipe md5sum -c
  2011.0320/dif2.fits.bz2: OK
  2011.0320/diff.fits.bz2: OK
  2011.0320/fire_0001.fits.bz2: OK
  
  mmtobs 3> head -2 fire.md5.txt | parallel -r -j3 -L 1 --pipe md5sum -c
  md5sum: standard input: no properly formatted MD5 checksum lines found
  2011.0320/dif2.fits.bz2: OK
  2011.0320/diff.fits.bz2: OK
  
  mmtobs 4> head -1 fire.md5.txt | parallel -r -j3 -L 1 --pipe md5sum -c
  md5sum: standard input: no properly formatted MD5 checksum lines found
  md5sum: standard input: no properly formatted MD5 checksum lines found
  md5sum: standard input: no properly formatted MD5 checksum lines found
  md5sum: standard input: no properly formatted MD5 checksum lines found
  2011.0320/dif2.fits.bz2: OK
  
  mmtobs 5> head -1 fire.md5.txt | parallel -r -j3 -L 1 --pipe md5sum -c
  md5sum: standard input: no properly formatted MD5 checksum lines found
  md5sum: standard input: no properly formatted MD5 checksum lines found
  2011.0320/dif2.fits.bz2: OK

Note that when the number of cpus times the number of lines divides
evenly into the input number of lines, all is well. And yes, that
strange output of _4_ error lines at command 4 really does sometimes
come out. My tests seem to show that the buffering is usually correct
with larger files, but I've sometimes, not always, had inconsistent
behavior with my normal-sized files.
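
The cases above can probably be reproduced without the original data
using something like the following sketch. The scratch directory and the
file names (f1.dat, sums.md5.txt) are hypothetical, made up for
illustration. Whether the spurious error lines appear, and how many, will
depend on the parallel version installed and may vary from run to run.

```shell
#!/bin/sh
# Reproduction sketch; all file names here are hypothetical.
# The scratch directory is left in place afterwards for inspection.
tmp=$(mktemp -d) || exit 1
cd "$tmp" || exit 1

# Create three small files and a checksum list in md5sum(1) output format.
for i in 1 2 3; do
    echo "data $i" > "f$i.dat"
done
md5sum f1.dat f2.dat f3.dat > sums.md5.txt

# Feed 1, 2, and 3 lines to parallel (same flags as in the report) and
# count how many times md5sum complains that it received no input.
for n in 1 2 3; do
    if command -v parallel >/dev/null 2>&1; then
        errs=$(head -n "$n" sums.md5.txt |
               parallel -r -j3 -L 1 --pipe md5sum -c 2>&1 |
               grep -c 'no properly formatted')
    else
        errs="n/a (parallel not installed)"
    fi
    echo "lines=$n spurious_errors=$errs"
done
```

A nonzero spurious_errors count for any input size would correspond to
the behavior in commands 3 through 5 above. Since the output is not
consistent, running the script several times is advisable.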

Bill Wyatt (wyatt at cfa harvard edu)
   Smithsonian Astrophysical Observatory  (Cambridge, MA, USA)


