bug-parallel

possible issues with --retries option


From: Nick Felt
Subject: possible issues with --retries option
Date: Wed, 15 Dec 2010 20:21:14 -0500

Hello,

I've been having some difficulty trying to get parallel to work consistently in distributing jobs across a bunch of remote machines on the local network.  My goal is to have parallel run a script remotely that exits with error code 1 if the machine is occupied, prompting parallel (with --retries set) to run the job on a different machine.  I'm not sure if the behavior I'm seeing is actually a bug, or intentional for reasons I don't understand.  But either way, advice would be appreciated!
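
For concreteness, here is a minimal sketch of the kind of remote script described above. It is not the actual script: the function names (is_occupied, run_if_free), the use of a load-average figure as the "occupied" test, and the threshold are all assumptions made for illustration.

```sh
#!/bin/sh
# Hypothetical helpers: refuse to run a job when the machine looks busy.

# is_occupied LOAD THRESHOLD
# Returns 0 (true) when LOAD >= THRESHOLD, 1 otherwise.
# awk is used because POSIX sh cannot compare decimal numbers.
is_occupied() {
    awk -v load="$1" -v max="$2" 'BEGIN { exit !(load + 0 >= max + 0) }'
}

# run_if_free LOAD THRESHOLD COMMAND [ARGS...]
# Returns 1 without running COMMAND when the machine is occupied;
# otherwise runs COMMAND and returns its exit status.
run_if_free() {
    load=$1
    max=$2
    shift 2
    if is_occupied "$load" "$max"; then
        return 1
    fi
    "$@"
}
```

On Linux the current one-minute load could be read with `read -r load rest < /proc/loadavg` and passed as the first argument, for example `run_if_free "$load" 1.0 ./job.sh || exit 1`, where job.sh is a placeholder name. The exit code 1 is what I expect --retries to react to.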

The main problem is that when I run this command (machine names are "nutmeg" and "vinegar" as remotes):

$ seq 1 8 | parallel --retries 10 --sshlogin 8/nutmeg,8/vinegar -j+0 "hostname; false"

I get no output.  This kind of makes sense, because parallel presumably retries each of the 8 jobs 10 times, always encounters an error, and gives up (albeit silently).  I can get it to work by either (a) removing --retries or (b) changing "false" to "true":

$ seq 1 8 | parallel --sshlogin 8/nutmeg,8/vinegar -j+0 "hostname; false"
vinegar
vinegar
[... x8 total]

$ seq 1 8 | parallel --retries 10 --sshlogin 8/nutmeg,8/vinegar -j+0 "hostname; true"
vinegar
vinegar
[... x8 total]
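
As a local sanity check that does not involve parallel at all: the exit status of a "cmd1; cmd2" list is the status of the last command, so the two test commands differ only in the status reported back. The helper name status_of is made up for this illustration.

```sh
# status_of COMMAND_STRING
# Runs the string in a fresh shell, discards its output,
# and prints the exit status the caller would see.
status_of() {
    sh -c "$1" >/dev/null 2>&1
    echo $?
}

status_of "hostname; false"   # prints 1
status_of "hostname; true"    # prints 0
```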

That also makes sense.  What I'm having trouble understanding is why two other things also make it work: (c) removing the '-j+0' setting, and (d) - most perplexingly - changing the input to be 9 lines (or equivalently, reducing the 'ncpu' value for vinegar to 7):

$ seq 1 8 | parallel --retries 10 --sshlogin 8/nutmeg,8/vinegar "hostname; false"
nutmeg
nutmeg
[... x8 total]

$ seq 1 9 | parallel --retries 10 --sshlogin 8/nutmeg,8/vinegar -j+0 "hostname; false"
nutmeg
nutmeg
[... x9 total]

Furthermore, when I increase the input to 16 lines, I get an even mix of "nutmeg" and "vinegar" (9 lines always seems to produce "nutmeg" only) and it also seems to print out faster:

$ seq 1 16 | parallel --retries 10 --sshlogin 8/nutmeg,8/vinegar -j+0 "hostname; false"
vinegar
nutmeg
nutmeg
vinegar
vinegar
nutmeg
vinegar
nutmeg
nutmeg
vinegar
nutmeg
vinegar
vinegar
nutmeg
nutmeg
nutmeg

I don't know how the --retries option works internally, but I'd hazard a guess that it's somehow responsible for the variance I'm seeing.  Could someone explain what's going on here (and whether it's supposed to work like this)?

Thanks!

- Nick Felt

