possible issues with --retries option
From: Nick Felt
Subject: possible issues with --retries option
Date: Wed, 15 Dec 2010 20:21:14 -0500
Hello,
I've been having some difficulty trying to get parallel to work consistently in distributing jobs across a bunch of remote machines on the local network. My goal is to have parallel run a script remotely that exits with error code 1 if the machine is occupied, prompting parallel (with --retries set) to run the job on a different machine. I'm not sure if the behavior I'm seeing is actually a bug, or intentional for reasons I don't understand. But either way, advice would be appreciated!
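For illustration, the kind of remote script described above might look roughly like the sketch below. Everything in it is an assumption rather than the actual script: the function names `is_busy` and `run_if_idle` are hypothetical, "occupied" is taken to mean a 1-minute load average at or above a threshold, and reading `/proc/loadavg` assumes a Linux remote.

```shell
#!/bin/sh
# Hypothetical sketch: refuse a job with exit status 1 when the machine is
# busy, so that parallel (run with --retries) can try the job again.

# is_busy LOAD MAX: return 0 (true) when LOAD >= MAX, 1 otherwise.
# awk is used because POSIX sh cannot compare fractional numbers.
is_busy() {
    awk -v load="$1" -v max="$2" 'BEGIN { exit !(load + 0 >= max + 0) }'
}

# run_if_idle MAX CMD...: run CMD only when the machine is not busy.
run_if_idle() {
    max=$1
    shift
    # Assumption: Linux, where the first field of /proc/loadavg is the
    # 1-minute load average.
    load=$(cut -d' ' -f1 /proc/loadavg)
    if is_busy "$load" "$max"; then
        echo "$(hostname): busy (load $load), refusing job" >&2
        return 1
    fi
    "$@"
}
```

A script built on this would end with a call such as `run_if_idle 2.0 "$@"` and would take the place of the `hostname; false` test command used in the examples that follow.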
The main problem is that when I run this command (machine names are "nutmeg" and "vinegar" as remotes):
$ seq 1 8 | parallel --retries 10 --sshlogin 8/nutmeg,8/vinegar -j+0 "hostname; false"
I get no output. This kind of makes sense, because parallel presumably retries each of the 8 jobs 10 times, always encounters an error, and gives up (albeit silently). I can get it to work by either (a) removing --retries or (b) changing "false" to "true":
$ seq 1 8 | parallel --sshlogin 8/nutmeg,8/vinegar -j+0 "hostname; false"
vinegar
[... x8 total]
$ seq 1 8 | parallel --retries 10 --sshlogin 8/nutmeg,8/vinegar -j+0 "hostname; true"
vinegar
[... x8 total]
That also makes sense. What I'm having trouble understanding is why two other things also make it work: (c) removing the '-j+0' setting, and (d) - most perplexingly - changing the input to be 9 lines (or equivalently, reducing the 'ncpu' value for vinegar to 7):
$ seq 1 8 | parallel --retries 10 --sshlogin 8/nutmeg,8/vinegar "hostname; false"
nutmeg
[... x8 total]
$ seq 1 9 | parallel --retries 10 --sshlogin 8/nutmeg,8/vinegar -j+0 "hostname; false"
nutmeg
nutmeg
[... x9 total]
Furthermore, when I increase the input to 16 lines, I get an even mix of "nutmeg" and "vinegar" (9 lines always seems to produce "nutmeg" only) and it also seems to print out faster:
$ seq 1 16 | parallel --retries 10 --sshlogin 8/nutmeg,8/vinegar -j+0 "hostname; false"
vinegar
nutmeg
nutmeg
vinegar
vinegar
nutmeg
vinegar
nutmeg
nutmeg
vinegar
nutmeg
vinegar
vinegar
nutmeg
nutmeg
nutmeg
I don't know how the --retries option works internally, but I'd hazard a guess that it's somehow responsible for the variance I'm seeing. Could someone explain what's going on here (and whether it's supposed to be working like this)?
Thanks!
- Nick Felt