bug-parallel



From: Ole Tange
Subject: Re: GNU Parallel Bug Reports Feature request: if a SSH node goes down, retry on other nodes
Date: Tue, 15 Dec 2015 00:20:58 +0100

On Sun, Dec 13, 2015 at 2:48 PM, Nazgul <address@hidden> wrote:
> On 14 December 2015 at 00:20, Ole Tange <address@hidden> wrote:
>>
>> On Thu, Dec 10, 2015 at 11:17 PM, Nazgul <address@hidden> wrote:
>>
>> > I am using GNU Parallel with --sshlogin on unreliable nodes - that is, some
>> > of them become unreachable after an unpredictable amount of time.
:
>> > It would be nice to have a feature so that, instead, remaining threads are
>> > sent to the machines that are still available.
>>
>> Did you try --retries and --filterhosts?
:
> It seems --filter-hosts is a good candidate. However I have two doubts:
>
> Is this a check performed before the distributed executions or is this a
> policy active throughout the whole life-time of the Parallel process? This
> makes a difference if the node fails after the check.
> If a node fails while executing a command, is that command re-executed on a
> still active node?

From the man page:

       --retries n
                If a job fails, retry it on another computer on
                which it has not failed. Do this n times. If
                there are fewer than n computers in --sshlogin
                GNU parallel will re-use all the computers.
                This is useful if some jobs fail for no
                apparent reason (such as network failure).

It is fairly expensive to filter hosts, so it is only done when the
--sshloginfile has changed. You can force that by touching the
--sshloginfile every time you want a filtering to be run.
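
As a rough sketch of how the two options might be combined (the file
name nodes.txt, the function names, and the retry count of 3 are all
made up for illustration, not taken from the original report):

```shell
#!/bin/sh
# Sketch only: nodes.txt, run_jobs and force_refilter are hypothetical names.
NODES_FILE="${NODES_FILE:-nodes.txt}"

# Run the given command on the hosts listed in $NODES_FILE.
#   --retries 3     retry a failed job up to 3 times (arbitrary count),
#                   on another host where one is available
#   --filter-hosts  leave out hosts that cannot be reached
run_jobs() {
    parallel --retries 3 --filter-hosts --sshloginfile "$NODES_FILE" "$@"
}

# Update the modification time of the sshloginfile so that it counts
# as changed, which should cause the hosts to be filtered again.
force_refilter() {
    touch "$NODES_FILE"
}
```

One would start the jobs with something like
"run_jobs mycommand {} ::: input1 input2" and call force_refilter from
a second shell (or from cron) whenever a fresh check of the nodes is
wanted.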


/Ole


