Re: Thoughts on VM fence infrastructure


From: Felipe Franciosi
Subject: Re: Thoughts on VM fence infrastructure
Date: Mon, 30 Sep 2019 16:59:09 +0000


> On Sep 30, 2019, at 5:03 PM, Dr. David Alan Gilbert <address@hidden> wrote:
> 
> * Felipe Franciosi (address@hidden) wrote:
>> Hi David,
>> 
>>> On Sep 30, 2019, at 3:29 PM, Dr. David Alan Gilbert <address@hidden> wrote:
>>> 
>>> * Felipe Franciosi (address@hidden) wrote:
>>>> Heyall,
>>>> 
>>>> We have a use case where a host should self-fence (and all VMs should
>>>> die) if it doesn't hear back from a heartbeat within a certain time
>>>> period. Lots of ideas were floated around where libvirt could take
>>>> care of killing VMs or a separate service could do it. The concern
>>>> with those is that various failures could lead to _those_ services
>>>> being unavailable and the fencing wouldn't be enforced as it should.
>>>> 
>>>> Ultimately, it feels like Qemu should be responsible for this
>>>> heartbeat and exit (or execute a custom callback) on timeout.
>>> 
>>> It doesn't feel like doing it inside qemu would be any safer; something
>>> outside QEMU can forcibly emit a kill -9 and qemu *will* stop.
>> 
>> The argument above is that we would have to rely on this external
>> service being functional. Consider the case where the host is
>> dysfunctional, with this service perhaps crashed and a corrupt
>> filesystem preventing it from restarting. The VMs would never die.
> 
> Yeh that could fail.
> 
>> It feels like a Qemu timer-driven heartbeat check that calls abort() /
>> exit() would be more reliable. Thoughts?
> 
> OK, yes; perhaps using a timer_create and telling it to send a fatal
> signal is pretty solid; it would take the kernel to do that once it's
> set.
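
For reference, the timer_create idea quoted above could look roughly like
the sketch below. It is only an illustration under stated assumptions: the
helper names (fence_timer_create, fence_timer_kick, fence_demo) are
hypothetical and not part of QEMU, the choice of SIGKILL and
CLOCK_MONOTONIC is an assumption, and the behaviour was written with Linux
in mind. Once armed, the kernel delivers the signal on expiry unless the
process re-arms the timer first, so each heartbeat would call
fence_timer_kick() again.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical helper: create a kernel timer that raises `signo` on expiry. */
static int fence_timer_create(timer_t *timer, int signo)
{
    struct sigevent sev;

    memset(&sev, 0, sizeof(sev));
    sev.sigev_notify = SIGEV_SIGNAL;
    sev.sigev_signo = signo;
    return timer_create(CLOCK_MONOTONIC, &sev, timer);
}

/*
 * Hypothetical helper: (re)arm the timer as a one-shot. Calling this on
 * every heartbeat pushes the deadline out. A timeout of 0 disarms it.
 */
static int fence_timer_kick(timer_t timer, long timeout_ms)
{
    struct itimerspec its;

    memset(&its, 0, sizeof(its));
    its.it_value.tv_sec = timeout_ms / 1000;
    its.it_value.tv_nsec = (timeout_ms % 1000) * 1000000L;
    return timer_settime(timer, 0, &its, NULL);
}

/*
 * Demo: a child arms the timer once and then never sees another heartbeat.
 * Returns the signal that terminated the child, or -1 on any other outcome.
 */
static int fence_demo(long timeout_ms)
{
    int status;
    pid_t pid = fork();

    if (pid < 0) {
        return -1;
    }
    if (pid == 0) {
        timer_t t;

        if (fence_timer_create(&t, SIGKILL) != 0 ||
            fence_timer_kick(t, timeout_ms) != 0) {
            _exit(1);
        }
        for (;;) {
            pause(); /* stand-in for a process whose heartbeat stopped */
        }
    }
    if (waitpid(pid, &status, 0) != pid) {
        return -1;
    }
    return WIFSIGNALED(status) ? WTERMSIG(status) : -1;
}
```

In this sketch the userspace side only has to stay healthy enough to keep
re-arming the timer; the kill itself does not depend on any userspace code
running at expiry time.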

I'm confused about why the kernel needs to be involved. If this is a
timer off the Qemu main loop, it can just check the heartbeat
condition (which should be customisable) and call abort() if that's
not satisfied. If you agree with that, I'd like to talk about how the
check could be made customisable.
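
To make the main-loop variant concrete, here is a minimal sketch of a
periodic check with a pluggable heartbeat condition. Everything in it is an
assumption for illustration: FenceWatch, fence_watch_tick and the demo
callbacks are hypothetical names, not existing QEMU code, and the tick
function is written so that any periodic timer could drive it by passing in
the current time. If no custom action is set, it falls back to abort().

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Customisable pieces: how to test the heartbeat, and what to do on timeout. */
typedef bool (*heartbeat_check_fn)(void *opaque);
typedef void (*fence_action_fn)(void *opaque);

typedef struct FenceWatch {
    heartbeat_check_fn check;   /* returns true if the heartbeat was seen */
    fence_action_fn on_timeout; /* NULL means abort() */
    void *opaque;               /* passed to both callbacks */
    int64_t timeout_ns;         /* allowed time without a heartbeat */
    int64_t last_ok_ns;         /* time of the last successful check */
} FenceWatch;

/* Called from a periodic timer with the current time in nanoseconds. */
static void fence_watch_tick(FenceWatch *w, int64_t now_ns)
{
    if (w->check(w->opaque)) {
        w->last_ok_ns = now_ns;
        return;
    }
    if (now_ns - w->last_ok_ns < w->timeout_ns) {
        return; /* heartbeat missing, but still within the grace period */
    }
    if (w->on_timeout) {
        w->on_timeout(w->opaque);
    } else {
        abort();
    }
}

/* Example condition: an external source bumps `beats` on each heartbeat. */
typedef struct DemoState {
    uint64_t beats;
    uint64_t seen;
    bool fenced;
} DemoState;

static bool demo_check(void *opaque)
{
    DemoState *s = opaque;
    bool alive = s->beats != s->seen;

    s->seen = s->beats;
    return alive;
}

/* Example action: record the fence instead of killing the process. */
static void demo_fence(void *opaque)
{
    DemoState *s = opaque;

    s->fenced = true;
}
```

With a 3 ns timeout, one heartbeat seen at t=1 and none afterwards, a tick
at t=3 does nothing and a tick at t=4 triggers the action. Keeping the
condition behind a function pointer is one possible way to make the check
customisable; other mechanisms would work equally well.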

> 
> IMHO the safer way is to kick the host off the network by reprogramming
> switches; so even if the qemu is actually alive it can't get anywhere.
> 
> Dave

Naturally some off-host STONITH is preferable, but that's not always
available. A self-fencing mechanism right at the heart of the emulator
can do the job without external hardware dependencies.

Cheers,
Felipe

> 
> 
>> Felipe
>> 
>>> 
>>>> Does something already exist for this purpose which could be used?
>>>> Would a generic Qemu-fencing infrastructure be something of interest?
>>> Dave
>>> 
>>> 
>>>> Cheers,
>>>> F.
>>>> 
>>> --
>>> Dr. David Alan Gilbert / address@hidden / Manchester, UK
>> 
> --
> Dr. David Alan Gilbert / address@hidden / Manchester, UK
