qemu-devel

Re: Thoughts on VM fence infrastructure


From: Felipe Franciosi
Subject: Re: Thoughts on VM fence infrastructure
Date: Mon, 30 Sep 2019 20:24:17 +0000


> On Sep 30, 2019, at 8:45 PM, Rafael David Tinoco <address@hidden> wrote:
> 
> 
>>>> There are times when the main loop can get blocked even though the CPU
>>>> threads can be running and can in some configurations perform IO
>>>> even without the main loop (I think!).
>>> Ah, that's a very good point. Indeed, you can perform IO in those
>>> cases, especially when using vhost devices.
>>> 
>>>> By setting a timer in the kernel that sends a signal to qemu, the kernel
>>>> will send that signal however broken qemu is.
>>> Got you now. That's probably better. Do you reckon a signal is
>>> preferable over SIGEV_THREAD?
>> Not sure; probably the safest is getting the kernel to SIGKILL it - but
>> that's a complete nightmare to debug - your process just goes *pop*
>> with no apparent reason why.
>> I've not used SIGEV_THREAD - it looks promising though.
> 
> Sorry to "enter" the discussion, but, in "real" HW, it's not by accident
> that a watchdog device's timeout generates an NMI to the CPUs, causing the
> kernel to handle the interrupt and panic (or to take another action set
> by specific watchdog drivers that re-implement the default ones).

Not sure what you mean by "sorry"... thanks for joining. :)

> Can't you simply "inject" an NMI into all guest vCPUs BEFORE you take any
> action in QEMU itself? Just like the virtual watchdog device would do
> from inside the guest (/dev/watchdog), but capable of being updated from
> outside, in this case of yours (if I understood correctly).

I'm not sure how that relates to this use case, so perhaps the use case
isn't clear. The idea is that in various cloud deployments, a host can
become temporarily unreachable. Imagine that a network cable snapped. A
management layer could then restart the unreachable VMs elsewhere (as
part of a High Availability offering), but it needs to ensure that the
disconnected host is not just going to come back from the dead with
older incarnations of the VMs still running. (Imagine that someone
replaced the broken network cable.) That would result in lots of issues,
from colliding IP addresses to different writers on shared storage
leading to data corruption.

The ask is for a mechanism to fence the host, essentially causing all
(or selected) VMs on that host to die. There are several mechanisms
for that, mostly requiring some sort of HW support (eg. STONITH).
Those are often focused on cases where the host requires manual
intervention to recover or at least a reset.

I'm looking to implement a mechanism for self-fencing, which doesn't
require external hardware and covers most failure scenarios (from
partially or totally broken hosts to a simple temporary network failure).
In several cases rebooting the host is unnecessary; just ensuring the
VMs are down is enough. That's almost always true for temporary network
unavailability (eg. a split network).
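
To make the kernel-armed timer from the quoted exchange above concrete:
a minimal sketch (not QEMU code; the function names and the 30-second
timeout are just placeholders) would arm a one-shot POSIX timer that
delivers SIGKILL unless it is re-armed in time, so the process dies
however wedged its main loop is:

#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define FENCE_TIMEOUT_SEC 30            /* placeholder timeout */

static timer_t fence_timer;

/* (Re-)arm the one-shot timer; calling this is the "keepalive". */
static void fence_arm(void)
{
    struct itimerspec its = {
        .it_value = { .tv_sec = FENCE_TIMEOUT_SEC },
    };
    if (timer_settime(fence_timer, 0, &its, NULL) != 0) {
        perror("timer_settime");
        abort();
    }
}

static void fence_init(void)
{
    struct sigevent sev = {
        .sigev_notify = SIGEV_SIGNAL,
        .sigev_signo = SIGKILL,   /* uncatchable: the process just goes *pop* */
    };
    if (timer_create(CLOCK_MONOTONIC, &sev, &fence_timer) != 0) {
        perror("timer_create");
        abort();
    }
    fence_arm();
}

As noted in the quoted exchange, the downside of SIGKILL is that there
is no trace of why the process died; delivering SIGABRT instead (to get
a core dump) would be friendlier to debug.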

> Possibly you would have to have a dedicated loop for this "watchdog
> device" (AIO threads ?) not to compete with existing coroutines/BH Tasks
> and their jittering on your "realtime watchdog needs".

Only when this feature is enabled (which won't be the case for most
people) would there be an extra thread (according to the latest
proposal), and it is mostly idle. It would wake up every few seconds and
stat() a file, which is a very lightweight operation. That would not
measurably impact or jitter other work.
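
For illustration, a minimal sketch of that thread could look like the
following (not the actual patch; the heartbeat path, the interval and
the staleness threshold are placeholders, and it assumes the check is
simply "the file was touched recently"):

#include <signal.h>
#include <sys/stat.h>
#include <time.h>
#include <unistd.h>

#define HEARTBEAT_PATH   "/run/qemu-fence/heartbeat"  /* placeholder path */
#define CHECK_INTERVAL   5                            /* seconds between checks */
#define STALE_THRESHOLD  30                           /* seconds before fencing */

static void *fence_thread(void *opaque)
{
    (void)opaque;
    for (;;) {
        struct stat st;
        time_t now = time(NULL);

        /* A missing or stale heartbeat means we lost the management layer. */
        if (stat(HEARTBEAT_PATH, &st) != 0 ||
            now - st.st_mtime > STALE_THRESHOLD) {
            kill(getpid(), SIGKILL);   /* fence: take the VM down hard */
        }
        sleep(CHECK_INTERVAL);
    }
    return NULL;
}

The thread would be spawned once at startup (eg. with
qemu_thread_create()) and only when the feature is enabled, so everyone
else pays no cost at all.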

> Regarding the remaining in-flight I/Os for the guest's devices in question
> (vhost/vhost-user etc.), it would be just like a real host where the "bus"
> received commands but the sender died right after...

I hope the above clarifies the idea.

Cheers,
F.

