
Re: [PATCH 0/5] Introduce 'yank' oob qmp command to recover from hanging


From: Lukas Straub
Subject: Re: [PATCH 0/5] Introduce 'yank' oob qmp command to recover from hanging qemu
Date: Mon, 11 May 2020 20:12:18 +0200

On Mon, 11 May 2020 12:49:47 +0100
Daniel P. Berrangé <address@hidden> wrote:

> On Mon, May 11, 2020 at 01:14:34PM +0200, Lukas Straub wrote:
> > Hello Everyone,
> > In many cases, if qemu has a network connection (qmp, migration, chardev, etc.)
> > to some other server and that server dies or hangs, qemu hangs too.
> 
> If qemu as a whole hangs due to a stalled network connection, that is a
> bug in QEMU that we should be fixing IMHO. QEMU should be doing non-blocking
> I/O in general, such that if the network connection or remote server stalls,
> we simply stop sending I/O - we shouldn't ever hang the QEMU process or main
> loop.
> 
> There are places in QEMU code which are not well behaved in this respect,
> but many are, and others are getting fixed where found to be important.
> 
> Arguably any place in QEMU code which can result in a hang of QEMU in the
> event of a stalled network should be considered a security flaw, because
> the network is untrusted in general.

The fact that out-of-band QMP commands exist at all shows that we have to trade
developer time against doing things right. Sure, the migration code can be
rewritten to use non-blocking I/O and fine-grained locks, but as a hobbyist I
don't have time to fix this.
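
For illustration only, here is a minimal sketch of the non-blocking behaviour
described in the quoted text: when the peer stops reading, the sender stops
sending instead of hanging. It is plain Python, not QEMU code; the helper name
send_nonblocking and the 16 MiB payload are assumptions made for the example.

```python
import socket


def send_nonblocking(sock, data):
    """Send as much of data as the kernel accepts right now; never block.

    Hypothetical helper for illustration. Returns the number of bytes sent.
    """
    sock.setblocking(False)
    view = memoryview(data)
    sent = 0
    while sent < len(view):
        try:
            sent += sock.send(view[sent:])
        except BlockingIOError:
            # The socket buffer is full because the peer is not reading.
            # Stop here; a real event loop would retry once the socket
            # becomes writable again.
            break
    return sent


# A connected pair where the peer never reads simulates a stalled server.
local, stalled_peer = socket.socketpair()
payload = b"x" * (16 * 1024 * 1024)  # assumed to exceed the socket buffer
accepted = send_nonblocking(local, payload)
print(f"accepted {accepted} of {len(payload)} bytes without blocking")
```

The caller gets control back with a partial count and can decide what to do
with the rest, which is the property the quoted text asks for.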

> > These patches introduce the new 'yank' out-of-band qmp command to recover from
> > these kinds of hangs. The different subsystems register callbacks which get
> > executed with the yank command. For example the callback can shutdown() a
> > socket. This is intended for the colo use-case, but it can be used for other
> > things too of course.  
> 
> IIUC, invoking the "yank" command unconditionally kills every single
> network connection in QEMU that has registered with the "yank" subsystem.
> IMHO this is way too big of a hammer, even if we accept there are bugs in
> QEMU not handling stalled networking well.
> 
> eg if a chardev hangs QEMU, and we tear down everything, killing the NBD
> connection used for the guest disk, we needlessly break I/O.

Yeah, these patches are intended to solve the problems with the COLO use-case,
where all external connections (migration, chardevs, NBD) are just for
replication. In other use-cases you'd enable the yank feature only on the
non-essential connections.
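
As a rough sketch of the idea (not the code from the patches), the following
Python models a registry of callbacks where only a non-essential connection is
registered. The class name YankRegistry, its methods, and the connection names
are hypothetical. The registered callback calls shutdown() on the socket,
which wakes a thread blocked in recv() on it, while the unregistered
connection keeps working.

```python
import socket
import threading


class YankRegistry:
    """Hypothetical registry of recovery callbacks (illustration only)."""

    def __init__(self):
        self._lock = threading.Lock()
        self._callbacks = []

    def register(self, callback):
        with self._lock:
            self._callbacks.append(callback)

    def yank(self):
        # Copy under the lock, then run the callbacks outside it so a slow
        # callback cannot block registration from other threads.
        with self._lock:
            callbacks = list(self._callbacks)
        for callback in callbacks:
            callback()


def blocked_reader(sock, result):
    # Blocks until data arrives or the socket is shut down.
    result.append(sock.recv(1))


registry = YankRegistry()

# Non-essential connection: registered, so yank() will shut it down.
migration, migration_peer = socket.socketpair()
registry.register(lambda: migration.shutdown(socket.SHUT_RDWR))

# Essential connection: deliberately not registered.
nbd, nbd_peer = socket.socketpair()

result = []
reader = threading.Thread(target=blocked_reader, args=(migration, result))
reader.start()            # the peer never writes, so recv() would block forever
registry.yank()           # shutdown() makes recv() return end-of-stream
reader.join(timeout=5)

nbd_peer.sendall(b"k")
nbd_still_works = nbd.recv(1) == b"k"
print("reader unblocked:", not reader.is_alive(), "nbd ok:", nbd_still_works)
```

Registering only the connections that are safe to drop is what keeps the
hammer from hitting the guest disk in this model.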

> eg doing this in the chardev backend is not desirable, because the bugs
> with hanging QEMU are typically caused by the way the frontend device
> uses the chardev blocking I/O calls, instead of non-blocking I/O calls.
> 
> 
> Regards,
> Daniel


