Re: [PATCH 0/5] Introduce 'yank' oob qmp command to recover from hanging
From: Dr. David Alan Gilbert
Subject: Re: [PATCH 0/5] Introduce 'yank' oob qmp command to recover from hanging qemu
Date: Mon, 11 May 2020 13:07:18 +0100
User-agent: Mutt/1.13.4 (2020-02-15)
* Daniel P. Berrangé (address@hidden) wrote:
> On Mon, May 11, 2020 at 01:14:34PM +0200, Lukas Straub wrote:
> > Hello Everyone,
> > In many cases, if qemu has a network connection (qmp, migration, chardev,
> > etc.) to some other server and that server dies or hangs, qemu hangs too.
>
> If qemu as a whole hangs due to a stalled network connection, that is a
> bug in QEMU that we should be fixing IMHO. QEMU should be doing non-blocking
> I/O in general, such that if the network connection or remote server stalls,
> we simply stop sending I/O - we shouldn't ever hang the QEMU process or main
> loop.
>
> There are places in QEMU code which are not well behaved in this respect,
> but many are, and others are getting fixed where found to be important.
>
> Arguably any place in QEMU code which can result in a hang of QEMU in the
> event of a stalled network should be considered a security flaw, because
> the network is untrusted in general.
That's not really true of the 'management network' - people trust that
and I don't see a lot of the qemu code getting fixed safely for all of
them.
> > These patches introduce the new 'yank' out-of-band qmp command to recover
> > from these kinds of hangs. The different subsystems register callbacks
> > which get executed with the yank command. For example the callback can
> > shutdown() a socket. This is intended for the colo use-case, but it can
> > be used for other things too of course.
>
> IIUC, invoking the "yank" command unconditionally kills every single
> network connection in QEMU that has registered with the "yank" subsystem.
> IMHO this is way too big of a hammer, even if we accept there are bugs in
> QEMU not handling stalled networking well.
But isn't this hammer conditional - I see that it's a migration
capability for the migration socket, and a flag in nbd - so it only
yanks things you've told it to.
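To make the "only yanks things you've told it to" point concrete, here is a
minimal sketch in Python rather than QEMU's C. It is not the API from the
patch series: YankRegistry, register(), yank(), is_shut_down() and demo() are
all hypothetical names, and a pair of local sockets stands in for the
migration and NBD connections. Only the connection that opted in registers a
callback, so only that one gets shut down.

```python
import socket


class YankRegistry:
    """Hypothetical stand-in for the yank subsystem: a list of callbacks."""

    def __init__(self):
        self._callbacks = []

    def register(self, callback):
        # Assumption: a connection registers only when its opt-in
        # (migration capability, nbd flag) is enabled.
        self._callbacks.append(callback)

    def yank(self):
        """Run every registered callback; return how many were run."""
        for callback in self._callbacks:
            callback()
        return len(self._callbacks)


def is_shut_down(sock):
    """True if a non-blocking recv() reports EOF on this socket."""
    sock.setblocking(False)
    try:
        return sock.recv(1) == b""
    except BlockingIOError:
        return False  # still usable, there is just no data pending


def demo():
    registry = YankRegistry()
    migration, migration_peer = socket.socketpair()
    nbd, nbd_peer = socket.socketpair()

    # Only the migration socket opted in; nbd never registers.
    registry.register(lambda: migration.shutdown(socket.SHUT_RDWR))

    ran = registry.yank()
    state = (ran, is_shut_down(migration), is_shut_down(nbd))
    for sock in (migration, migration_peer, nbd, nbd_peer):
        sock.close()
    return state
```

Calling demo() runs a single callback; the socket that opted in then reports
EOF while the one that did not register is left alone.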
> eg if a chardev hangs QEMU, and we tear down everything, killing the NBD
> connection used for the guest disk, we needlessly break I/O.
>
> eg doing this in the chardev backend is not desirable, because the bugs
> with hanging QEMU are typically caused by the way the frontend device
> uses the chardev blocking I/O calls, instead of non-blocking I/O calls.
>
Having a way to get out of any of these problems from a single point is
quite nice. To be useful in COLO you need to know for sure you can get
out of any network screwup.
We already use shutdown(2) in migrate_cancel and migrate-pause for
basically the same reason; I don't think we've got anything similar for
NBD, and we probably should have (I think I asked for it fairly
recently).
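As a rough illustration of why shutdown(2) helps here, the Python sketch
below (not QEMU code; blocked_reader_after_shutdown() is a hypothetical
helper) parks a thread in a blocking recv() on a socket whose peer never
sends anything, then calls shutdown() on that socket from another thread.
It assumes Linux-like socket behaviour, where the blocked recv() returns
end-of-file once the socket is shut down.

```python
import socket
import threading
import time


def blocked_reader_after_shutdown(timeout=5.0):
    """Block a thread in recv(), then shutdown() the socket from outside.

    Returns (finished, data): whether the reader thread returned within
    `timeout` seconds, and what its recv() call produced.
    """
    ours, peer = socket.socketpair()  # peer never sends: a "hung" remote
    result = {}

    def reader():
        try:
            result["data"] = ours.recv(4096)  # blocks, no data will arrive
        except OSError as exc:
            result["data"] = exc

    thread = threading.Thread(target=reader, daemon=True)
    thread.start()
    time.sleep(0.2)  # give the reader a moment to block in recv()

    # The recovery step: shut the socket down without closing the fd.
    ours.shutdown(socket.SHUT_RDWR)
    thread.join(timeout)
    finished = not thread.is_alive()

    ours.close()
    peer.close()
    return finished, result.get("data")
```

The file descriptor stays open until the owner closes it, so the owning code
sees an ordinary EOF or error and can run its normal cleanup path.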
Dave
> Regards,
> Daniel
> --
> |: https://berrange.com -o- https://www.flickr.com/photos/dberrange :|
> |: https://libvirt.org -o- https://fstop138.berrange.com :|
> |: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|
--
Dr. David Alan Gilbert / address@hidden / Manchester, UK