qemu-devel

Re: [PATCH] monitor/qmp: fix race on CHR_EVENT_CLOSED without OOB


From: Stefan Reiter
Subject: Re: [PATCH] monitor/qmp: fix race on CHR_EVENT_CLOSED without OOB
Date: Mon, 29 Mar 2021 11:49:06 +0200
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:60.0) Gecko/20100101 Thunderbird/60.9.0

On 3/26/21 3:48 PM, Markus Armbruster wrote:
> Wolfgang Bumiller <w.bumiller@proxmox.com> writes:
>
>> On Thu, Mar 18, 2021 at 02:35:50PM +0100, Stefan Reiter wrote:
>>> If OOB is disabled, events received in monitor_qmp_event will be handled
>>> in the main context. Thus, we must not acquire a qmp_queue_lock there,
>>> as the dispatcher coroutine holds one over a yield point, where it
>>> expects to be rescheduled from the main context. If a CHR_EVENT_CLOSED
>>> event is received just then, it can race and block the main thread by
>>> waiting on the queue lock.
>>>
>>> Run monitor_qmp_cleanup_queue_and_resume in a BH on the iohandler
>>> thread, so the main thread can always make progress during the
>>> reschedule.
>>>
>>> The delaying of the cleanup is safe, since the dispatcher always moves
>>> back to the iothread afterward, and thus the cleanup will happen before
>>> it gets to its next iteration.
>>>
>>> Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
>>> ---
>>
>> This is a tough one. It *may* be fine, but I wonder if we can approach
>> this differently:
>
> You guys make my head hurt.


That makes three of us, I think.

> I understand we're talking about a bug.  Is it a recent regression, or
> an older bug?  How badly does the bug affect users?


It's a regression introduced with the coroutinization of QMP in 5.2. It only occurs when OOB is disabled - in our downstream we have it disabled unconditionally, as it caused some issues in the past.

It affected quite a lot of our users, some when the host was under CPU load, some seemingly at random. When it happened, it usually hit multiple VMs at once, completely hanging them.

Just for reference, our forum has the full story:
https://forum.proxmox.com/threads/all-vms-locking-up-after-latest-pve-update.85397/
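
To illustrate the race from the commit message quoted above, here is a toy, single-threaded Python model. It is not QEMU code: ToyMonitor, simulate and all method names are made up for illustration. A generator stands in for the dispatcher coroutine, and a non-blocking acquire stands in for the point where the real main thread would block on the queue lock. It assumes, as the commit message states, that the dispatcher holds the lock across its yield, and it ignores that the real BH runs in the iohandler context.

```python
import threading
from collections import deque


class ToyMonitor:
    """Toy stand-in for the QMP monitor state; not QEMU's real structures."""

    def __init__(self):
        self.queue_lock = threading.Lock()        # plays the role of the queue lock
        self.requests = deque(["req1", "req2"])   # pending requests
        self.bottom_halves = deque()              # deferred callbacks ("BHs")

    def dispatcher(self):
        """Generator playing the dispatcher coroutine."""
        self.queue_lock.acquire()
        yield  # holds the lock here and waits for the main loop to resume it
        self.queue_lock.release()

    def cleanup_queue(self):
        """Drop all pending requests; needs the queue lock."""
        with self.queue_lock:
            self.requests.clear()

    def closed_event_direct(self):
        """Pre-patch behaviour: clean up right in the event handler.

        Returns False where the real main thread would block forever.
        """
        if not self.queue_lock.acquire(blocking=False):
            return False
        self.queue_lock.release()
        self.cleanup_queue()
        return True

    def closed_event_deferred(self):
        """Patched behaviour: only schedule the cleanup, never touch the lock."""
        self.bottom_halves.append(self.cleanup_queue)

    def run_bottom_halves(self):
        """Run every deferred callback in the order it was scheduled."""
        while self.bottom_halves:
            self.bottom_halves.popleft()()


def simulate(deferred):
    mon = ToyMonitor()
    co = mon.dispatcher()
    next(co)                          # dispatcher now sits at its yield, lock held

    # CHR_EVENT_CLOSED arrives in the main loop at exactly this moment.
    if deferred:
        mon.closed_event_deferred()
    elif not mon.closed_event_direct():
        # Main loop is stuck on the lock, so it can never resume the dispatcher.
        return ("main loop blocked", len(mon.requests))

    next(co, None)                    # main loop resumes the dispatcher; lock released
    mon.run_bottom_halves()           # deferred cleanup runs afterwards
    return ("ok", len(mon.requests))
```

In this model, simulate(deferred=False) reports the main loop as blocked with both requests still queued, while simulate(deferred=True) completes and leaves the queue empty.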

> I'm about to vanish for my Easter break...  If the bug must be fixed for
> 6.0, just waiting for me to come back seems unadvisable.


Since it doesn't happen with OOB (so, by default), I don't think it's that urgent.

BTW, I've sent a v2 as well:
https://lists.gnu.org/archive/html/qemu-devel/2021-03/msg07590.html

That one we have shipped to our users for now, mostly with good success, though there are a few reports that something still hangs. Those could be people failing to upgrade, or some other issue that is still unsolved...

And happy Easter break :)



