Re: A bug of Monitor Chardev ?


From: Peter Xu
Subject: Re: A bug of Monitor Chardev ?
Date: Fri, 21 May 2021 10:43:36 -0400

On Fri, May 21, 2021 at 09:25:52AM +0200, Markus Armbruster wrote:
> Daniel P. Berrangé <berrange@redhat.com> writes:
> 
> > On Wed, May 19, 2021 at 08:17:51PM +0400, Marc-André Lureau wrote:
> >> Hi
> >> 
> >> On Mon, May 17, 2021 at 11:11 AM Longpeng (Mike, Cloud Infrastructure
> >> Service Product Dept.) <longpeng2@huawei.com> wrote:
> >> 
> >> > We find a race during QEMU starting, which can cause the QEMU process
> >> > to coredump.
> >> >
> >> > <main loop>                             |    <MON iothread>
> >> >                                         |
> >> > [1] create MON chardev                  |
> >> > qemu_create_early_backends              |
> >> >   chardev_init_func                     |
> >> >                                         |
> >> > [2] create MON iothread                 |
> >> > qemu_create_late_backends               |
> >> >   mon_init_func                         |
> >> >         aio_bh_schedule-----------------------> monitor_qmp_setup_handlers_bh
> >> > [3] enter main loop                     |    tcp_chr_update_read_handler
> >> > (* A client come in, e.g. Libvirt *)    |      update_ioc_handlers
> >> > tcp_chr_new_client                      |
> >> >   update_ioc_handlers                   |
> >> >                                         |
> >> >     [4] create new hup_source           |
> >> >         s->hup_source = *PTR1*          |
> >> >           g_source_attach(s->hup_source)|
> >> >                                         |        [5] remove_hup_source(*PTR1*)
> >> >                                         |            (create new hup_source)
> >> >                                         |             s->hup_source = *PTR2*
> >> >         [6] g_source_attach_unlocked    |
> >> >               *PTR1* is freed by [5]    |
> >> > Do you have any suggestion to fix this bug ? Thanks!
> >> >
> >> >
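(For reference, the two calls racing above boil down to roughly the following;
this is a simplified sketch of update_ioc_handlers() in chardev/char-socket.c,
not verbatim code:)

    static void update_ioc_handlers(SocketChardev *s)
    {
        Chardev *chr = CHARDEV(s);

        /* [5] in the diagram: destroy and free the current hup_source
         * (*PTR1*), which the other thread may still be attaching. */
        remove_hup_source(s);

        /* [4]/[6]: create a fresh hup_source and attach it; if the other
         * thread frees it concurrently, g_source_attach() ends up using
         * freed memory. */
        s->hup_source = qio_channel_create_watch(s->ioc, G_IO_HUP);
        g_source_set_callback(s->hup_source, (GSourceFunc)tcp_chr_hup,
                              chr, NULL);
        g_source_attach(s->hup_source, chr->gcontext);
    }
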
> >> I see.. I think the simplest would be for the chardev to not be dispatched
> >> in the original thread after monitor_init_qmp(). It looks like this should
> >> translate at least to calling qio_net_listener_set_client_func_full() with
> >> NULL handlers. I can't see where we could fit that in the chardev API.
> >> Perhaps add a new qemu_chr_be_disable_handlers() (until
> >> update_read_handlers is called again to enable them)?
> >> 
> >> Daniel? Paolo?
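(A rough sketch of what such a qemu_chr_be_disable_handlers() might look like
for the socket chardev; the helper does not exist today and the body below is
only a guess built around qio_net_listener_set_client_func_full():)

    /* Hypothetical helper, not current QEMU API: stop dispatching chardev
     * events in the current context until update_read_handlers() runs again. */
    static void qemu_chr_be_disable_handlers(Chardev *chr)
    {
        SocketChardev *s = SOCKET_CHARDEV(chr);

        if (s->listener) {
            /* NULL handlers: no new clients get dispatched in this context */
            qio_net_listener_set_client_func_full(s->listener,
                                                  NULL, NULL, NULL,
                                                  chr->gcontext);
        }
    }
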
> >
> > IIUC, the problem is:
> >
> >   - when we first create the chardev, its IO watches are setup with
> >     the default (NULL) GMainContext which is processed by the main
> >     thread
> >
> >   - when we create the monitor, we re-initialize the chardev to
> >     attach its IO watches to a custom GMainContext associated with
> >     the monitor thread.
> >
> >   - The re-initialization is happening in a bottom half that runs
> >     in the monitor thread, thus the main thread can already start
> >     processing an IO event in parallel
> >
> > Looking at the code in qmp.c, the monitor_init_qmp method has a
> > comment:
> >
> >         /*
> >          * We can't call qemu_chr_fe_set_handlers() directly here
> >          * since chardev might be running in the monitor I/O
> >          * thread.  Schedule a bottom half.
> >          */
> >
> > AFAICT, that comment is wrong. monitor_init_qmp is called from
> > monitor_init, which is called from monitor_init_opts, which is
> > called from qemu_create_late_backends, which runs in the main
> > thread.
> 
> Goes back to commit a5ed352596a8b7eb2f9acce34371b944ac3056c4
> Author: Peter Xu <peterx@redhat.com>
> Date:   Fri Mar 9 16:59:52 2018 +0800
> 
>     monitor: allow using IO thread for parsing
>     
>     For each Monitor, add one field "use_io_thr" to show whether it will be
>     using the dedicated monitor IO thread to handle input/output.  When set,
>     monitor IO parsing work will be offloaded to the dedicated monitor IO
>     thread, rather than the original main loop thread.
>     
>     This only works for QMP.  HMP will always be run on the main loop
>     thread.
>     
>     Currently we're still keeping use_io_thr off always.  Will turn it on
>     later at some point.
>     
>     One thing to mention is that we cannot set use_io_thr for every QMP
>     monitor.  The problem is that MUXed typed chardevs may not work well
>     with it now. When MUX is used, frontend of chardev can be the monitor
>     plus something else.  The only thing we know would be safe to be run
>     outside main thread so far is the monitor frontend. All the rest of the
>     frontends should still be run in main thread only.
>     
>     Signed-off-by: Peter Xu <peterx@redhat.com>
>     Message-Id: <20180309090006.10018-10-peterx@redhat.com>
>     Reviewed-by: Eric Blake <eblake@redhat.com>
>     [eblake: squash in Peter's followup patch to avoid test failures]
>     Signed-off-by: Eric Blake <eblake@redhat.com>
> 
> Peter, do you remember why you went for a bottom half?
> 
> Hmm, back then it was in monitor_init(), which was called from several
> places.  Did we manage to lose the need for a bottom half along the way?
> 
> Note that the initial comment was a bit different:
> 
>         if (mon->use_io_thr) {
>             /*
>              * Make sure the old iowatch is gone.  It's possible when
>              * e.g. the chardev is in client mode, with wait=on.
>              */
>             remove_fd_in_watch(chr);
>             /*
>              * We can't call qemu_chr_fe_set_handlers() directly here
>              * since during the procedure the chardev will be active
>              * and running in monitor iothread, while we'll still do
>              * something before returning from it, which is a possible
>              * race too.  To avoid that, we just create a BH to setup
>              * the handlers.
>              */
>             aio_bh_schedule_oneshot(monitor_get_aio_context(),
>                                     monitor_qmp_setup_handlers_bh, mon);
>             /* We'll add this to mon_list in the BH when setup done */
>             return;
>         } else {
>             qemu_chr_fe_set_handlers(&mon->chr, monitor_can_read,
>                                      monitor_qmp_read, monitor_qmp_event,
>                                      NULL, mon, NULL, true);
>         }
> 
> I changed it in commit 774a6b67a40.

I think the original problem was that if qemu_chr_fe_set_handlers() is called
in the main thread, a race can start within the execution of
qemu_chr_fe_set_handlers() itself, right after we switch context at:

    qemu_chr_be_update_read_handlers(s, context);

The rest of the code in qemu_chr_fe_set_handlers() will still run in the main
thread, but at that point the chardev already belongs to the new iothread
context, which introduces a race condition.
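
A pared-down illustration of that ordering (not the real chardev/char-fe.c
body):

    void qemu_chr_fe_set_handlers_sketch(Chardev *s, GMainContext *context)
    {
        /* After this call the chardev's watches live in 'context', so the
         * monitor iothread may already be dispatching events for it... */
        qemu_chr_be_update_read_handlers(s, context);

        /* ...while everything after this point still executes in the
         * caller's (main) thread, touching the same chardev state. */
    }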

Running qemu_chr_be_update_read_handlers() in a BH resolves that, because then
everything runs in the monitor iothread only and is naturally serialized.

So the new comment does indeed look not fully right: the chardev is indeed
still within the main thread context before qemu_chr_fe_set_handlers(); it's
just that without the BH the race can start right away once the context switch
happens for the chardev.

Thanks,

> 
> > I think we should explicitly document that monitor_init_qmp
> > is *required* to be invoked from the main thread and then
> > remove the bottom half usage.
> 
> Assert "running in main thread", so screwups crash reliably instead of
> creating a race.
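
One possible shape of that assertion (assuming the existing helpers
qemu_get_current_aio_context() and qemu_get_aio_context(); any equivalent
"am I in the main thread" check would do):

    /* At the top of monitor_init_qmp(): crash reliably if we are not
     * running in the main thread's AioContext. */
    assert(qemu_get_current_aio_context() == qemu_get_aio_context());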
> 
> >                                If we ever find a need to
> > create a new monitor from a non-main thread, that thread
> > could use an idle callback attached to the default GMainContext
> > to invoke monitor_init_qmp.
> >
> > Regards,
> > Daniel
> 

-- 
Peter Xu



