Re: acceptance-system-fedora failures
From: Philippe Mathieu-Daudé
Subject: Re: acceptance-system-fedora failures
Date: Thu, 8 Oct 2020 12:26:27 +0200
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:68.0) Gecko/20100101 Thunderbird/68.11.0
On 10/7/20 3:11 PM, Pavel Dovgalyuk wrote:
> On 07.10.2020 15:49, Philippe Mathieu-Daudé wrote:
>> On 10/7/20 2:20 PM, Pavel Dovgalyuk wrote:
>>> On 07.10.2020 14:22, Alex Bennée wrote:
>>>>
>>>> Philippe Mathieu-Daudé <philmd@redhat.com> writes:
>>>>
>>>>> On 10/7/20 10:51 AM, Pavel Dovgalyuk wrote:
>>>>>> On 07.10.2020 11:23, Thomas Huth wrote:
>>>>>>> On 07/10/2020 09.13, Philippe Mathieu-Daudé wrote:
>>>>>>> Thanks, that was helpful. ... and the winner is:
>>>>>>>
>>>>>>> commit 55adb3c45620c31f29978f209e2a44a08d34e2da
>>>>>>> Author: John Snow <jsnow@redhat.com>
>>>>>>> Date: Fri Jul 24 01:23:00 2020 -0400
>>>>>>> Subject: ide: cancel pending callbacks on SRST
>>>>>>>
>>>>>>> ... starting with this commit, the tests start failing. John, any
>>>>>>> idea what might be causing this?
>>>>>>
>>>>>> This patch includes the following lines:
>>>>>>
>>>>>> + aio_bh_schedule_oneshot(qemu_get_aio_context(),
>>>>>> + ide_bus_perform_srst, bus);
>>>>>>
>>>>>> replay_bh_schedule_oneshot_event should be used instead of this
>>>>>> function, because it synchronizes non-deterministic BHs.
>>>>>
>>>>> Why do we have 2 different functions? BH are already complex
>>>>> enough, and we need to also think about the replay API...
>>>>>
>>>>> What about the other cases such as vhost-user (blk/net), virtio-blk?
>>>>
>>>> This does seem like something that should be wrapped up inside
>>>> aio_bh_schedule_oneshot itself, or maybe we need an
>>>> aio_bh_schedule_transaction_oneshot to distinguish it from the
>>>> function's other uses.
>>>>
>>>
>>> Maybe there should be two functions:
>>> - one for the guest modification
>>
>> aio_bh_schedule_oneshot_deterministic()?
>>
>>> - one for internal qemu things
>>
>> Not sure why there is a difference; BHs are used to
>> avoid delaying the guest, so there is always something
>> related to "guest modification".
>
> Not exactly. At least there is one non-related-to-guest case
> in monitor_init_qmp:
>     /*
>      * We can't call qemu_chr_fe_set_handlers() directly here
>      * since chardev might be running in the monitor I/O
>      * thread. Schedule a bottom half.
>      */
>     aio_bh_schedule_oneshot(iothread_get_aio_context(mon_iothread),
>                             monitor_qmp_setup_handlers_bh, mon);
I don't understand the documentation in docs/devel/replay.txt:
---
Bottom halves
=============
Bottom half callbacks, that affect the guest state, should be invoked
through replay_bh_schedule_event or replay_bh_schedule_oneshot_event
functions. Their invocations are saved in record mode and synchronized
with the existing log in replay mode.
---
But then it is only used in block drivers, which are not
related to guest state:
$ git grep replay_bh_schedule_oneshot_event
block/block-backend.c:1385: replay_bh_schedule_oneshot_event(blk_get_aio_context(blk),
block/block-backend.c:1450: replay_bh_schedule_oneshot_event(blk_get_aio_context(blk),
block/io.c:371: replay_bh_schedule_oneshot_event(bdrv_get_aio_context(bs),
block/iscsi.c:286: replay_bh_schedule_oneshot_event(iTask->iscsilun->aio_context,
block/nfs.c:262: replay_bh_schedule_oneshot_event(task->client->aio_context,
block/null.c:183: replay_bh_schedule_oneshot_event(bdrv_get_aio_context(bs),
block/nvme.c:323: replay_bh_schedule_oneshot_event(q->aio_context,
block/nvme.c:1075: replay_bh_schedule_oneshot_event(data->ctx, nvme_rw_cb_bh, data);
block/rbd.c:865: replay_bh_schedule_oneshot_event(bdrv_get_aio_context(acb->common.bs),
docs/devel/replay.txt:25:replay_bh_schedule_event or replay_bh_schedule_oneshot_event functions.
include/sysemu/replay.h:178:void replay_bh_schedule_oneshot_event(AioContext *ctx,
replay/replay-events.c:141:void replay_bh_schedule_oneshot_event(AioContext *ctx,
stubs/replay-user.c:5:void replay_bh_schedule_oneshot_event(AioContext *ctx,
Is replay_bh_schedule_oneshot_event ever used by replay?
Maybe we can remove it and use aio_bh_schedule_oneshot()
in its place?
Otherwise, the documentation needs to be clarified, please.
>
>
>>
>>>
>>> The first one may be implemented through the rr + the second one.
>>> Maybe the replay_ prefix is confusing and the whole BH interface
>>> should look like that?
>>
>> Yes, but it would be safer/clearer if we don't need to use
>> a replay_ API.
>>
>> Can we embed these functions?
>>
>> - replay_bh_schedule_event
>> - replay_bh_schedule_oneshot_event
>>
>> If compiled without rr, events_enabled=false and the
>> compiler can optimize:
>>
>> -- >8 --
>> diff --git a/util/async.c b/util/async.c
>> index f758354c6a..376b6d4e27 100644
>> --- a/util/async.c
>> +++ b/util/async.c
>> @@ -109,14 +109,19 @@ static QEMUBH *aio_bh_dequeue(BHList *head, unsigned *flags)
>>
>>  void aio_bh_schedule_oneshot(AioContext *ctx, QEMUBHFunc *cb, void *opaque)
>>  {
>> -    QEMUBH *bh;
>> -    bh = g_new(QEMUBH, 1);
>> -    *bh = (QEMUBH){
>> -        .ctx = ctx,
>> -        .cb = cb,
>> -        .opaque = opaque,
>> -    };
>> -    aio_bh_enqueue(bh, BH_SCHEDULED | BH_ONESHOT);
>> +    if (events_enabled) {
>> +        replay_add_event(REPLAY_ASYNC_EVENT_BH_ONESHOT, cb,
>> +                         opaque, replay_get_current_icount());
>> +    } else {
>> +        QEMUBH *bh;
>> +        bh = g_new(QEMUBH, 1);
>> +        *bh = (QEMUBH){
>> +            .ctx = ctx,
>> +            .cb = cb,
>> +            .opaque = opaque,
>> +        };
>> +        aio_bh_enqueue(bh, BH_SCHEDULED | BH_ONESHOT);
>> +    }
>>  }
>>
>>  QEMUBH *aio_bh_new(AioContext *ctx, QEMUBHFunc *cb, void *opaque)
>> @@ -178,7 +183,12 @@ void qemu_bh_schedule_idle(QEMUBH *bh)
>>
>>  void qemu_bh_schedule(QEMUBH *bh)
>
> qemu_bh_schedule is even worse.
> Many modules use it (e.g., loadvm_postcopy_handle_run), and there is no
> need to use the replay version there. In such cases QEMU would halt when
> trying to call replay_bh_schedule_event.
>
>>  {
>> -    aio_bh_enqueue(bh, BH_SCHEDULED);
>> +    if (events_enabled) {
>> +        replay_add_event(REPLAY_ASYNC_EVENT_BH, bh, NULL,
>> +                         replay_get_current_icount());
>> +    } else {
>> +        aio_bh_enqueue(bh, BH_SCHEDULED);
>> +    }
>>  }
>>
>