Re: [PATCH V1 2/3] migration: fix suspended runstate
From: Steven Sistare
Date: Thu, 24 Aug 2023 16:53:13 -0400
User-agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:102.0) Gecko/20100101 Thunderbird/102.14.0
On 8/17/2023 2:19 PM, Peter Xu wrote:
> On Wed, Aug 16, 2023 at 01:48:13PM -0400, Steven Sistare wrote:
>> On 8/14/2023 3:37 PM, Peter Xu wrote:
>>> On Mon, Aug 14, 2023 at 02:53:56PM -0400, Steven Sistare wrote:
>>>>> Can we just call vm_state_notify() earlier?
>>>>
>>>> We cannot. The guest is not running yet, and will not be until later.
>>>> We cannot call notifiers that perform actions that complete, or react to,
>>>> the guest entering a running state.
>>>
>>> I tried to look at a few examples of the notified callbacks, and most of
>>> the ones I read react not to "vcpu running" but to "vm running" (in which
>>> case I think "suspended" falls into the "vm running" case); most of them
>>> don't care about the RunState parameter passed in, only the bool "running".
>>>
>>> In reality, so far when running=true the state is always RUNNING.
>>>
>>> In that case, does it mean we should notify right after the switchover,
>>> since after migration the vm is indeed running, just with the vcpus
>>> paused if it was suspended?
>>
>> I cannot parse your question, but maybe this answers it.
>> If the outgoing VM is running and not suspended, then the incoming side
>> tests for autostart==true and calls vm_start, which calls the notifiers,
>> right after the switchover.
>
> I meant that IMHO SUSPENDED should be seen as a "vm running" case, just like
> RUNNING. Then, we should invoke vm_prepare_start(); it just needs some touch
> ups.
>
>>
>>> One example (of a possible issue) is vfio_vmstate_change(), where iiuc if we
>>> try to suspend a VM, the device should remain in VFIO_DEVICE_STATE_RUNNING;
>>> this kind of proves to me that SUSPEND is actually one of the
>>> running=true states.
>>>
>>> If we postpone all notifiers until the next suspend wakeup, even after we
>>> have switched over to the dest qemu, I think it means these devices will
>>> not be in VFIO_DEVICE_STATE_RUNNING after switchover but perhaps
>>> VFIO_DEVICE_STATE_STOP.
>>
>> or VFIO_DEVICE_STATE_RESUMING, which is set in vfio_load_setup.
>> AFAIK it is OK to remain in that state until wakeup is called later.
>
> So let me provide another clue of why I think we should call
> vm_prepare_start()..
>
> Firstly, I think the RESUME event should always be there right after we
> switch over, no matter whether suspended or not. I just noticed that your test
> case works because you put "wakeup" before RESUME. I'm not sure
> whether that's correct. I'd bet people could rely on that RESUME to
> identify the switchover.
Yes, possibly.
> More importantly, I'm wondering whether the RTC should still be running during
> suspend? Sorry again if my knowledge there is limited, so correct me
> otherwise - but my understanding is that during suspend mode (s1 or s3,
> frankly I can't tell which one this belongs to..), the rtc should still be
> running along with the system clock. It means we _should_ at least call
> cpu_enable_ticks() to enable the rtc:
Agreed. The comment above cpu_get_ticks says:
* return the time elapsed in VM between vm_start and vm_stop
Suspending a guest does not call vm_stop, so ticks keeps running.
I also verified that experimentally.
> /*
> * enable cpu_get_ticks()
> * Caller must hold BQL which serves as mutex for vm_clock_seqlock.
> */
> void cpu_enable_ticks(void)
>
> I think that'll enable cpu_get_tsc() and make it start to work right.
>
>>
>>> Ideally I think we should here call vm_state_notify() with running=true and
>>> state=SUSPEND, but since I do see that some hooks are not well prepared for
>>> SUSPEND with running=true, I think we should, on the safe side, call
>>> vm_state_notify(running=true, state=RUNNING) even for SUSPEND at the switch
>>> over phase. With that, IIUC it'll naturally work (e.g. when we wake up
>>> again later we don't need to call any notifiers).
>>
>> Notifiers are just one piece; all the code in vm_prepare_start must be
>> called.
>> Is it correct to call all of that long before we actually resume the CPUs in
>> wakeup? I don't know, but what is the point?
>
> The point is not only cleanliness (again, I really, really don't like that
> new global.. sorry), but also that now I think we should make the vm running.
I do believe it is safe to call vm_prepare_start immediately, since the vcpus
are never running when it is called.
>> The wakeup code still needs
>> modification to conditionally resume the vcpus. The scheme would be roughly:
>>
>> loadvm_postcopy_handle_run_bh()
>>     runstate = global_state_get_runstate();
>>     if (runstate == RUN_STATE_RUNNING) {
>>         vm_start();
>>     } else if (runstate == RUN_STATE_SUSPENDED) {
>>         vm_prepare_start();      // the start of vm_start()
>>     }
>>
>> qemu_system_wakeup_request()
>>     if (some condition)
>>         resume_all_vcpus();      // the remainder of vm_start()
>>     else
>>         runstate_set(RUN_STATE_RUNNING);
>
> No it doesn't. wakeup_reason is set there, main loop does the resuming.
> See:
>
>     if (qemu_wakeup_requested()) {
>         pause_all_vcpus();
>         qemu_system_wakeup();
>         notifier_list_notify(&wakeup_notifiers, &wakeup_reason);
>         wakeup_reason = QEMU_WAKEUP_REASON_NONE;
>         resume_all_vcpus();
>         qapi_event_send_wakeup();
>     }
Agreed, we can rely on that main loop code to call resume_all_vcpus, and not
modify qemu_system_wakeup_request. That is cleaner.
>> How is that better than my patches
>> [PATCH V3 01/10] vl: start on wakeup request
>> [PATCH V3 02/10] migration: preserve suspended runstate
>>
>> loadvm_postcopy_handle_run_bh()
>>     runstate = global_state_get_runstate();
>>     if (runstate == RUN_STATE_RUNNING)
>>         vm_start();
>>     else
>>         runstate_set(runstate);  // eg RUN_STATE_SUSPENDED
>>
>> qemu_system_wakeup_request()
>>     if (!vm_started)
>>         vm_start();
>>     else
>>         runstate_set(RUN_STATE_RUNNING);
>>
>> Recall this thread started with your comment "It then can avoid touching the
>> system wakeup code which seems cleaner". We still need to touch the wakeup
>> code.
>
> Let me provide some code and reply to your new patchset inlines.
Thank you, I have more comments there.
- Steve