qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH V1 2/3] migration: fix suspended runstate


From: Steven Sistare
Subject: Re: [PATCH V1 2/3] migration: fix suspended runstate
Date: Thu, 24 Aug 2023 16:53:13 -0400
User-agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:102.0) Gecko/20100101 Thunderbird/102.14.0

On 8/17/2023 2:19 PM, Peter Xu wrote:
> On Wed, Aug 16, 2023 at 01:48:13PM -0400, Steven Sistare wrote:
>> On 8/14/2023 3:37 PM, Peter Xu wrote:
>>> On Mon, Aug 14, 2023 at 02:53:56PM -0400, Steven Sistare wrote:
>>>>> Can we just call vm_state_notify() earlier?
>>>>
>>>> We cannot.  The guest is not running yet, and will not be until later.
>>>> We cannot call notifiers that perform actions that complete, or react to, 
>>>> the guest entering a running state.
>>>
>>> I tried to look at a few examples of the notifees and most of them I read
>>> do not react to "vcpu running" but "vm running" (in which case I think
>>> "suspended" mode falls into "vm running" case); most of them won't care on
>>> the RunState parameter passed in, but only the bool "running".
>>>
>>> In reality, when running=true, it must be RUNNING so far.
>>>
>>> In that case does it mean we should notify right after the switchover,
>>> since after migration the vm is indeed running only if the vcpus are not
>>> during suspend?
>>
>> I cannot parse your question, but maybe this answers it.
>> If the outgoing VM is running and not suspended, then the incoming side
>> tests for autostart==true and calls vm_start, which calls the notifiers,
>> right after the switchover.
> 
> I meant IMHO SUSPENDED should be seen as "vm running" case to me, just like
> RUNNING.  Then, we should invoke vm_prepare_start(), just need some touch
> ups.
> 
>>
>>> One example (of possible issue) is vfio_vmstate_change(), where iiuc if we
>>> try to suspend a VM it should keep to be VFIO_DEVICE_STATE_RUNNING for that
>>> device; this kind of prove to me that SUSPEND is actually one of
>>> running=true states.
>>>
>>> If we postpone all notifiers here even after we switched over to dest qemu
>>> to the next upcoming suspend wakeup, I think it means these devices will
>>> not be in VFIO_DEVICE_STATE_RUNNING after switchover but perhaps
>>> VFIO_DEVICE_STATE_STOP.
>>
>> or VFIO_DEVICE_STATE_RESUMING, which is set in vfio_load_setup.
>> AFAIK it is OK to remain in that state until wakeup is called later.
> 
> So let me provide another clue of why I think we should call
> vm_prepare_start()..
> 
> Firstly, I think RESUME event should always be there right after we
> switched over, no matter suspeneded or not.  I just noticed that your test
> case would work because you put "wakeup" to be before RESUME.  I'm not sure
> whether that's correct.  I'd bet people could rely on that RESUME to
> identify the switchover.

Yes, possibly.

> More importantly, I'm wondering whether RTC should still be running during
> the suspended mode?  Sorry again if my knowledge over there is just
> limited, so correct me otherwise - but my understanding is during suspend
> mode (s1 or s3, frankly I can't tell which one this belongs..), rtc should
> still be running along with the system clock.  It means we _should_ at
> least call cpu_enable_ticks() to enable rtc:

Agreed.  The comment above cpu_get_ticks says:
  * return the time elapsed in VM between vm_start and vm_stop
Suspending a guest does not call vm_stop, so ticks keeps running.
I also verified that experimentally.

> /*
>  * enable cpu_get_ticks()
>  * Caller must hold BQL which serves as mutex for vm_clock_seqlock.
>  */
> void cpu_enable_ticks(void)
> 
> I think that'll enable cpu_get_tsc() and make it start to work right.
> 
>>
>>> Ideally I think we should here call vm_state_notify() with running=true and
>>> state=SUSPEND, but since I do see some hooks are not well prepared for
>>> SUSPEND over running=true, I'd think we should on the safe side call
>>> vm_state_notify(running=true, state=RUNNING) even for SUSPEND at switch
>>> over phase.  With that IIUC it'll naturally work (e.g. when wakeup again
>>> later we just need to call no notifiers).
>>
>> Notifiers are just one piece, all the code in vm_prepare_start must be 
>> called.
>> Is it correct to call all of that long before we actually resume the CPUs in
>> wakeup?  I don't know, but what is the point?
> 
> The point is not only for cleaness (again, I really, really don't like that
> new global.. sorry), but also now I think we should make the vm running.

I do believe it is safe to call vm_prepare_start immediately, since the vcpus
are never running when it is called.

>> The wakeup code still needs
>> modification to conditionally resume the vcpus.  The scheme would be roughly:
>>
>>     loadvm_postcopy_handle_run_bh()
>>         runstat = global_state_get_runstate();
>>         if (runstate == RUN_STATE_RUNNING) {
>>             vm_start()
>>         } else if (runstate == RUN_STATE_SUSPENDED)
>>             vm_prepare_start();   // the start of vm_start()
>>         }
>>
>>     qemu_system_wakeup_request()
>>         if (some condition)
>>             resume_all_vcpus();   // the remainder of vm_start()
>>         else
>>             runstate_set(RUN_STATE_RUNNING)
> 
> No it doesn't.  wakeup_reason is set there, main loop does the resuming.
> See:
> 
>     if (qemu_wakeup_requested()) {
>         pause_all_vcpus();
>         qemu_system_wakeup();
>         notifier_list_notify(&wakeup_notifiers, &wakeup_reason);
>         wakeup_reason = QEMU_WAKEUP_REASON_NONE;
>         resume_all_vcpus();
>         qapi_event_send_wakeup();
>     }

Agreed, we can rely on that main loop code to call resume_all_vcpus, and not
modify qemu_system_wakeup_request.  That is cleaner. 

>> How is that better than my patches
>>     [PATCH V3 01/10] vl: start on wakeup request
>>     [PATCH V3 02/10] migration: preserve suspended runstate
>>
>>     loadvm_postcopy_handle_run_bh()
>>         runstate = global_state_get_runstate();
>>         if (runstate == RUN_STATE_RUNNING)
>>             vm_start()
>>         else
>>             runstate_set(runstate);    // eg RUN_STATE_SUSPENDED
>>
>>     qemu_system_wakeup_request()
>>         if (!vm_started)
>>             vm_start();
>>         else
>>             runstate_set(RUN_STATE_RUNNING);
>>
>> Recall this thread started with your comment "It then can avoid touching the 
>> system wakeup code which seems cleaner".  We still need to touch the wakeup
>> code.
> 
> Let me provide some code and reply to your new patchset inlines.

Thank you, I have more comments there.

- Steve




reply via email to

[Prev in Thread] Current Thread [Next in Thread]