Re: [PATCH V7 19/29] vfio-pci: cpr part 1 (fd and dma)


From: Steven Sistare
Subject: Re: [PATCH V7 19/29] vfio-pci: cpr part 1 (fd and dma)
Date: Fri, 11 Mar 2022 11:22:01 -0500
User-agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:91.0) Gecko/20100101 Thunderbird/91.6.2

On 3/10/2022 5:30 PM, Alex Williamson wrote:
> On Thu, 10 Mar 2022 14:55:50 -0500
> Steven Sistare <steven.sistare@oracle.com> wrote:
> 
>> On 3/10/2022 1:35 PM, Alex Williamson wrote:
>>> On Thu, 10 Mar 2022 10:00:29 -0500
>>> Steven Sistare <steven.sistare@oracle.com> wrote:
>>>   
>>>> On 3/7/2022 5:16 PM, Alex Williamson wrote:  
>>>>> On Wed, 22 Dec 2021 11:05:24 -0800
>>>>> Steve Sistare <steven.sistare@oracle.com> wrote:  
>>>>> [...]
>>>>>> diff --git a/migration/cpr.c b/migration/cpr.c
>>>>>> index 37eca66..cee82cf 100644
>>>>>> --- a/migration/cpr.c
>>>>>> +++ b/migration/cpr.c
>>>>>> @@ -7,6 +7,7 @@
>>>>>>  
>>>>>>  #include "qemu/osdep.h"
>>>>>>  #include "exec/memory.h"
>>>>>> +#include "hw/vfio/vfio-common.h"
>>>>>>  #include "io/channel-buffer.h"
>>>>>>  #include "io/channel-file.h"
>>>>>>  #include "migration.h"
>>>>>> @@ -101,7 +102,9 @@ void qmp_cpr_exec(strList *args, Error **errp)
>>>>>>          error_setg(errp, "cpr-exec requires cpr-save with restart mode");
>>>>>>          return;
>>>>>>      }
>>>>>> -
>>>>>> +    if (cpr_vfio_save(errp)) {
>>>>>> +        return;
>>>>>> +    }    
>>>>>
>>>>> Why is vfio so unique that it needs separate handlers versus other
>>>>> devices?  Thanks,    
>>>>
>>>> In earlier patches these functions fiddled with more objects, but at this
>>>> point they are simple enough to convert to pre_save and post_load vmstate
>>>> handlers for the container and group objects.  However, we would still
>>>> need to call special functions for vfio from qmp_cpr_exec:
>>>>
>>>>   * validate that all containers support CPR before we start blasting
>>>>     vaddrs.  However, I could validate all of them in every call to
>>>>     pre_save for each container.  That would be less efficient, but it
>>>>     fits the vmstate model.
>>>
>>> Would it be a better option to mirror the migration blocker support, ie.
>>> any device that doesn't support cpr registers a blocker and generic
>>> code only needs to keep track of whether any blockers are registered.  
>>
>> We cannot use migrate_add_blocker() directly, because it is checked in the
>> migration-specific function migrate_prepare(), a layer of functions above
>> the simpler qemu_save_device_state() used in cpr.  But yes, we could do
>> something similar for vfio: increment a global counter in vfio_realize if
>> the container does not support cpr, and decrement it when the container is
>> destroyed.  pre_save could just check the counter.
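
(For concreteness, a rough sketch of that counter; every name below is
invented for illustration, nothing here exists in the tree:)

    static unsigned int vfio_cpr_blockers;

    /* from vfio_realize(), once the container is set up;
     * vfio_cpr_supported() and the cpr_blocker field are hypothetical */
    static void vfio_cpr_check_container(VFIOContainer *container)
    {
        if (!vfio_cpr_supported(container)) {
            container->cpr_blocker = true;
            vfio_cpr_blockers++;
        }
    }

    /* from the container teardown path */
    static void vfio_cpr_release_container(VFIOContainer *container)
    {
        if (container->cpr_blocker) {
            vfio_cpr_blockers--;
        }
    }

    /* from the container pre_save handler */
    static int vfio_cpr_check_blockers(Error **errp)
    {
        if (vfio_cpr_blockers) {
            error_setg(errp, "a vfio container does not support cpr");
            return -1;
        }
        return 0;
    }
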
> 
> Right, I'm not suggesting piggybacking on migrate_add_blocker(), only
> using a similar mechanism.  Only drivers that can't support cpr need to
> register a blocker, but testing for blockers is done generically, not
> just for vfio devices.
> 
>>>>   * restore all vaddr's if qemu_save_device_state fails.
>>>>     However, I could recover for all containers inside pre_save when one
>>>>     container fails.  Feels strange touching all objects in a function
>>>>     for one, but there is no real downside.
>>>
>>> I'm not as familiar as I should be with migration callbacks, thanks to
>>> mostly not supporting it for vfio devices, but it seems strange to me
>>> that there's no existing callback or notifier per device to propagate
>>> save failure.  Do we not at least get some sort of resume callback in
>>> that case?  
>>
>> We do not:
>>     struct VMStateDescription {
>>         int (*pre_load)(void *opaque);
>>         int (*post_load)(void *opaque, int version_id);
>>         int (*pre_save)(void *opaque);
>>         int (*post_save)(void *opaque);
>>
>> The handler returns an error, which stops further saves and is propagated
>> back to the top-level caller qemu_save_device_state().
>>
>> The vast majority of handlers do not have side effects, so there is no need
>> to unwind anything on failure.
>>
>> This raises another point.  If pre_save succeeds for all the containers,
>> but fails for some non-vfio object, then the overall operation is abandoned,
>> but we do not restore the vaddr's.  To plug that hole, we need to call the
>> unwind code from qmp_cpr_save, or implement your alternative below.
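
(For illustration, the kind of side-effecting pre_save at issue; the helper
and flag names are hypothetical.  If a later, non-vfio pre_save fails,
nothing undoes this:)

    /* a container pre_save with a side effect; vfio_dma_unmap_vaddr_all()
     * and the vaddr_unmapped flag are hypothetical names */
    static int vfio_container_pre_save(void *opaque)
    {
        VFIOContainer *container = opaque;

        if (vfio_dma_unmap_vaddr_all(container)) {
            return -1;  /* stops the save, reaches qemu_save_device_state() */
        }
        container->vaddr_unmapped = true;  /* remembered so it can be undone */
        return 0;
    }
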
> 
> We're trying to reuse migration interfaces; are we also triggering
> migration state change notifiers, i.e.
> MIGRATION_STATUS_{CANCELLING,CANCELLED,FAILED}?

No.  That happens in the migration layer, which we do not use.

> We already hook vfio
> devices supporting migration into that notifier to tell the driver to
> move the device back to the running state on failure, which seems a bit
> unique to vfio devices.  Containers could maybe register their own
> callbacks.
> 
>>> As an alternative, maybe each container could register a vm change
>>> handler that would trigger reloading vaddrs if we move to a running
>>> state and a flag on the container indicates vaddrs were invalidated?
>>> Thanks,  
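
(For reference, that alternative would look roughly like this, using the
existing qemu_add_vm_change_state_handler(); the cpr flag and recovery
helper are hypothetical names:)

    static void vfio_cpr_vm_state_change(void *opaque, bool running,
                                         RunState state)
    {
        VFIOContainer *container = opaque;

        /* re-establish vaddrs lazily when the VM resumes after a failure */
        if (running && container->vaddr_unmapped) {
            vfio_dma_map_vaddr_all(container);   /* hypothetical helper */
            container->vaddr_unmapped = false;
        }
    }

    static void vfio_cpr_register_container(VFIOContainer *container)
    {
        qemu_add_vm_change_state_handler(vfio_cpr_vm_state_change, container);
    }
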
>>
>> That works and is modular, but I dislike that it adds checks on the
>> happy path for a case that will rarely happen, and it pushes recovery from
>> failure further away from the original failure, which would make debugging
>> cascading failures more difficult.
> 
> Would using the migration notifier move us sufficiently closer to the
> failure point?  Otherwise I think you're talking about unwinding all
> the containers when any one fails, where you didn't like that object
> overreach, or maybe adding an optional callback... but I wonder if the
> above notifier essentially already does that.
> 
> In any case, I think we have options to either implement new or use
> existing notifier-like functionality to avoid all these vfio specific
> callouts.  Thanks,

Yes, defining a cpr notifier for failure and cleanup is a good solution.
I'll work on that and a cpr blocker.  I'll use the latter for vfio and
the chardevs.
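
A rough sketch of what such a notifier could look like, built on the existing
NotifierList helpers in qemu/notify.h (the cpr_* names are invented here, not
part of this series yet):

    static NotifierList cpr_notifiers =
        NOTIFIER_LIST_INITIALIZER(cpr_notifiers);

    void cpr_add_notifier(Notifier *notifier)
    {
        notifier_list_add(&cpr_notifiers, notifier);
    }

    void cpr_remove_notifier(Notifier *notifier)
    {
        notifier_remove(notifier);
    }

    /* called from the cpr-save error path so subscribers, such as vfio
     * containers, can clean up, e.g. restore vaddrs */
    void cpr_notify_failure(void)
    {
        notifier_list_notify(&cpr_notifiers, NULL);
    }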

- Steve


