[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: [PATCH for-6.2 v6 6/7] spapr: use DEVICE_UNPLUG_ERROR to report unpl
From: |
Markus Armbruster |
Subject: |
Re: [PATCH for-6.2 v6 6/7] spapr: use DEVICE_UNPLUG_ERROR to report unplug errors |
Date: |
Mon, 23 Aug 2021 15:33:19 +0200 |
User-agent: |
Gnus/5.13 (Gnus v5.13) Emacs/27.2 (gnu/linux) |
Daniel Henrique Barboza <danielhb413@gmail.com> writes:
> On 8/7/21 11:06 AM, Markus Armbruster wrote:
>> Daniel Henrique Barboza <danielhb413@gmail.com> writes:
>>
>>> Linux Kernel 5.12 is now unisolating CPU DRCs in the device_removal
>>> error path, signalling that the hotunplug process wasn't successful.
>>> This allow us to send a DEVICE_UNPLUG_ERROR in drc_unisolate_logical()
>>> to signal this error to the management layer.
>>>
>>> We also have another error path in spapr_memory_unplug_rollback() for
>>> configured LMB DRCs. Kernels older than 5.13 will not unisolate the LMBs
>>> in the hotunplug error path, but it will reconfigure them. Let's send
>>> the DEVICE_UNPLUG_ERROR event in that code path as well to cover the
>>> case of older kernels.
>>>
>>> Reviewed-by: Greg Kurz <groug@kaod.org>
>>> Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
>>> ---
>>> hw/ppc/spapr.c | 9 ++++++++-
>>> hw/ppc/spapr_drc.c | 18 ++++++++++++------
>>> 2 files changed, 20 insertions(+), 7 deletions(-)
>>>
>>> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
>>> index 1611d7ab05..5459f9a7e9 100644
>>> --- a/hw/ppc/spapr.c
>>> +++ b/hw/ppc/spapr.c
>>> @@ -29,6 +29,7 @@
>>> #include "qemu/datadir.h"
>>> #include "qapi/error.h"
>>> #include "qapi/qapi-events-machine.h"
>>> +#include "qapi/qapi-events-qdev.h"
>>> #include "qapi/visitor.h"
>>> #include "sysemu/sysemu.h"
>>> #include "sysemu/hostmem.h"
>>> @@ -3686,13 +3687,19 @@ void spapr_memory_unplug_rollback(SpaprMachineState
>>> *spapr, DeviceState *dev)
>>> /*
>>> * Tell QAPI that something happened and the memory
>>> - * hotunplug wasn't successful.
>>> + * hotunplug wasn't successful. Keep sending
>>> + * MEM_UNPLUG_ERROR even while sending DEVICE_UNPLUG_ERROR
>>> + * until the deprecation MEM_UNPLUG_ERROR is due.
>>> */
>>> if (dev->id) {
>>> qapi_error = g_strdup_printf("Memory hotunplug rejected by the
>>> guest "
>>> "for device %s", dev->id);
>>> qapi_event_send_mem_unplug_error(dev->id, qapi_error);
>>> }
>>> +
>>> + qapi_event_send_device_unplug_error(!!dev->id, dev->id,
>>> + dev->canonical_path,
>>> + qapi_error != NULL, qapi_error);
>>> }
>>>
>> When dev->id is null, we send something like
>>
>> {"event": "DEVICE_UNPLUG_ERROR",
>> "data": {"path": "/machine/..."},
>> "timestamp": ...}
>>
>> Unless I'm missing something, this is all the information the management
>> application really needs.
>>
>> When dev->id is non-null, we add to "data":
>>
>> "device": "dev123",
>> "msg": "Memory hotunplug rejected by the guest for device
>> dev123",
>>
>> I'm fine with emitting the device ID when we have it.
>>
>> What's the intended use of "msg"?
>>
>> Could DEVICE_UNPLUG_ERROR ever be emitted for this device with a
>> different "msg"?
>
>
> It won't have a different 'msg' for the current use of the event in both ppc64
> and x86. It'll always be the same '<dev> hotunplug rejected by the guest'
> message.
>
> The idea is that a future caller might want to insert a more informative
> message, such as "hotunplug failed: memory is being used by kernel space"
> or any other more specific condition. But then I guess we can argue that,
> if that time comes, one can just add this new optional 'msg' member in this
> event, and for now we can live without it.
>
> Would you oppose to renaming this new event to "DEVICE_UNPLUG_GUEST_ERROR"
> and then remove the 'msg' member? I guess this rename would make it clearer
> for management that we're reporting a guest side error, making any further
> clarifications via 'msg' unneeded.
No objection.