qemu-ppc
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH 2/2] spapr.c: always pulse guest IRQ in spapr_core_unplug_req


From: David Gibson
Subject: Re: [PATCH 2/2] spapr.c: always pulse guest IRQ in spapr_core_unplug_request()
Date: Tue, 20 Apr 2021 11:24:51 +1000

On Mon, Apr 12, 2021 at 04:27:43PM -0300, Daniel Henrique Barboza wrote:
> 
> 
> On 3/31/21 11:37 PM, David Gibson wrote:
> > On Wed, Mar 31, 2021 at 09:04:37PM -0300, Daniel Henrique Barboza wrote:
> > > Commit 47c8c915b162 fixed a problem where multiple spapr_drc_detach()
> > > requests were breaking QEMU. The solution was to just spapr_drc_detach()
> > > once, and use spapr_drc_unplug_requested() to filter whether we already
> > > detached it or not. The commit also tied the hotplug request to the
> > > guest in the same condition.
> > > 
> > > Turns out that there is a reliable way for a CPU hotunplug to fail. If a
> > > guest with one CPU hotplugs a CPU1, then offline CPU0s via 'echo 0 >
> > > /sys/devices/system/cpu/cpu0/online', then attempts to hotunplug CPU1,
> > > the kernel will refuse it because it's the last online CPU of the
> > > system. Given that we're pulsing the IRQ only in the first try, in a
> > > failed attempt, all other CPU1 hotunplug attempts will fail, regardless
> > > of the online state of CPU1 in the kernel, because we're simply not
> > > letting the guest know that we want to hotunplug the device.
> > > 
> > > Let's move spapr_hotplug_req_remove_by_index() back out of the "if
> > > (!spapr_drc_unplug_requested(drc))" conditional, allowing for multiple
> > > 'device_del' requests to the same CPU core to reach the guest, in case
> > > the CPU core didn't fully hotunplugged previously.
> > > 
> > > Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
> > 
> > I've applied these to ppc-for-6.0, but..
> > 
> > > ---
> > >   hw/ppc/spapr.c | 11 ++++++++++-
> > >   1 file changed, 10 insertions(+), 1 deletion(-)
> > > 
> > > diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> > > index 05a765fab4..e4be00b732 100644
> > > --- a/hw/ppc/spapr.c
> > > +++ b/hw/ppc/spapr.c
> > > @@ -3777,8 +3777,17 @@ void spapr_core_unplug_request(HotplugHandler 
> > > *hotplug_dev, DeviceState *dev,
> > >       if (!spapr_drc_unplug_requested(drc)) {
> > >           spapr_drc_unplug_request(drc);
> > > -        spapr_hotplug_req_remove_by_index(drc);
> > >       }
> > > +
> > > +    /*
> > > +     * spapr_hotplug_req_remove_by_index is left unguarded, out of the
> > > +     * "!spapr_drc_unplug_requested" check, to allow for multiple IRQ
> > > +     * pulses removing the same CPU. Otherwise, in an failed hotunplug
> > > +     * attempt (e.g. the kernel will refuse to remove the last online
> > > +     * CPU), we will never attempt it again because unplug_requested
> > > +     * will still be 'true' in that case.
> > > +     */
> > > +    spapr_hotplug_req_remove_by_index(drc);
> > 
> > I think we need similar changes for all the other unplug types (LMB,
> > PCI, PHB) - basically retries should always be allowed, and at worst
> > be a no-op, rather than generating an error like they do now.
> 
> 
> For PHBs should be straightforward. Not so sure about PCI because there is
> all the PCI function logic around the hotunplug of function 0.
> 
> As for LMBs, we block further attempts because there is no way we can tell
> if the hotunplug is being executed but it is taking some time (it is not
> uncommon for a DIMM unplug to take 20-30 seconds to complete), versus
> an error scenario.

I don't see why that prevents retries.  Can't you reissue the
index+size unplug request anyway?  The code you already have to fail
unplugs on a reconfigure should work for both the original request and
the retry, shouldn't it?


> What we do ATM is check is the pending DIMM unplug
> state exists, and if it does, assume that a hotunplug is pending. I have
> no idea what would happen if an unplug request for a LMB DRC reaches the
> kernel in the middle of an error rollback (when the kernel reconnects all
> the LMBs again) and the same DRC that was rolled back is disconnected
> again.
> 
> We would need to check not only if the pending dimm unplug state exists, but
> also if partially exists. In other words, if there are DRCs of that DIMM that
> were unplugged already. That way we can prevent to issue a removal while
> the unplug is still running.
> 
> 
> Thanks,
> 
> 
> DHB
> 
> > 
> > >   }
> > >   int spapr_core_dt_populate(SpaprDrc *drc, SpaprMachineState *spapr,
> > 
> 

-- 
David Gibson                    | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
                                | _way_ _around_!
http://www.ozlabs.org/~dgibson

Attachment: signature.asc
Description: PGP signature


reply via email to

[Prev in Thread] Current Thread [Next in Thread]