qemu-commits
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

[Qemu-commits] [qemu/qemu] 25c978: spapr: Reset CAS & IRQ subsystem afte


From: Peter Maydell
Subject: [Qemu-commits] [qemu/qemu] 25c978: spapr: Reset CAS & IRQ subsystem after devices
Date: Tue, 13 Aug 2019 04:45:15 -0700

  Branch: refs/heads/master
  Home:   https://github.com/qemu/qemu
  Commit: 25c9780d38d4494f8610371d883865cf40b35dd6
      
https://github.com/qemu/qemu/commit/25c9780d38d4494f8610371d883865cf40b35dd6
  Author: David Gibson <address@hidden>
  Date:   2019-08-13 (Tue, 13 Aug 2019)

  Changed paths:
    M hw/ppc/spapr.c

  Log Message:
  -----------
  spapr: Reset CAS & IRQ subsystem after devices

This fixes a nasty regression in qemu-4.1 for the 'pseries' machine,
caused by the new "dual" interrupt controller model.  Specifically,
qemu can crash when used with KVM if a 'system_reset' is requested
while there's active I/O in the guest.

The problem is that in spapr_machine_reset() we:

1. Reset the CAS vector state
        spapr_ovec_cleanup(spapr->ov5_cas);

2. Reset all devices
        qemu_devices_reset()

3. Reset the irq subsystem
        spapr_irq_reset();

However (1) implicitly changes the interrupt delivery mode, because
whether we're using XICS or XIVE depends on the CAS state.  We don't
properly initialize the new irq mode until (3) though - in particular
setting up the KVM devices.

During (2), we can temporarily drop the BQL allowing some irqs to be
delivered which will go to an irq system that's not properly set up.

Specifically, if the previous guest was in (KVM) XIVE mode, the CAS
reset will put us back in XICS mode.  kvm_kernel_irqchip() still
returns true, because XIVE was using KVM, however XICs doesn't have
its KVM components intialized and kernel_xics_fd == -1.  When the irq
is delivered it goes via ics_kvm_set_irq() which assert()s that
kernel_xics_fd != -1.

This change addresses the problem by delaying the CAS reset until
after the devices reset.  The device reset should quiesce all the
devices so we won't get irqs delivered while we mess around with the
IRQ.  The CAS reset and irq re-initialize should also now be under the
same BQL critical section so nothing else should be able to interrupt
it either.

We also move the spapr_irq_msi_reset() used in one of the legacy irq
modes, since it logically makes sense at the same point as the
spapr_irq_reset() (it's essentially an equivalent operation for older
machine types).  Since we don't need to switch between different
interrupt controllers for those old machine types it shouldn't
actually be broken in those cases though.

Cc: Cédric Le Goater <address@hidden>

Fixes: b2e22477 "spapr: add a 'reset' method to the sPAPR IRQ backend"
Fixes: 13db0cd9 "spapr: introduce a new sPAPR IRQ backend supporting
                 XIVE and XICS"
Signed-off-by: David Gibson <address@hidden>


  Commit: 310cda5b5e9df642b19a0e9c504368ffba3b3ab9
      
https://github.com/qemu/qemu/commit/310cda5b5e9df642b19a0e9c504368ffba3b3ab9
  Author: Cédric Le Goater <address@hidden>
  Date:   2019-08-13 (Tue, 13 Aug 2019)

  Changed paths:
    M hw/intc/spapr_xive_kvm.c
    M hw/intc/xive.c
    M include/hw/ppc/xive.h

  Log Message:
  -----------
  spapr/xive: Fix migration of hot-plugged CPUs

The migration sequence of a guest using the XIVE exploitation mode
relies on the fact that the states of all devices are restored before
the machine is. This is not true for hot-plug devices such as CPUs
which state come after the machine. This breaks migration because the
thread interrupt context registers are not correctly set.

Fix migration of hotplugged CPUs by restoring their context in the
'post_load' handler of the XiveTCTX model.

Fixes: 277dd3d7712a ("spapr/xive: add migration support for KVM")
Signed-off-by: Cédric Le Goater <address@hidden>
Message-Id: <address@hidden>
Signed-off-by: David Gibson <address@hidden>


  Commit: 968ff692cf5096a9e0d33e7e4a7fff10edb63001
      
https://github.com/qemu/qemu/commit/968ff692cf5096a9e0d33e7e4a7fff10edb63001
  Author: Peter Maydell <address@hidden>
  Date:   2019-08-13 (Tue, 13 Aug 2019)

  Changed paths:
    M hw/intc/spapr_xive_kvm.c
    M hw/intc/xive.c
    M hw/ppc/spapr.c
    M include/hw/ppc/xive.h

  Log Message:
  -----------
  Merge remote-tracking branch 'remotes/dgibson/tags/ppc-for-4.1-20190813' into 
staging

ppc patch queue 2019-08-13 (last minute qemu-4.1 fixes)

Here's a very, very last minute pull request for qemu-4.1.  This fixes
two nasty bugs with the XIVE interrupt controller in "dual" mode
(where the guest decides which interrupt controller it wants to use).
One occurs when resetting the guest while I/O is active, and the other
with migration of hotplugged CPUs.

The timing here is very unfortunate.  Alas, we only spotted these bugs
very late, and I was sick last week, delaying analysis and fix even
further.

This series hasn't had nearly as much testing as I'd really like, but
I'd still like to squeeze it into qemu-4.1 if possible, since
definitely fixing two bad bugs seems like an acceptable tradeoff for
the risk of introducing different bugs.

# gpg: Signature made Tue 13 Aug 2019 07:56:42 BST
# gpg:                using RSA key 75F46586AE61A66CC44E87DC6C38CACA20D9B392
# gpg: Good signature from "David Gibson <address@hidden>" [full]
# gpg:                 aka "David Gibson (Red Hat) <address@hidden>" [full]
# gpg:                 aka "David Gibson (ozlabs.org) <address@hidden>" [full]
# gpg:                 aka "David Gibson (kernel.org) <address@hidden>" 
[unknown]
# Primary key fingerprint: 75F4 6586 AE61 A66C C44E  87DC 6C38 CACA 20D9 B392

* remotes/dgibson/tags/ppc-for-4.1-20190813:
  spapr/xive: Fix migration of hot-plugged CPUs
  spapr: Reset CAS & IRQ subsystem after devices

Signed-off-by: Peter Maydell <address@hidden>


Compare: https://github.com/qemu/qemu/compare/5e7bcdcfe69c...968ff692cf50



reply via email to

[Prev in Thread] Current Thread [Next in Thread]