qemu-ppc
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

[PATCH 0/7] FWNMI fixes / changes


From: Nicholas Piggin
Subject: [PATCH 0/7] FWNMI fixes / changes
Date: Thu, 5 Mar 2020 14:59:31 +1000

Hi,

Since RFC, I addressed review comments (thank you), I've also dropped
the MCE changes for now just to concentrate on the basics.

The major problem is PAPR requires system reset to also be delivered
to the address registered by nmi-register when FWNMI is enabled.

This series first allows FWNMI to work on TCG, because sresets can be
injected under TCG (unlike MCEs) so it becomes important. Then it adds
sreset to the migration state. Then it wire up sreset delivery to the
sreset fwnmi address.

One remaining issue we have now is that Linux and IBM's proprietary
hypervisor seem to deviate from PAPR when delivering fwnmi sreset
interrupts. PAPR says all registers should be set as at interrupt
time, as opposed to MCE delivery where r3 is specified to contain the
address of memory which contains the original r3 and an error record.

PowerVM delivers fwnmi sreset with this r3 scheme, and Linux tries to
restore r3 from that address. Linux also calls "ibm,nmi-interlock" after
loading r3 from that address, whereas PAPR does not require it. PowerVM
seems to ignore the call and doesn't synchronize anything.

I've asked PowerVM developers about this and waiting to see what comes
of that before going further here. This is still broken on Linux at
worst we can corrupt r3 with a random value from the rtas blob, at
best we get:

[   43.148121] FWNMI: corrupt r3 0x0000000000000000
[   43.148231] FWNMI: nmi-interlock failed: -3
[   43.148286] FWNMI: corrupt r3 0x0000000000000000
[   43.148296] FWNMI: nmi-interlock failed: -3

when handling sresets. However that problem is not introduced by this
patch, because the Linux kernel fwnmi and 0x100 sreset entry points are
identical, and this bug happens inside a common handler in an
'if (fwnmi_active)' branch.

We can probably provide this compatible behaviour without being totally
broken with respect to sreset on multiple CPUs concurrently or nesting
inside/outside of a machine check handler, if we provide a small
per-cpu area for the r3 save. That would require an RTAS/SLOF change to
allocate more space, so I haven't implemented that yet.

Thanks,
Nick


Nicholas Piggin (7):
  ppc/spapr: Fix FWNMI machine check failure handling
  ppc/spapr: Change FWNMI names
  ppc/spapr: Add FWNMI System Reset state
  ppc/spapr: Fix FWNMI machine check interrupt delivery
  ppc/spapr: Allow FWNMI on TCG
  target/ppc: allow ppc_cpu_do_system_reset to take an alternate vector
  ppc/spapr: Implement FWNMI System Reset delivery

 hw/ppc/spapr.c                    | 35 ++++++++------
 hw/ppc/spapr_caps.c               | 19 ++++----
 hw/ppc/spapr_events.c             | 36 ++++----------
 hw/ppc/spapr_rtas.c               | 29 ++++++++----
 include/hw/ppc/spapr.h            | 28 +++++++----
 target/ppc/cpu.h                  |  3 +-
 target/ppc/excp_helper.c          | 79 ++++++++++++++++++++++---------
 tests/qtest/libqos/libqos-spapr.h |  2 +-
 8 files changed, 135 insertions(+), 96 deletions(-)

-- 
2.23.0




reply via email to

[Prev in Thread] Current Thread [Next in Thread]