On 1/10/24 1:30 PM, Cédric Le Goater wrote:
On 9/12/23 13:41, Thomas Huth wrote:
From: Janosch Frank <frankja@linux.ibm.com>
Bound APQNs have to be reset before tearing down the secure config via
s390_machine_unprotect(). Otherwise the Ultravisor will return a error
code.
So let's do a subsystem_reset() which includes a AP reset before the
unprotect call. We'll do a full device_reset() afterwards which will
reset some devices twice. That's ok since we can't move the
device_reset() before the unprotect as it includes a CPU clear reset
which the Ultravisor does not expect at that point in time.
Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
Message-ID: <20230901114851.154357-1-frankja@linux.ibm.com>
Tested-by: Viktor Mihajlovski <mihajlov@linux.ibm.com>
Acked-by: Christian Borntraeger <borntraeger@linux.ibm.com>
Signed-off-by: Thomas Huth <thuth@redhat.com>
---
hw/s390x/s390-virtio-ccw.c | 10 ++++++++++
1 file changed, 10 insertions(+)
diff --git a/hw/s390x/s390-virtio-ccw.c b/hw/s390x/s390-virtio-ccw.c
index 3dd0b2372d..2d75f2131f 100644
--- a/hw/s390x/s390-virtio-ccw.c
+++ b/hw/s390x/s390-virtio-ccw.c
@@ -438,10 +438,20 @@ static void s390_machine_reset(MachineState *machine,
ShutdownCause reason)
switch (reset_type) {
case S390_RESET_EXTERNAL:
case S390_RESET_REIPL:
+ /*
+ * Reset the subsystem which includes a AP reset. If a PV
+ * guest had APQNs attached the AP reset is a prerequisite to
+ * unprotecting since the UV checks if all APQNs are reset.
+ */
+ subsystem_reset();
This commit introduced a regression with pass-though ISM devices.
After startup, a reboot will generate extra device resets (vfio-pci in
this case) which break the pass-though ISM device in a subtle way,
Hi Cedric, thanks for reporting this... I was able to reproduce just now, and
it looks like ISM firmware is unhappy specifically with this susbystem_reset
call added by ef1535901a0, not necessarily the multiple attempts at reset -- I
verified that reverting ef1535901a0 resolves the ISM issue, but if I instead
try reverting the older 03451953c79e while leaving ef1535901a0 in place then
ISM devices still break on guest reboot.
probably related to IOMMU mapping according to 03451953c79e
("s390x/pci: reset ISM passthrough devices on shutdown and system
reset"). After poweroff, the device is left in a sort-of-a-use state
on the host and the LPAR has to be rebooted to clear the invalid state
of the device. To be noted, that standard PCI devices are immune to
this change.
As a bit of background, ISM firmware is very sensitive re: the contents of the
(host) IOMMU and attempts at manipulation that it deems to be out-of-order; the
point of 03451953c79e was to ensure that the device gets a reset before we
attempt at unmapping anything that wasn't cleaned up in an orderly fashion by
the (guest) ism driver at the time of shutdown/reset (e.g. underlying firmware
may view guest SBAs in the IOMMU as still registered for use and will throw an
error condition at attempts to remove their entries in the IOMMU without first
going through an unregistration process).
The unmap that would make ISM upset would generally be coming out of
vfio_listener_region_del where we just do one big vfio_dma_unmap -- a quick
trace shows that the subsystem_reset call added by ef1535901a0 is causing the
vfio_listener_region_del to once again trigger before the pci reset of the ISM
device, effectively re-introducing the condition that 03451953c79e was trying
to resolve.
The extra resets should avoided in some ways, (a shutdown notifier and
a reset callback are already registered for ISM devices by 03451953c79e)
So as mentioned above, it's not the extra resets that are the issue, it's the
order of operations. Basically, we need to drive pci_device_reset for any ISM
device associated with the guest before we destroy the vfio memory listener
(now triggered in this case via subsystem_reset). So if we must drive this
subsystem_reset before we trigger the device reset callbacks then it might
require a s390 pci bus routine that is called before or during subystem_reset
just to reset the ISM devices associated with this guest first; I'm not sure
yet.
As an aside: I wonder why we are always doing the subsystem_reset here
unconditionally rather than only when s390_is_pv() since that seems to be the
only case that requires it.