Re: [PATCH] pci: Abort if pci_add_capability fails
From: Alex Williamson
Subject: Re: [PATCH] pci: Abort if pci_add_capability fails
Date: Tue, 30 Aug 2022 12:00:14 -0600
On Tue, 30 Aug 2022 13:37:35 +0200
Markus Armbruster <armbru@redhat.com> wrote:
>     if (!offset) {
>         offset = pci_find_space(pdev, size);
>         /* out of PCI config space is programming error */
>         assert(offset);
>     } else {
>         /* Verify that capabilities don't overlap.  Note: device assignment
>          * depends on this check to verify that the device is not broken.
>          * Should never trigger for emulated devices, but it's helpful
>          * for debugging these. */
>
> The comment makes me suspect that device assignment of a broken device
> could trigger the error. It goes back to
>
> commit c9abe111209abca1b910e35c6ca9888aced5f183
> Author: Jan Kiszka <jan.kiszka@siemens.com>
> Date: Wed Aug 24 14:29:30 2011 +0200
>
> pci: Error on PCI capability collisions
>
> Nothing good can happen when we overlap capabilities. This may happen
> when plugging in assigned devices or when devices models contain bugs.
> Detect the overlap and report it.
>
> Based on qemu-kvm commit by Alex Williamson.
>
> Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
> Acked-by: Don Dutile <ddutile@redhat.com>
> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
>
> If this is still correct, then your patch is a regression: QEMU is no
> longer able to gracefully handle assignment of a broken device. Does
> this matter? Alex, maybe?
Ok, that was a long time ago. I have some vague memories of hitting
something like this with a Broadcom NIC, but a Google search for the
error string doesn't turn up anything recent. So there's a fair
chance this wouldn't break anyone initially.

Even back when the above patch was proposed, there were some
suggestions to turn the error path into an abort, which I pushed back
on, since enumerating the capabilities of a device can clearly happen
as a result of a hot-plug, and we don't necessarily have control of the
device being added. This is only more true with the possibility of
out-of-tree soft devices, through things like vfio-user.

Personally I think the right approach is to support an error path such
that we can abort when triggered by a cold-plug device, while simply
rejecting a broken hot-plug device, but that seems to be the minority
opinion among QEMU developers afaict. Thanks,
Alex
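
For illustration, the error-path approach described above might look roughly
like the sketch below. This is not QEMU code: ToyDevice, toy_add_capability()
and toy_realize() are hypothetical names, and the config space is reduced to a
simple "byte in use" map. The only point is that the capability helper reports
a collision to its caller, and the caller picks the policy based on whether
the device was hot-plugged.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define TOY_CFG_SIZE  256   /* size of conventional PCI config space */
#define TOY_CAP_START 0x40  /* first byte after the standard header */

/* Hypothetical device model, standing in for a real PCI device. */
typedef struct ToyDevice {
    uint8_t used[TOY_CFG_SIZE]; /* 1 = byte already claimed by a capability */
    bool hotplugged;            /* true if added after machine start */
} ToyDevice;

/*
 * Claim [offset, offset + size) for a capability.
 * Returns the offset on success, or -1 if the range is out of bounds or
 * overlaps an existing capability. It never aborts on a collision; the
 * caller decides what a failure means.
 */
static int toy_add_capability(ToyDevice *dev, uint8_t offset, uint8_t size)
{
    assert(size > 0); /* a zero-sized capability is a caller bug */

    if (offset < TOY_CAP_START || offset + size > TOY_CFG_SIZE) {
        return -1;
    }
    for (int i = offset; i < offset + size; i++) {
        if (dev->used[i]) {
            return -1; /* collision with an earlier capability */
        }
    }
    memset(&dev->used[offset], 1, size);
    return offset;
}

/*
 * Policy lives in the caller: a cold-plugged device that collides is
 * treated as fatal, while a hot-plugged one is simply rejected.
 */
static bool toy_realize(ToyDevice *dev, uint8_t offset, uint8_t size)
{
    if (toy_add_capability(dev, offset, size) >= 0) {
        return true;
    }
    if (!dev->hotplugged) {
        fprintf(stderr, "capability collision on cold-plugged device\n");
        abort();
    }
    fprintf(stderr, "rejecting hot-plugged device: capability collision\n");
    return false;
}
```

With a device marked hotplugged, a second capability that overlaps the first
makes toy_realize() return false and the guest keeps running; the same
collision on a cold-plugged device ends in abort().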