qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH 2/6] pci: introduce apci-index property for PCI device


From: Michael S. Tsirkin
Subject: Re: [PATCH 2/6] pci: introduce apci-index property for PCI device
Date: Fri, 14 May 2021 09:37:32 -0400

On Mon, Mar 15, 2021 at 02:00:58PM -0400, Igor Mammedov wrote:
> In x86/ACPI world, linux distros are using predictable
> network interface naming since systemd v197. Which on
> QEMU based VMs results into path based naming scheme,
> that names network interfaces based on PCI topology.
> 
> With itm on has to plug NIC in exactly the same bus/slot,
> which was used when disk image was first provisioned/configured
> or one risks to loose network configuration due to NIC being
> renamed to actually used topology.
> That also restricts freedom to reshape PCI configuration of
> VM without need to reconfigure used guest image.
> 
> systemd also offers "onboard" naming scheme which is
> preferred over PCI slot/topology one, provided that
> firmware implements:
>     "
>     PCI Firmware Specification 3.1
>     4.6.7.  DSM for Naming a PCI or PCI Express Device Under
>             Operating Systems
>     "
> that allows to assign user defined index to PCI device,
> which systemd will use to name NIC. For example, using
>   -device e1000,acpi-index=100
> guest will rename NIC to 'eno100', where 'eno' is default
> prefix for "onboard" naming scheme. This doesn't require
> any advance configuration on guest side to com in effect
> at 'onboard' scheme takes priority over path based naming.
> 
> Hope is that 'acpi-index' it will be easier to consume by
> management layer, compared to forcing specific PCI topology
> and/or having several disk image templates for different
> topologies and will help to simplify process of spawning
> VM from the same template without need to reconfigure
> guest NIC.
> 
> This patch adds, 'acpi-index'* property and wires up
> a 32bit register on top of pci hotplug register block
> to pass index value to AML code at runtime.
> Following patch will add corresponding _DSM code and
> wire it up to PCI devices described in ACPI.
> 
> *) name comes from linux kernel terminology
> 
> Signed-off-by: Igor Mammedov <imammedo@redhat.com>

So something to note here is that it's more or less linux centric even
though it's from the spec:

Assignment of specific device names to multi-function devices installed in 
expansion
slots, and/or PCI or PCI Express devices that are hot-added to expansion slots 
in operating system-
environment would be handled in operating system-specific manner, and is not 
specified via this
specification.

So I think commit log should note that point about hot add:
that even though spec excludes it, it seems to work
because the Linux guest re-evaluate the method each time
device is added.


Also, should not we exclude multi-function devices with this property?
Or at least document that it won't work ...




> ---
> v1:
>  *  fix build failure of mips target
>  *  move out handling acpi-index into separate MMIO register,
>     instead of abusing unused PCI hotplug registers (mst)
>  *  default acpi-index property value to 0
>  *  set and check for max acpi-index value (systemd limitation)
> ---
>  include/hw/acpi/pcihp.h |  9 +++++--
>  include/hw/pci/pci.h    |  1 +
>  hw/acpi/pci.c           |  1 -
>  hw/acpi/pcihp.c         | 58 +++++++++++++++++++++++++++++++++++++++--
>  hw/acpi/piix4.c         |  3 ++-
>  hw/acpi/trace-events    |  2 ++
>  hw/i386/acpi-build.c    | 13 ++++++++-
>  hw/pci/pci.c            |  1 +
>  8 files changed, 81 insertions(+), 7 deletions(-)
> 
> diff --git a/include/hw/acpi/pcihp.h b/include/hw/acpi/pcihp.h
> index dfd375820f..2dd90aea30 100644
> --- a/include/hw/acpi/pcihp.h
> +++ b/include/hw/acpi/pcihp.h
> @@ -46,6 +46,7 @@ typedef struct AcpiPciHpPciStatus {
>  typedef struct AcpiPciHpState {
>      AcpiPciHpPciStatus acpi_pcihp_pci_status[ACPI_PCIHP_MAX_HOTPLUG_BUS];
>      uint32_t hotplug_select;
> +    uint32_t acpi_index;
>      PCIBus *root;
>      MemoryRegion io;
>      bool legacy_piix;
> @@ -71,13 +72,17 @@ void acpi_pcihp_reset(AcpiPciHpState *s, bool 
> acpihp_root_off);
>  
>  extern const VMStateDescription vmstate_acpi_pcihp_pci_status;
>  
> -#define VMSTATE_PCI_HOTPLUG(pcihp, state, test_pcihp) \
> +bool vmstate_acpi_pcihp_use_acpi_index(void *opaque, int version_id);
> +
> +#define VMSTATE_PCI_HOTPLUG(pcihp, state, test_pcihp, test_acpi_index) \
>          VMSTATE_UINT32_TEST(pcihp.hotplug_select, state, \
>                              test_pcihp), \
>          VMSTATE_STRUCT_ARRAY_TEST(pcihp.acpi_pcihp_pci_status, state, \
>                                    ACPI_PCIHP_MAX_HOTPLUG_BUS, \
>                                    test_pcihp, 1, \
>                                    vmstate_acpi_pcihp_pci_status, \
> -                                  AcpiPciHpPciStatus)
> +                                  AcpiPciHpPciStatus), \
> +        VMSTATE_UINT32_TEST(pcihp.acpi_index, state, \
> +                            test_acpi_index)
>  
>  #endif
> diff --git a/include/hw/pci/pci.h b/include/hw/pci/pci.h
> index 1bc231480f..6be4e0c460 100644
> --- a/include/hw/pci/pci.h
> +++ b/include/hw/pci/pci.h
> @@ -359,6 +359,7 @@ struct PCIDevice {
>  
>      /* ID of standby device in net_failover pair */
>      char *failover_pair_id;
> +    uint32_t acpi_index;
>  };
>  
>  void pci_register_bar(PCIDevice *pci_dev, int region_num,
> diff --git a/hw/acpi/pci.c b/hw/acpi/pci.c
> index ec455c3b25..75b1103ec4 100644
> --- a/hw/acpi/pci.c
> +++ b/hw/acpi/pci.c
> @@ -59,4 +59,3 @@ void build_mcfg(GArray *table_data, BIOSLinker *linker, 
> AcpiMcfgInfo *info,
>      build_header(linker, table_data, (void *)(table_data->data + mcfg_start),
>                   "MCFG", table_data->len - mcfg_start, 1, oem_id, 
> oem_table_id);
>  }
> -
> diff --git a/hw/acpi/pcihp.c b/hw/acpi/pcihp.c
> index 9dc4d3e2db..ceab287bd3 100644
> --- a/hw/acpi/pcihp.c
> +++ b/hw/acpi/pcihp.c
> @@ -39,12 +39,13 @@
>  #include "trace.h"
>  
>  #define ACPI_PCIHP_ADDR 0xae00
> -#define ACPI_PCIHP_SIZE 0x0014
> +#define ACPI_PCIHP_SIZE 0x0018
>  #define PCI_UP_BASE 0x0000
>  #define PCI_DOWN_BASE 0x0004
>  #define PCI_EJ_BASE 0x0008
>  #define PCI_RMV_BASE 0x000c
>  #define PCI_SEL_BASE 0x0010
> +#define PCI_AIDX_BASE 0x0014
>  
>  typedef struct AcpiPciHpFind {
>      int bsel;
> @@ -251,9 +252,13 @@ void acpi_pcihp_reset(AcpiPciHpState *s, bool 
> acpihp_root_off)
>      acpi_pcihp_update(s);
>  }
>  
> +#define ONBOARD_INDEX_MAX (16 * 1024 - 1)
> +
>  void acpi_pcihp_device_pre_plug_cb(HotplugHandler *hotplug_dev,
>                                     DeviceState *dev, Error **errp)
>  {
> +    PCIDevice *pdev = PCI_DEVICE(dev);
> +
>      /* Only hotplugged devices need the hotplug capability. */
>      if (dev->hotplugged &&
>          acpi_pcihp_get_bsel(pci_get_bus(PCI_DEVICE(dev))) < 0) {
> @@ -261,6 +266,17 @@ void acpi_pcihp_device_pre_plug_cb(HotplugHandler 
> *hotplug_dev,
>                     ACPI_PCIHP_PROP_BSEL "' set");
>          return;
>      }
> +
> +    /*
> +     * capped by systemd (see: udev-builtin-net_id.c)
> +     * as it's the only known user honor it to avoid users
> +     * misconfigure QEMU and then wonder why acpi-index doesn't work
> +     */
> +    if (pdev->acpi_index > ONBOARD_INDEX_MAX) {
> +        error_setg(errp, "acpi-index should be less or equal to %u",
> +                   ONBOARD_INDEX_MAX);
> +        return;
> +    }
>  }
>  
>  void acpi_pcihp_device_plug_cb(HotplugHandler *hotplug_dev, AcpiPciHpState 
> *s,
> @@ -347,7 +363,6 @@ static uint64_t pci_read(void *opaque, hwaddr addr, 
> unsigned int size)
>          trace_acpi_pci_down_read(val);
>          break;
>      case PCI_EJ_BASE:
> -        /* No feature defined yet */
>          trace_acpi_pci_features_read(val);
>          break;
>      case PCI_RMV_BASE:
> @@ -357,6 +372,12 @@ static uint64_t pci_read(void *opaque, hwaddr addr, 
> unsigned int size)
>      case PCI_SEL_BASE:
>          val = s->hotplug_select;
>          trace_acpi_pci_sel_read(val);
> +        break;
> +    case PCI_AIDX_BASE:
> +        val = s->acpi_index;
> +        s->acpi_index = 0;
> +        trace_acpi_pci_acpi_index_read(val);
> +        break;
>      default:
>          break;
>      }
> @@ -367,8 +388,35 @@ static uint64_t pci_read(void *opaque, hwaddr addr, 
> unsigned int size)
>  static void pci_write(void *opaque, hwaddr addr, uint64_t data,
>                        unsigned int size)
>  {
> +    int slot;
> +    PCIBus *bus;
> +    BusChild *kid, *next;
>      AcpiPciHpState *s = opaque;
> +
> +    s->acpi_index = 0;
>      switch (addr) {
> +    case PCI_AIDX_BASE:
> +        /*
> +         * fetch acpi-index for specified slot so that follow up read from
> +         * PCI_AIDX_BASE can return it to guest
> +         */
> +        slot = ctz32(data);
> +
> +        if (s->hotplug_select >= ACPI_PCIHP_MAX_HOTPLUG_BUS) {
> +            break;
> +        }
> +
> +        bus = acpi_pcihp_find_hotplug_bus(s, s->hotplug_select);
> +        QTAILQ_FOREACH_SAFE(kid, &bus->qbus.children, sibling, next) {
> +            Object *o = OBJECT(kid->child);
> +            PCIDevice *dev = PCI_DEVICE(o);
> +            if (PCI_SLOT(dev->devfn) == slot) {
> +                s->acpi_index = object_property_get_uint(o, "acpi-index", 
> NULL);
> +                break;
> +            }
> +        }
> +        trace_acpi_pci_acpi_index_write(s->hotplug_select, slot, 
> s->acpi_index);
> +        break;
>      case PCI_EJ_BASE:
>          if (s->hotplug_select >= ACPI_PCIHP_MAX_HOTPLUG_BUS) {
>              break;
> @@ -413,6 +461,12 @@ void acpi_pcihp_init(Object *owner, AcpiPciHpState *s, 
> PCIBus *root_bus,
>                                     OBJ_PROP_FLAG_READ);
>  }
>  
> +bool vmstate_acpi_pcihp_use_acpi_index(void *opaque, int version_id)
> +{
> +     AcpiPciHpState *s = opaque;
> +     return s->acpi_index;
> +}
> +
>  const VMStateDescription vmstate_acpi_pcihp_pci_status = {
>      .name = "acpi_pcihp_pci_status",
>      .version_id = 1,
> diff --git a/hw/acpi/piix4.c b/hw/acpi/piix4.c
> index 1efc0ded9f..6056d51667 100644
> --- a/hw/acpi/piix4.c
> +++ b/hw/acpi/piix4.c
> @@ -297,7 +297,8 @@ static const VMStateDescription vmstate_acpi = {
>              2, vmstate_pci_status,
>              struct AcpiPciHpPciStatus),
>          VMSTATE_PCI_HOTPLUG(acpi_pci_hotplug, PIIX4PMState,
> -                            vmstate_test_use_acpi_hotplug_bridge),
> +                            vmstate_test_use_acpi_hotplug_bridge,
> +                            vmstate_acpi_pcihp_use_acpi_index),
>          VMSTATE_END_OF_LIST()
>      },
>      .subsections = (const VMStateDescription*[]) {
> diff --git a/hw/acpi/trace-events b/hw/acpi/trace-events
> index f91ced477d..dcc1438f3a 100644
> --- a/hw/acpi/trace-events
> +++ b/hw/acpi/trace-events
> @@ -41,6 +41,8 @@ acpi_pci_unplug_request(int bsel, int slot) "bsel: %d slot: 
> %d"
>  acpi_pci_up_read(uint32_t val) "%" PRIu32
>  acpi_pci_down_read(uint32_t val) "%" PRIu32
>  acpi_pci_features_read(uint32_t val) "%" PRIu32
> +acpi_pci_acpi_index_read(uint32_t val) "%" PRIu32
> +acpi_pci_acpi_index_write(unsigned bsel, unsigned slot, uint32_t aidx) 
> "bsel: %u slot: %u aidx: %" PRIu32
>  acpi_pci_rmv_read(uint32_t val) "%" PRIu32
>  acpi_pci_sel_read(uint32_t val) "%" PRIu32
>  acpi_pci_ej_write(uint64_t addr, uint64_t data) "0x%" PRIx64 " <== %" PRIu64
> diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
> index 442b4629a9..e49fae2bfd 100644
> --- a/hw/i386/acpi-build.c
> +++ b/hw/i386/acpi-build.c
> @@ -1168,9 +1168,10 @@ static void build_piix4_pci_hotplug(Aml *table)
>      aml_append(scope, field);
>  
>      aml_append(scope,
> -        aml_operation_region("BNMR", AML_SYSTEM_IO, aml_int(0xae10), 0x04));
> +        aml_operation_region("BNMR", AML_SYSTEM_IO, aml_int(0xae10), 0x08));
>      field = aml_field("BNMR", AML_DWORD_ACC, AML_NOLOCK, AML_WRITE_AS_ZEROS);
>      aml_append(field, aml_named_field("BNUM", 32));
> +    aml_append(field, aml_named_field("PIDX", 32));
>      aml_append(scope, field);
>  
>      aml_append(scope, aml_mutex("BLCK", 0));
> @@ -1184,6 +1185,16 @@ static void build_piix4_pci_hotplug(Aml *table)
>      aml_append(method, aml_return(aml_int(0)));
>      aml_append(scope, method);
>  
> +    method = aml_method("AIDX", 2, AML_NOTSERIALIZED);
> +    aml_append(method, aml_acquire(aml_name("BLCK"), 0xFFFF));
> +    aml_append(method, aml_store(aml_arg(0), aml_name("BNUM")));
> +    aml_append(method,
> +        aml_store(aml_shiftleft(aml_int(1), aml_arg(1)), aml_name("PIDX")));
> +    aml_append(method, aml_store(aml_name("PIDX"), aml_local(0)));
> +    aml_append(method, aml_release(aml_name("BLCK")));
> +    aml_append(method, aml_return(aml_local(0)));
> +    aml_append(scope, method);
> +
>      aml_append(table, scope);
>  }
>  
> diff --git a/hw/pci/pci.c b/hw/pci/pci.c
> index 0eadcdbc9e..ac9a24889c 100644
> --- a/hw/pci/pci.c
> +++ b/hw/pci/pci.c
> @@ -79,6 +79,7 @@ static Property pci_props[] = {
>                      QEMU_PCIE_EXTCAP_INIT_BITNR, true),
>      DEFINE_PROP_STRING("failover_pair_id", PCIDevice,
>                         failover_pair_id),
> +    DEFINE_PROP_UINT32("acpi-index",  PCIDevice, acpi_index, 0),
>      DEFINE_PROP_END_OF_LIST()
>  };
>  
> -- 
> 2.27.0




reply via email to

[Prev in Thread] Current Thread [Next in Thread]