qemu-devel

Re: making a qdev bus available from a (non-qtree?) device


From: Klaus Jensen
Subject: Re: making a qdev bus available from a (non-qtree?) device
Date: Mon, 17 May 2021 08:44:00 +0200

On May 12 14:02, Markus Armbruster wrote:
Klaus Jensen <its@irrelevant.dk> writes:

Hi all,

I need some help with grok'ing qdev busses. Stefan, Michael - David
suggested on IRC that I CC you, since you might have solved a
similar issue with virtio devices. I've tried to study how that works,
but I'm not exactly sure how to apply it to the issue I'm having.

Currently, to support multiple namespaces on the emulated nvme device,
one can do something like this:

  -device nvme,id=nvme-ctrl-0,serial=foo,...
  -device nvme-ns,id=nvme-ns-0,bus=nvme-ctrl-0,...
  -device nvme-ns,id=nvme-ns-1,bus=nvme-ctrl-0,...

The nvme device creates an 'nvme-bus' and the nvme-ns devices have
dc->bus_type = TYPE_NVME_BUS. This all works very well and provides a
nice overview in `info qtree`:

  bus: main-system-bus
  type System
    ...
    dev: q35-pcihost, id ""
      ..
      bus: pcie.0
        type PCIE
        ..
        dev: nvme, id "nvme-ctrl-0"
          ..
          bus: nvme-ctrl-0
            type nvme-bus
            dev: nvme-ns, id "nvme-ns-0"
              ..
            dev: nvme-ns, id "nvme-ns-1"
              ..


Nice and qdevy.

We have since introduced support for NVM Subsystems through an
nvme-subsys device. The nvme-subsys device is just a TYPE_DEVICE and
does not show in `info qtree`

Yes.

Most devices plug into a bus.  DeviceClass member @bus_type specifies
the type of bus they plug into, and DeviceState member @parent_bus
points to the actual BusState.  Example: PCI devices plug into a PCI
bus, and have ->bus_type = TYPE_PCI_BUS.

Some devices don't.  @bus_type and @parent_bus are NULL then.

Most buses are provided by a device.  BusState member @parent points to
the device.

The main-system-bus isn't.  Its @parent is null.

"info qtree" only shows the qtree rooted at main-system-bus.  It doesn't
show qtrees rooted at bus-less devices or device-less buses other than
main-system-bus.  I doubt such buses exist.
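
To make the reachability rule above concrete, here is a toy Python model of it. This is only a sketch and not QEMU code: the Bus and Device classes and the walk() helper are made up for illustration, and they keep just the two pointers described above (a device's parent bus and a bus's parent device). A bus created by a bus-less device is never reached when walking down from main-system-bus.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(eq=False)
class Bus:
    name: str
    parent: Optional["Device"] = None       # device providing this bus (None for the root)
    children: List["Device"] = field(default_factory=list)


@dataclass(eq=False)
class Device:
    name: str
    parent_bus: Optional[Bus] = None        # bus this device plugs into (None if bus-less)
    child_buses: List[Bus] = field(default_factory=list)

    def __post_init__(self):
        # Plugging into a bus makes the device a child of that bus.
        if self.parent_bus is not None:
            self.parent_bus.children.append(self)

    def create_bus(self, name: str) -> Bus:
        bus = Bus(name, parent=self)
        self.child_buses.append(bus)
        return bus


def walk(bus: Bus) -> List[str]:
    """List everything reachable from `bus`, the way a tree dump would."""
    names = [f"bus:{bus.name}"]
    for dev in bus.children:
        names.append(f"dev:{dev.name}")
        for child in dev.child_buses:
            names.extend(walk(child))
    return names


root = Bus("main-system-bus")
pcihost = Device("q35-pcihost", root)
pcie = pcihost.create_bus("pcie.0")
ctrl = Device("nvme-ctrl-0", pcie)
Device("nvme-ns-0", ctrl.create_bus("nvme-ctrl-0"))

# A bus-less device may still create a bus, but nothing links it to the root.
subsys = Device("nvme-subsys-0")
Device("nvme-ns-1", subsys.create_bus("nvme-subsys-0"))

visible = walk(root)
```

In this model nvme-ns-0 shows up in `visible`, while nvme-subsys-0, its bus and nvme-ns-1 do not, because the walk starts at the root bus and only follows parent/child links.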


Makes sense.

                              (I wonder if this should actually just
have been an -object?).

Does nvme-subsys expose virtual hardware to the guest?  Memory, IRQs,
...

If yes, it needs to be a device.

If no, object may be more appropriate.  Tell us more about what it does.


It does not expose any virtual hardware. See below.


                        Anyway. The nvme device has a 'subsys' link
parameter and we use this to manage the namespaces across the
subsystem that may contain several nvme devices (controllers). The
problem is that this doesn't work too well with unplugging since if the
nvme device is `device_del`'ed, the nvme-ns devices on the nvme-bus
are unrealized which is not what we want. We really want the
namespaces to linger, preferably on an nvme-bus of the nvme-subsys
device so they can be attached to other nvme devices that may show up
(or already exist) in the subsystem.

The core problem I'm having is that I can't seem to create an nvme-bus
from the nvme-subsys device and make it available to the nvme-ns
device on the command line:

  -device nvme-subsys,id=nvme-subsys-0,...
  -device nvme-ns,bus=nvme-subsys-0

The above results in "No 'nvme-bus' bus found for device 'nvme-ns'",
even though I do `qbus_create_inplace()` just like the nvme
device. However, I *can* reparent the nvme-ns device in its realize()
method, so if I instead define it like so:

  -device nvme-subsys,id=nvme-subsys-0,...
  -device nvme,id=nvme-ctrl-0,subsys=nvme-subsys-0
  -device nvme-ns,bus=nvme-ctrl-0

I can then call `qdev_set_parent_bus()` and set the parent bus to the
bus created in the nvme-subsys device. This solves the problem since
the namespaces are not "garbage collected" when the nvme device is
removed, but it just feels wrong you know? Also, if possible, I'd of
course really like to retain the nice entries in `info qtree`.
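
The unplug behaviour and the reparenting workaround described above can be sketched with a toy Python model. This is an illustration under simplifying assumptions, not QEMU code: the classes are made up, unplug() is a hypothetical stand-in for what happens on `device_del`, and set_parent_bus() only loosely mirrors `qdev_set_parent_bus()`.

```python
class Bus:
    def __init__(self, name):
        self.name = name
        self.children = []


class Device:
    def __init__(self, name, parent_bus=None):
        self.name = name
        self.realized = True
        self.child_buses = []
        self.parent_bus = None
        if parent_bus is not None:
            self.set_parent_bus(parent_bus)

    def create_bus(self, name):
        bus = Bus(name)
        self.child_buses.append(bus)
        return bus

    def set_parent_bus(self, bus):
        # Move the device from its current bus (if any) to `bus`.
        if self.parent_bus is not None:
            self.parent_bus.children.remove(self)
        self.parent_bus = bus
        bus.children.append(self)

    def unplug(self):
        # Removing a device takes everything on its child buses with it.
        for bus in self.child_buses:
            for child in list(bus.children):
                child.unplug()
        if self.parent_bus is not None:
            self.parent_bus.children.remove(self)
            self.parent_bus = None
        self.realized = False


subsys_bus = Device("nvme-subsys-0").create_bus("nvme-subsys-0")
ctrl = Device("nvme-ctrl-0")
ctrl_bus = ctrl.create_bus("nvme-ctrl-0")

ns_stays_on_ctrl = Device("nvme-ns-0", ctrl_bus)
ns_reparented = Device("nvme-ns-1", ctrl_bus)
ns_reparented.set_parent_bus(subsys_bus)   # what the realize() workaround does

ctrl.unplug()
```

After `ctrl.unplug()`, the namespace left on the controller's bus is gone, while the reparented one is still realized and sits on the subsystem's bus.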

I'm afraid I'm too ignorant on NVMe to give useful advice.

Can you give us a brief primer on the aspects of physical NVMe devices
you'd like to model in QEMU?  What are "controllers", "namespaces", and
"subsystems", and how do they work together?

Once we understand the relevant aspects of physical devices, we can
discuss how to best model them in QEMU.


An "NVM Subsystem" is basically just a term to talk about a collection of controllers and namespaces. A namespace is just a quantity of non-volatile memory that the controller can use to store stuff on.

Only the controller is a piece of virtual hardware. An example subsystem looks like this:


          +------------------+     +-----------------+
          |   controller A   |     |   controller B  |
          +------------------+     +-----------------+
          +--------++--------+     +--------++-------+
          | NSID 1 || NSID 2 |     | NSID 3 | NSID 2 |
          +--------++--------+     +--------++-------+
          +--------+    |          +--------+    |
          |  NS A  |    |          |  NS C  |    |
          +--------+    |          +--------+    |
                        |                        |
                        +------------------------+
                                     |
                                 +--------+
                                 |  NS B  |
                                 +--------+


This is the example in Figure 5 in the NVMe v1.4 specification. Here, we have two controllers (that we model with the 'nvme' pci-based device). Each controller has one "private" namespace (NS A and NS C) and shares one namespace (NS B). The namespace IDs are unique across the subsystem and are assigned by the controller when attached to a namespace.
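
The same example can be written down as a small data model. This is just a sketch to pin down the terms; the Subsystem class and its method names are invented for illustration and do not correspond to anything in the QEMU source. The NSID values are the ones from the figure.

```python
class Subsystem:
    def __init__(self):
        self.namespaces = {}    # nsid -> namespace name, unique across the subsystem
        self.attachments = {}   # controller name -> set of attached nsids

    def add_namespace(self, nsid, name):
        if nsid in self.namespaces:
            raise ValueError(f"NSID {nsid} already in use")
        self.namespaces[nsid] = name

    def add_controller(self, ctrl):
        self.attachments.setdefault(ctrl, set())

    def attach(self, ctrl, nsid):
        if nsid not in self.namespaces:
            raise KeyError(f"unknown NSID {nsid}")
        self.attachments[ctrl].add(nsid)

    def shared(self):
        # Namespaces attached to more than one controller.
        return {nsid for nsid in self.namespaces
                if sum(nsid in s for s in self.attachments.values()) > 1}


subsys = Subsystem()
for nsid, name in [(1, "NS A"), (2, "NS B"), (3, "NS C")]:
    subsys.add_namespace(nsid, name)

for ctrl in ("controller A", "controller B"):
    subsys.add_controller(ctrl)

subsys.attach("controller A", 1)   # private to A
subsys.attach("controller A", 2)   # shared
subsys.attach("controller B", 3)   # private to B
subsys.attach("controller B", 2)   # shared

shared = subsys.shared()
```

Here `shared` contains only NSID 2 (NS B), matching the figure.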

We use the 'nvme-ns' device (TYPE_DEVICE) to model the namespaces. I guess this could also just have been an -object; I'm not sure if we can change that now. The 'nvme-ns' device mostly exists to hold the block backend configuration and related namespace-only parameters. Prior to the introduction of subsystems, while we could have multiple controllers on the PCI bus, they could not share namespaces. To support this, we introduced the 'nvme-subsys' device to allow the namespaces to be shared. This support is considered experimental, so I think we can get away with changing this to be an object.

As I explained in my first mail, we attach namespaces to controllers through a bus. This means that even in the absence of an explicit "bus=..." parameter on the nvme-ns device, it will "connect" to the most recently defined "nvme-bus" (that of the most recently defined controller). With subsystems we would also like to model "unattached" namespaces that exist solely in the subsystem (i.e. NOT attached to any controllers). That is why I was trying to get the nvme-ns devices to attach to a bus created by the "non-bus-attached" subsystem device. And that is what I can't do. We could add a link property to the nvme-ns device instead, but then the bus magic in qemu would still happen and the namespace would end up "attached" (in qemu terms) to a controller anyway. It would also complain if we defined the namespace device prior to defining any controller devices, since no usable bus exists.
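
To illustrate the "bus magic" described above, here is a toy Python model of it. It is a deliberate simplification and not how QEMU implements bus lookup: the Machine class and its methods are hypothetical, the "most recently defined bus wins" rule is taken from the description above, and the error text is modelled on the message quoted earlier in this thread.

```python
class Machine:
    def __init__(self):
        self.buses = []      # (bus name, bus type), in definition order
        self.parent = {}     # device id -> name of the bus it ended up on

    def add_bus(self, name, bus_type):
        self.buses.append((name, bus_type))

    def add_device(self, dev_id, driver, bus_type, bus=None):
        # Consider buses of the right type; honour an explicit bus= if given.
        matches = [name for name, btype in self.buses
                   if btype == bus_type and bus in (None, name)]
        if not matches:
            raise LookupError(
                f"No '{bus_type}' bus found for device '{driver}'")
        # Without bus=, the most recently defined matching bus is used.
        self.parent[dev_id] = matches[-1]
        return matches[-1]


m = Machine()

# Namespace defined before any controller: there is no usable bus yet.
try:
    m.add_device("nvme-ns-0", "nvme-ns", "nvme-bus")
    error = None
except LookupError as exc:
    error = str(exc)

m.add_bus("nvme-ctrl-0", "nvme-bus")
m.add_bus("nvme-ctrl-1", "nvme-bus")

implicit = m.add_device("nvme-ns-0", "nvme-ns", "nvme-bus")
explicit = m.add_device("nvme-ns-1", "nvme-ns", "nvme-bus", bus="nvme-ctrl-0")
```

In this model the namespace without "bus=..." lands on nvme-ctrl-1, so it is always "attached" to some controller, and defining it before any controller fails. An "unattached" namespace has nowhere to go.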

Thanks for helping out with this!
