qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH] hw/block/nvme: re-enable NVMe PCI hotplug


From: Klaus Jensen
Subject: Re: [PATCH] hw/block/nvme: re-enable NVMe PCI hotplug
Date: Tue, 11 May 2021 15:37:30 +0200

On May 11 15:12, Hannes Reinecke wrote:
On 5/11/21 2:22 PM, Klaus Jensen wrote:
On May 11 09:35, Hannes Reinecke wrote:
Ever since commit e570768566 ("hw/block/nvme: support for shared
namespace in subsystem") NVMe PCI hotplug is broken, as the PCI
hotplug infrastructure will only work for the nvme devices (which
are PCI devices), but not for any attached namespaces.
So when re-adding the NVMe PCI device via 'device_add' the NVMe
controller is added, but all namespaces are missing.
This patch adds device hotplug hooks for NVMe namespaces, such that one
can call 'device_add nvme-ns' to (re-)attach the namespaces after
the PCI NVMe device 'device_add nvme' hotplug call.


Hi Hannes,

Thanks for this.

The real fix here is that namespaces are properly detached from other
controllers that it may be shared on.

But is this really the behavior we want? That nvme-ns devices always
"belongs to" (in QEMU qdev terms) an nvme device is an artifact of the
Bus/Device architecture and not really how an NVM subsystem should
behave. Removing a controller should not cause shared namespaces to
disappear from other controllers.

I have a WIP that instead adds an NvmeBus to the nvme-subsys device and
reparents the nvme-ns devices to that if the parent controller is linked
to a sybsystem. This way, nvme-ns devices wont be unrealized under the
feet of other controllers.

That would be the other direction I thought of; _technically_ NVMe
namespaces are objects of the subsystem, and 'controllers' are just
temporary objects providing access to the namespaces presented by the
subsystem.
So if you are going to rework it I'd rather make the namespaces children
objects of the subsystem, and have nsid maps per controller detailing
which nsids are accessible from the individual controllers.
That would probably a simple memcpy() to start with, but it would allow
us to modify that map via NVMe-MI and stuff.

However, if you do that you'll find that subsystems can't be hotplugged,
too; but I'm sure you'll be able to fix it up :-)

The hotplug fix looks good - I'll post a series that tries to integrate
both.

Ta.

The more I think about it, the more I think we should be looking into
reparenting the namespaces to the subsystem.
That would have the _immediate_ benefit that 'device_del' and
'device_add' becomes symmetric (ie one doesn't have to do a separate
'device_add nvme-ns'), as the nvme namespace is not affected by the
hotplug event.


I have that working, but I'm struggling with a QEMU API technicality in that I apparently cannot simply move the NvmeBus creation to the nvme-subsys device. For some reason the bus is not available for the nvme-ns devices. That is, if one does something like this:

  -device nvme-subsys,...
  -device nvme-ns,...

Then I get an error that "no 'nvme-bus' bus found for device 'nvme'ns". This is probably just me not grok'ing the qdev well enough, so I'll keep trying to fix that. What works now is to have the regular setup:

  -device nvme-subsys,...
  -device nvme,...
  -device nvme-ns,...

And the nvme-ns device will then reparent to the NvmeBus on nvme-subsys (which magically now IS available when nvme-ns is realized). This has the same end result, but I really would like that the namespaces could be specified as children of the subsys directly.

Any help from qdev experts would be appreciated! :)

This really was a quick hack to demonstrate a shortcoming in the linux
NVMe stack (cf 'nvme-mpath: delete disk after last connection' if you
are interested in details), so I'm sure there is room for improvement.


I follow linux-nvme, so I saw the patch ;)

And the prime reason for sending it out was to gauge interest by
qemu-devel; I have a somewhat mixed experience when sending patches to
the qemu ML ...


Your contribution is very much appreciated :)

Attachment: signature.asc
Description: PGP signature


reply via email to

[Prev in Thread] Current Thread [Next in Thread]