qemu-devel

Re: cxl nvdimm Potential probe ordering issues.


From: Gregory Price
Subject: Re: cxl nvdimm Potential probe ordering issues.
Date: Thu, 19 Jan 2023 23:53:53 -0500

On Thu, Jan 19, 2023 at 03:04:49PM +0000, Jonathan Cameron wrote:
> Gregory, would you mind checking if
> cxl_nvb is NULL here...
> https://elixir.bootlin.com/linux/v6.2-rc4/source/drivers/cxl/pmem.c#L67
> (printk before it is used should work).
> 
> Might also be worth checking cxl_nvd and cxl_ds
> but my guess is cxl_nvb is our problem (it is when I deliberately change
> the load order).
> 
> Jonathan
> 

This is exactly the issue: cxl_nvb is NULL, while cxl_nvd and cxl_ds
appear fine.

Also note that, oddly, the non-volatile (nvdimm) bridge still shows up
when launching this in volatile mode, but no stack trace appears.

¯\_(ツ)_/¯

After spending way too much time tracing through the current cxl driver
code, I have only really determined the following:

1) The code is very pmem-oriented, and it's unclear to me how the driver
   as-is differentiates a persistent device from a volatile device. That
   code path still completely escapes me. The only differentiating code
   I see is in the memdev probe path that creates mem#/pmem and mem#/ram.

2) The code successfully manages to probe, enable, and mount a REAL device:
   - a cxl memdev appears (/sys/bus/cxl/devices/mem0)
   - a dax device appears (/sys/bus/dax/devices/)
     This happens at boot, which I assume must be BIOS-related
   - The memory *does not* auto-online; instead, the dax device can be
     onlined as system-ram *manually* via ndctl and friends

3) The code creates an nvdimm_bridge IFF a CFMW is defined - regardless
   of the type-3 device configuration (pmem-only or vmem-only)

   # CFMW defined
   [root@fedora ~]# ls /sys/bus/cxl/devices/
   decoder0.0  decoder2.0  mem0            port1
   decoder1.0  endpoint2   nvdimm-bridge0  root0

   # CFMW not defined
   [root@fedora ~]# ls /sys/bus/cxl/devices/
   decoder1.0  decoder2.0  endpoint2  mem0  port1  root0

4) As you can see above, multiple decoders are registered. I'm not sure
   if that's correct or not, but it does seem odd given there's only one
   cxl type-3 device. It is also odd that decoder0.0 shows up when a CFMW
   is defined, but not when it isn't.

   Note: All these tests have two root ports:
   -device pxb-cxl,id=cxl.0,bus=pcie.0,bus_nr=52 \
   -device cxl-rp,id=rp0,bus=cxl.0,chassis=0,port=0,slot=0 \
   -device cxl-rp,id=rp1,bus=cxl.0,chassis=0,port=1,slot=1 \
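
To make the difference between the two listings in (3) and (4) explicit,
here is a minimal sketch that diffs them. The device names are copied from
the `ls` output above; the helper name listing_diff is just made up for
illustration and is not part of any cxl tooling.

```python
# Device names copied from the two `ls /sys/bus/cxl/devices/` listings above.
WITH_CFMW = (
    "decoder0.0 decoder2.0 mem0 port1 "
    "decoder1.0 endpoint2 nvdimm-bridge0 root0"
).split()
WITHOUT_CFMW = "decoder1.0 decoder2.0 endpoint2 mem0 port1 root0".split()


def listing_diff(a, b):
    """Return (only_in_a, only_in_b) as sorted lists of device names."""
    set_a, set_b = set(a), set(b)
    return sorted(set_a - set_b), sorted(set_b - set_a)


only_with, only_without = listing_diff(WITH_CFMW, WITHOUT_CFMW)
print("only with CFMW:   ", only_with)
print("only without CFMW:", only_without)
```

On a live guest the same comparison could be fed from
os.listdir("/sys/bus/cxl/devices") captured under each configuration,
assuming the cxl bus is registered in both boots.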


Don't know why I haven't thought of this until now, but is the CFMW code
reporting something odd about what's behind it?  Is it assuming the
devices are pmem?



