Re: [ISSUE] memdev cannot be enabled after reboot due to failed dvsec ra
From: Zhijian Li (Fujitsu)
Subject: Re: [ISSUE] memdev cannot be enabled after reboot due to failed dvsec range check [QEMU setup]
Date: Wed, 15 Jan 2025 01:06:24 +0000
User-agent: Mozilla Thunderbird
Cced QEMU,
Hi Fan,
I recall we had a reboot issue [1] a few months ago.
I guess your issue is caused by some registers not being reset during reboot.
[1] https://lore.kernel.org/linux-cxl/20240409075846.85370-1-lizhijian@fujitsu.com/
On 15/01/2025 04:30, Fan Ni wrote:
> Hi,
>
> Recently, while testing CXL with a QEMU setup, I found that the memdev cannot
> be enabled successfully after a reboot.
>
> Here is the setup and the steps I have tried.
>
> QEMU:
> https://gitlab.com/qemu-project/qemu.git
> branch: master
> commit: 8032c78e556cd0baec111740a6c636863f9bd7c8
>
> Kernel:
> https://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl.git/
> branch: next
> commit: 2f84d072bdcb7d6ec66cc4d0de9f37a3dc394cd2
>
> Steps to reproduce the issue:
> 1. Start the VM with a CXL pmem device attached directly to the RP.
> 2. Load the CXL drivers (cxl_acpi, cxl_core, cxl_pci, cxl_port, cxl_mem, etc.).
>    Everything works as expected; the memory is correctly enabled and shown by
>    cxl list.
> 3. Reboot the VM (run the reboot command inside the VM, no shutdown).
> 4. Load the CXL drivers as in step 2. The CXL pmem is not correctly enabled.
>
> dmesg shows some error as below:
> -------------------------------
> [ 17.131729] cxl_core:cxl_hdm_decode_init:443: cxl_pci 0000:0d:00.0: DVSEC Range0 denied by platform
> [ 17.135267] cxl_pci 0000:0d:00.0: Range register decodes outside platform defined CXL ranges.
> [ 17.138428] cxl_core:cxl_bus_probe:2073: cxl_port endpoint2: probe: -6
> [ 17.141104] cxl_core:devm_cxl_add_port:936: cxl_mem mem0: endpoint2 added to port1
> [ 17.143703] cxl_mem mem0: endpoint2 failed probe
> [ 17.145324] cxl_core:cxl_bus_probe:2073: cxl_mem mem0: probe: -6
> [ 17.171416] cxl_core:cxl_detach_ep:1499: cxl_mem mem0: disconnect mem0 from port1
> ------------------------------
> Comparing steps 2 and 4 with debug info, we can see the following.
> In step 2, on entry to cxl_hdm_decode_init():
>
> (gdb) p *info
> $2 = {mem_enabled = false, ranges = 0, port = 0xffff8881097eac00, dvsec_range = {{start = 0, end = 0}, {start = 0, end = 0}}}
>
> The info struct comes from cxl_dvsec_rr_decode(). If the memory enable bit is
> not set, that function returns early without reading the DVSEC range
> registers, so ranges == 0.
> This is what happens in step 2: no DVSEC ranges are passed to
> cxl_hdm_decode_init() for checking.
>
> When the HDM decoder is initialized in cxl_hdm_decode_init(), the memory
> enable bit is set.
>
> In step 4, after reboot, the memory enable bit is still set, so the DVSEC
> range registers are read from the device in cxl_dvsec_rr_decode().
> So on entry to cxl_hdm_decode_init():
> ------------------------------------
> $2 = {mem_enabled = true, ranges = 1, port = 0xffff888103c77400, dvsec_range = {{start = 0, end = 536870911}, {start = 0, end = 0}}}
> Breakpoint 2 at 0xffffffffc0657bbe: file drivers/cxl/core/pci.c, line 416.
> ------------------------------------
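To make the sequence above concrete, here is a rough userspace model of it.
This is only a sketch, not kernel code: the field names mem_enabled, ranges
and dvsec_range mirror the gdb dump, while toy_dev, toy_decode, toy_hdm_init
and toy_reboot are hypothetical names made up for illustration. It assumes the
only state that matters is the memory enable bit and the Range0 registers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy model only; none of these types or helpers exist in the kernel. */
struct toy_range { uint64_t start, end; };

struct toy_dev {                 /* stands in for the device's DVSEC state */
    bool mem_enable;             /* memory enable bit */
    struct toy_range range0;     /* DVSEC Range0 as a [start, end] pair */
};

struct toy_info {                /* same field names as the gdb dump */
    bool mem_enabled;
    int ranges;
    struct toy_range dvsec_range[2];
};

/* Models the described decode step: the range registers are only
 * read when the memory enable bit is already set. */
static struct toy_info toy_decode(const struct toy_dev *dev)
{
    struct toy_info info = { 0 };

    assert(dev);
    info.mem_enabled = dev->mem_enable;
    if (!info.mem_enabled)
        return info;             /* ranges stays 0: nothing to validate */

    info.dvsec_range[0] = dev->range0;
    info.ranges = 1;
    return info;
}

/* Models decoder init setting the enable bit as a side effect. */
static void toy_hdm_init(struct toy_dev *dev)
{
    dev->mem_enable = true;
}

/* Models a reboot that either resets the device registers or not. */
static void toy_reboot(struct toy_dev *dev, bool reset_regs)
{
    if (reset_regs)
        dev->mem_enable = false;
}
```

With a device whose Range0 is [0, 0x1fffffff], calling toy_decode() before
toy_hdm_init() gives ranges == 0 (step 2). After toy_hdm_init() and
toy_reboot(dev, false), toy_decode() gives ranges == 1 with the stale range
(step 4). With toy_reboot(dev, true) the first case is restored, which is what
a register reset on reboot would look like in this model.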
> This makes dvsec_range_allowed() fail: the range read from the DVSEC range
> registers starts at address zero ([0x0, 0x1fffffff], i.e. the first 512 MiB),
> which lies outside the HPA range stored in cxld->hpa_range.
>
> ------------------------------------
> Thread 1 hit Breakpoint 4, dvsec_range_allowed (dev=0xffff888108af9848,
> arg=0xffffc9000059f9b0) at drivers/cxl/core/pci.c:265
> 265 if (!(cxld->flags & CXL_DECODER_F_RAM))
> (gdb) b 268
> Breakpoint 5 at 0xffffffffc0657d31: file drivers/cxl/core/pci.c, line 271.
> (gdb) p /x cxld->hpa_range
> $5 = {start = 0xa90000000, end = 0xb8fffffff}
> (gdb) p /x *dev_range
> $7 = {start = 0x0, end = 0x1fffffff}
> (gdb)
> ------------------------------------
> The hpa_range is set when parsing the cfmws in __cxl_parse_cfmws.
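The mismatch can be checked by hand with the two ranges printed by gdb above.
The sketch below assumes the check boils down to a containment test, which the
"decodes outside platform defined CXL ranges" message suggests. The struct
addr_range type and addr_range_contains() helper are hypothetical names for
illustration, and the CXL_DECODER_F_RAM flag test seen at pci.c:265 is left
out.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for an inclusive [start, end] address range. */
struct addr_range {
    uint64_t start;
    uint64_t end;
};

/* True when 'inner' lies entirely within 'outer' (both ends inclusive). */
static bool addr_range_contains(const struct addr_range *outer,
                                const struct addr_range *inner)
{
    return outer->start <= inner->start && outer->end >= inner->end;
}
```

With outer = {0xa90000000, 0xb8fffffff} (cxld->hpa_range) and
inner = {0x0, 0x1fffffff} (*dev_range), the helper returns false, matching the
"denied by platform" result. A 512 MiB range based at the window start,
{0xa90000000, 0xaafffffff}, would pass in this sketch.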
>
> Any thoughts?
>
> Open question: do we need to update the DVSEC range registers after parsing
> the CFMWS so that the two ranges above match?