qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

RE: [RFC PATCH 5/5] cxl/core: add poison injection event handler


From: Dan Williams
Subject: RE: [RFC PATCH 5/5] cxl/core: add poison injection event handler
Date: Fri, 9 Feb 2024 22:54:48 -0800

Shiyang Ruan wrote:
> Currently driver only trace cxl events, poison injection on cxl memdev
> is silent.  OS needs to be notified then it could handle poison range
> in time.  Per CXL spec, the device error event could be signaled through
> FW-First and OS-First methods.
> 
> So, add poison event handler in OS-First method:
>   - qemu:
>     - CXL device report POISON event to OS by MSI by sending GMER after
>       injecting a poison record

QEMU details do not belong in a kernel changelog. It is ok for an RFC,
but my hope is that this can be tested on hardware after being proven on
QEMU.

>   - CXL driver
>     a. read the POISON event through GMER;   <-- this patch
>     b. get POISON list;
>     c. translate DPA to HPA;
>     d. construct a mce instance, then call mce_log() to queue this mce
>        instance;

It is not clear to me why the kernel should proactively fire machine
check notifications on injection? The changelog needs to make clear why
the kernel should do this, and the consequences of not going it.

For CPU consumed poison the machine check event will already fire. For
background discovery of poison, that should translate to a
memory_failure() notification with teh MF_ACTION_REQUIRED flag cleared.
Userspace, like rasdaemon, can then make a page offline decision.



reply via email to

[Prev in Thread] Current Thread [Next in Thread]