qemu-devel


Re: [RFC PATCH 5/5] cxl/core: add poison injection event handler


From: Shiyang Ruan
Subject: Re: [RFC PATCH 5/5] cxl/core: add poison injection event handler
Date: Fri, 15 Mar 2024 10:29:07 +0800
User-agent: Mozilla Thunderbird



On 2024/2/14 0:51, Jonathan Cameron wrote:

+
+void cxl_event_handle_record(struct cxl_memdev *cxlmd,
+                            enum cxl_event_log_type type,
+                            enum cxl_event_type event_type,
+                            const uuid_t *uuid, union cxl_event *evt)
+{
+       if (event_type == CXL_CPER_EVENT_GEN_MEDIA) {
                trace_cxl_general_media(cxlmd, type, &evt->gen_media);
-       else if (event_type == CXL_CPER_EVENT_DRAM)
+               /* handle poison event */
+               if (type == CXL_EVENT_TYPE_FAIL)
+                       cxl_event_handle_poison(cxlmd, &evt->gen_media);

I'm not 100% convinced this is necessarily poison-causing. Also,
the text tells us we should see 'an appropriate event'. The
DRAM one seems likely to be chosen by some vendors.

I think it's right to use the DRAM Event Record for a volatile memdev, but should poison on a persistent memdev also be reported with the DRAM Event Record? Its 'Physical Address' field also carries the 'Volatile' bit, the same as in the General Media Event Record, so I am a bit confused about this.


The fatal check maybe makes it a little more likely (though I'm not
sure anything says a device must log it to the failure log), but it
might be Memory Event Type 1, which means the host tried to access an
invalid address. Sure, poison might be returned for that error, but
what would the main kernel memory handling do with it? Something is
very wrong, but it's not corrupted device memory. TE state violations
are in there as well. Sure, poison is returned on reads (I think; I
haven't checked).

If the aim here is to say 'maybe there is poison, better check the
poison list', then that is reasonable, but we should ensure things
like timer expiry are definitely ruled out and rename the function
to make it clear it might not find poison.

I forgot to distinguish the 'Transaction Type' here. Host Inject Poison is 0x04. Other transaction types should also have their own specific handling.


--
Thanks,
Ruan.


Jonathan


