[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <654027e6562a8_3f60294d1@dwillia2-mobl3.amr.corp.intel.com.notmuch>
Date: Mon, 30 Oct 2023 15:02:14 -0700
From: Dan Williams <dan.j.williams@...el.com>
To: Smita Koralahalli <Smita.KoralahalliChannabasappa@....com>,
Ira Weiny <ira.weiny@...el.com>,
Dan Williams <dan.j.williams@...el.com>,
"Jonathan Cameron" <jonathan.cameron@...wei.com>
CC: Yazen Ghannam <yazen.ghannam@....com>,
Davidlohr Bueso <dave@...olabs.net>,
Dave Jiang <dave.jiang@...el.com>,
Alison Schofield <alison.schofield@...el.com>,
Vishal Verma <vishal.l.verma@...el.com>,
"Ard Biesheuvel" <ardb@...nel.org>, <linux-efi@...r.kernel.org>,
<linux-kernel@...r.kernel.org>, <linux-cxl@...r.kernel.org>
Subject: Re: [PATCH RFC v2 3/3] cxl/memdev: Register for and process CPER
events
Smita Koralahalli wrote:
[..]
> > +#define CXL_EVENT_HDR_FLAGS_REC_SEVERITY GENMASK(1, 0)
> > +int cxl_cper_event_call(struct notifier_block *nb, unsigned long action, void *data)
> > +{
> > + struct cxl_cper_notifier_data *nd = data;
> > + struct cxl_event_record_raw record = (struct cxl_event_record_raw) {
> > + .hdr.id = UUID_INIT(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0),
> > + };
> > + enum cxl_event_log_type log_type;
> > + struct cxl_memdev_state *mds;
> > + u32 hdr_flags;
> > +
> > + mds = container_of(nb, struct cxl_memdev_state, cxl_cper_nb);
> > +
> > + /* Need serial number for device identification */
> > + if (!(nd->rec->hdr.validation_bits & CPER_CXL_DEVICE_SN_VALID))
> > + return NOTIFY_DONE;
>
> For all the event records that I tested so far, this has never been
> true. That is CPER_CXL_DEVICE_SN_VALID is never set which might not log
> the records at all. Should we be bit more lenient here and include
> validating device_id (bdf) instead and check if cxlds exist?
Agree. While I do think those devices are out of spec given CXL mandates
a valid serial number, I think the robustness priciple applies and Linux
should rely on bdf information. I also expect that with MH-SLDs and
potentially other scenarios, a serial number may be duplicated so bdf is
more reliable in that dimension as well.
> > + /* FIXME endianess and bytes of serial number need verification */
> > + /* FIXME Should other values be checked? */
> > + if (memcmp(&mds->cxlds.serial, &nd->rec->hdr.dev_serial_num,
> > + sizeof(mds->cxlds.serial)))
> > + return NOTIFY_DONE;
> > +
> > + /* ensure record can always handle the full CPER provided data */
> > + BUILD_BUG_ON(sizeof(record) <
> > + (CPER_CXL_COMP_EVENT_LOG_SIZE + sizeof(record.hdr.id)));
> > +
> > + /*
> > + * UEFI v2.10 defines N.2.14 defines the CXL CPER record as not
> > + * including the uuid field.
> > + */
> > + memcpy(&record.hdr.length, &nd->rec->comp_event_log,
> > + CPER_CXL_REC_LEN(nd->rec));
>
> I'm doubtful this will do the job. I think we should copy into each
> field of struct cxl_event_record_hdr individually starting from length
> by pointer arithmetic (which is definitely bad, but I cannot think of a
> better way to do this) and then do memcpy for data field in struct
> cxl_event_record_raw..
>
> Any other suggestions would be helpful as well.
>
> I can make these changes and validate it on my end if that works..?
It sounds like you have a more readily available real world test
environment for this, so that sounds good to me.
Powered by blists - more mailing lists