Message-ID: <67aeade4ee5bc_2d1e2942b@dwillia2-xfh.jf.intel.com.notmuch>
Date: Thu, 13 Feb 2025 18:43:49 -0800
From: Dan Williams <dan.j.williams@...el.com>
To: Terry Bowman <terry.bowman@....com>, <linux-cxl@...r.kernel.org>,
<linux-kernel@...r.kernel.org>, <linux-pci@...r.kernel.org>,
<nifan.cxl@...il.com>, <dave@...olabs.net>, <jonathan.cameron@...wei.com>,
<dave.jiang@...el.com>, <alison.schofield@...el.com>,
<vishal.l.verma@...el.com>, <dan.j.williams@...el.com>,
<bhelgaas@...gle.com>, <mahesh@...ux.ibm.com>, <ira.weiny@...el.com>,
<oohall@...il.com>, <Benjamin.Cheatham@....com>, <rrichter@....com>,
<nathan.fontenot@....com>, <Smita.KoralahalliChannabasappa@....com>,
<lukas@...ner.de>, <ming.li@...omail.com>,
<PradeepVineshReddy.Kodamati@....com>
Subject: Re: [PATCH v7 17/17] cxl/pci: Handle CXL Endpoint and RCH Protocol
Errors separately from PCIe errors

Terry Bowman wrote:
> CXL Endpoint and Restricted CXL Host (RCH) Downstream Port Protocol Errors
> are currently treated as PCIe errors, which does not properly process CXL
> uncorrectable (UCE) errors. When a CXL device encounters an uncorrectable
> Protocol Error, the system should panic to prevent potential CXL memory
> corruption.
>
> Treat CXL Endpoint Protocol Errors as CXL errors. This requires updates in
> the CXL and AER drivers.
>
> Update the CXL Endpoint driver with a new declaration for struct
> cxl_error_handlers named cxl_ep_error_handlers. Move the existing CE and
> UCE handler assignments from cxl_error_handlers to the new
> cxl_ep_error_handlers. Remove the 'state' parameter from the UCE handler
> interface because it is not used in CXL recovery.
>
> Update the AER driver to associate CXL Protocol errors with CXL error
> handling. Change detection in handles_cxl_errors() from using
> pcie_is_cxl_port() to instead use pcie_is_cxl().

This all looks ok for what it is, but given the prior discussion about
cxl_error_handlers only running in the CXL domain, I think this will
result in the cxl_pci driver having even less to do.

The cxl_core will, by default, register port error handlers that can
panic on notification. The cxl_pci driver's only job is then responding
to PCI events and registering CXL objects for the core to handle.
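
To make that split concrete, here is a rough userspace model of the
idea, not kernel code and not a proposed interface. Every name in it
(cxl_port_model, core_register_port(), pci_driver_probe(), and so on)
is hypothetical and only illustrates the division of labor: the core
attaches default handlers when an object is registered, and the driver
does nothing beyond registering.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model only: none of these names are kernel APIs. */
enum cxl_err_severity { CXL_ERR_CORRECTABLE, CXL_ERR_UNCORRECTABLE };

struct cxl_port_model;

struct cxl_error_handlers_model {
	void (*cor_error)(struct cxl_port_model *port);
	/* Returns true when the core should panic. */
	bool (*uncor_error)(struct cxl_port_model *port);
};

struct cxl_port_model {
	const char *name;
	const struct cxl_error_handlers_model *handlers;
	unsigned int cor_count;
};

/* Core-owned defaults: count correctable errors, request panic on UCE. */
static void core_default_cor(struct cxl_port_model *port)
{
	port->cor_count++;
}

static bool core_default_uncor(struct cxl_port_model *port)
{
	(void)port;
	return true;
}

static const struct cxl_error_handlers_model core_default_handlers = {
	.cor_error = core_default_cor,
	.uncor_error = core_default_uncor,
};

/* Core side: registering an object attaches the default handlers. */
static void core_register_port(struct cxl_port_model *port, const char *name)
{
	port->name = name;
	port->handlers = &core_default_handlers;
	port->cor_count = 0;
}

/* Core side: dispatch a protocol error; true means "would panic". */
static bool core_handle_protocol_error(struct cxl_port_model *port,
				       enum cxl_err_severity sev)
{
	assert(port->handlers != NULL);

	if (sev == CXL_ERR_CORRECTABLE) {
		port->handlers->cor_error(port);
		return false;
	}
	return port->handlers->uncor_error(port);
}

/* Driver side: probe only registers the object, no handlers of its own. */
static void pci_driver_probe(struct cxl_port_model *port)
{
	core_register_port(port, "endpoint0");
}
```

In this model the driver never supplies error callbacks; whether the
real core would expose registration this way is an open design
question.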