Message-ID: <65f3a842988d6_a9b4294f7@dwillia2-mobl3.amr.corp.intel.com.notmuch>
Date: Thu, 14 Mar 2024 18:45:38 -0700
From: Dan Williams <dan.j.williams@...el.com>
To: Li Ming <ming4.li@...el.com>, <dan.j.williams@...el.com>,
<rrichter@....com>, <terry.bowman@....com>
CC: <linux-cxl@...r.kernel.org>, <linux-kernel@...r.kernel.org>, Li Ming
<ming4.li@...el.com>
Subject: Re: [RFC PATCH 0/6] Add support for root port RAS error handling

Li Ming wrote:
> Protocol errors signaled to a CXL root port may be captured by a Root
> Complex Event Collector (RCEC). If those errors are not cleared and
> reported, the system owner loses forensic information for system failure
> analysis.
>
> Per CXL r3.1 section 9.18.1.5, the recommendation for this case from CXL
> specification is the 'Else' statement in 'IMPLEMENTATION NOTE' under
> 'Table 9-24 RDPAS Structure':
>
> "Probe all CXL Downstream Ports and determine whether they have logged an
> error in the CXL.io or CXL.cachemem status registers."
>
> The CXL subsystem already supports RCH RAS Error handling that has a
> dependency on the RCEC. Reuse and extend that RCH topology support to
> handle reported errors in the VH topology case. The implementation is
> composed of:
> * Provide a new interface on the RCEC side to support walking all devices
> under the RCEC and the RCEC-associated bus range. The PCIe AER core uses
> this interface to walk all CXL endpoints and all CXL root ports under the
> bus ranges.
> * Update the PCIe AER core to enable Uncorrectable Internal Error and
> Correctable Internal Error reporting for root ports.

Thanks for the above background.

> * Invoke the cxl_pci error handler for RCEC reported errors.

So what do you expect to happen when a switch is involved? In the RCH
case the handler knows that the only thing that can signal the RCEC is a
root complex integrated endpoint implementation driven by cxl_pci. In
the VH case the error source could be a switch.

> * Handle root-port errors in the cxl_pci handler when the device is
> direct attached.

I do expect direct-attach to be a predominant use case, but I want to
make sure that the implementation at least does not make the switch port
error handling case more difficult to implement.
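
To make the concern concrete, here is a rough userspace sketch of the
"probe all ports" walk with dispatch by port kind rather than an
assumption that cxl_pci owns every error source. Everything in it is
hypothetical and for illustration only: struct port_model, enum
port_kind, the handlers[] table, and probe_ports() are not kernel
structures or APIs, and the two status fields merely stand in for the
CXL.io and CXL.cachemem status registers.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model types; these are not kernel structures. */
enum port_kind {
	PORT_RCIEP,	/* RCH: root complex integrated endpoint */
	PORT_ROOT,	/* VH: CXL root port */
	PORT_SWITCH,	/* VH: switch port */
	PORT_KIND_MAX
};

struct port_model {
	enum port_kind kind;
	uint32_t io_status;		/* stand-in for CXL.io error status */
	uint32_t cachemem_status;	/* stand-in for CXL.cachemem error status */
};

struct probe_result {
	unsigned int handled;	/* ports whose logged status was cleared */
	unsigned int deferred;	/* ports with an error but no handler */
};

typedef void (*ras_handler_t)(struct port_model *port);

/* Stand-in handler: a real one would report the error before clearing it. */
static void clear_logged_status(struct port_model *port)
{
	port->io_status = 0;
	port->cachemem_status = 0;
}

/* One slot per port kind; the switch slot is deliberately left empty. */
static const ras_handler_t handlers[PORT_KIND_MAX] = {
	[PORT_RCIEP] = clear_logged_status,
	[PORT_ROOT] = clear_logged_status,
	[PORT_SWITCH] = NULL,
};

static bool port_logged_error(const struct port_model *port)
{
	return port->io_status || port->cachemem_status;
}

/* Probe every port and dispatch by kind; unhandled kinds keep their status. */
static struct probe_result probe_ports(struct port_model *ports, size_t count)
{
	struct probe_result res = { 0, 0 };

	for (size_t i = 0; i < count; i++) {
		if (!port_logged_error(&ports[i]))
			continue;
		if (handlers[ports[i].kind]) {
			handlers[ports[i].kind](&ports[i]);
			res.handled++;
		} else {
			res.deferred++;
		}
	}
	return res;
}
```

With a table like this, adding switch port handling later would mean
filling in one slot, and until then a switch error is counted and left
logged instead of being routed to the wrong handler.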