Message-ID: <20230601150658.000021d4@Huawei.com>
Date: Thu, 1 Jun 2023 15:06:58 +0100
From: Jonathan Cameron <Jonathan.Cameron@...wei.com>
To: Terry Bowman <terry.bowman@....com>
CC: <alison.schofield@...el.com>, <vishal.l.verma@...el.com>,
<ira.weiny@...el.com>, <bwidawsk@...nel.org>,
<dan.j.williams@...el.com>, <dave.jiang@...el.com>,
<linux-cxl@...r.kernel.org>, <rrichter@....com>,
<linux-kernel@...r.kernel.org>, <bhelgaas@...gle.com>,
Oliver O'Halloran <oohall@...il.com>,
<linuxppc-dev@...ts.ozlabs.org>, <linux-pci@...r.kernel.org>
Subject: Re: [PATCH v4 22/23] PCI/AER: Forward RCH downstream port-detected
errors to the CXL.mem dev handler
On Tue, 23 May 2023 18:22:13 -0500
Terry Bowman <terry.bowman@....com> wrote:
> From: Robert Richter <rrichter@....com>
>
> In Restricted CXL Device (RCD) mode a CXL device is exposed as an
> RCiEP, but CXL downstream and upstream ports are not enumerated and
> not visible in the PCIe hierarchy. Protocol and link errors are sent
> to an RCEC.
>
> Restricted CXL host (RCH) downstream port-detected errors are signaled
> as internal AER errors, either Uncorrectable Internal Error (UIE) or
> Corrected Internal Error (CIE). The error source is the ID of the
> RCEC. A CXL handler must then inspect the error status in various CXL
> registers residing in the dport's component register space (CXL RAS
> capability) or the dport's RCRB (PCIe AER extended capability). [1]
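A minimal userspace sketch of the internal-error check described above, assuming the AER status register layout from the PCIe specification (Uncorrectable Internal Error at bit 22, Corrected Internal Error at bit 14). The macro, enum, and function names are local to this sketch and are not taken from the patch:

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit positions in the AER status registers; names are local to this sketch. */
#define AER_UNCOR_INTERNAL (1u << 22) /* Uncorrectable Internal Error Status */
#define AER_COR_INTERNAL   (1u << 14) /* Corrected Internal Error Status */

/* Hypothetical severity type, standing in for whatever the AER core reports. */
enum model_severity { MODEL_CORRECTABLE, MODEL_UNCORRECTABLE };

/*
 * Return true if the status word reports an internal error of the given
 * severity, i.e. the only kind of error an RCH downstream port can signal
 * through the RCEC.
 */
static bool is_internal_error(enum model_severity sev, uint32_t status)
{
    if (sev == MODEL_CORRECTABLE)
        return (status & AER_COR_INTERNAL) != 0;
    return (status & AER_UNCOR_INTERNAL) != 0;
}
```

Only when such a check passes would a handler need to go on and read the dport's CXL RAS and AER registers.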
>
> Errors showing up in the RCEC's error handler must be handled and
> connected to the CXL subsystem. Implement this by forwarding the error
> to all CXL devices below the RCEC. Since the entire CXL device is
> controlled only using PCIe Configuration Space of device 0, function
> 0, only pass it there [2]. The error handling is limited to currently
> supported devices with the Memory Device class code set
> (PCI_CLASS_MEMORY_CXL, 502h), where the handler can be implemented in
> the existing cxl_pci driver. Support for other CXL devices (e.g. a
> CXL.cache device) can be enabled later.
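The forwarding rule in this paragraph can be modelled in plain C. This is a sketch under stated assumptions, not the patch's code: `struct model_dev`, `forward_rch_error()` and `count_error()` are hypothetical stand-ins, and the only values taken from the text are device 0, function 0 and the 502h Memory Device class code:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Memory Device class code from the commit message (base class 05h, sub-class 02h). */
#define CXL_MEM_CLASS 0x0502u

/* Hypothetical stand-in for a PCI function below the RCEC; not struct pci_dev. */
struct model_dev {
    uint8_t devfn;       /* bits 7:3 = device number, bits 2:0 = function number */
    uint32_t class_code; /* 24 bits: base class, sub-class, programming interface */
    int errors_seen;     /* how many times the handler ran for this device */
};

/* Device 0, function 0 encodes as devfn == 0. */
static bool is_dev0_fn0(const struct model_dev *d)
{
    return d->devfn == 0;
}

/* Drop the programming-interface byte before comparing the class code. */
static bool is_cxl_mem(const struct model_dev *d)
{
    return (d->class_code >> 8) == CXL_MEM_CLASS;
}

/* Call the handler for every eligible device; return how many were called. */
static size_t forward_rch_error(struct model_dev *devs, size_t n,
                                void (*handler)(struct model_dev *))
{
    size_t forwarded = 0;

    assert(handler != NULL);
    for (size_t i = 0; i < n; i++) {
        if (!is_dev0_fn0(&devs[i]) || !is_cxl_mem(&devs[i]))
            continue;
        handler(&devs[i]);
        forwarded++;
    }
    return forwarded;
}

/* Example handler for the sketch: just record that an error arrived. */
static void count_error(struct model_dev *d)
{
    d->errors_seen++;
}
```

Both filters are applied before the handler runs, so functions other than 0 and non-memory class codes are skipped without the handler ever seeing them.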
>
> In addition to errors directed to the CXL endpoint device, a handler
> must also inspect the CXL RAS and PCIe AER capabilities of the CXL
> downstream port that is connected to the device.
>
> Since CXL downstream port errors are signaled using internal errors,
> the handler requires those errors to be unmasked. This is the subject
> of a follow-on patch.
>
> The reason for choosing this implementation is that a CXL RCEC device
> is bound to the AER port driver, but the driver does not allow it to
> register a custom handler to support CXL. Hard-wiring the RCEC to a
> CXL handler does not work, as the CXL subsystem might not be present
> all the time. The alternative, extending portdrv to allow the
> registration of a custom RCEC error handler, isn't worth doing as CXL
> would be its only user. Instead, just check for a CXL RCEC and pass
> the error down to the connected CXL device's error handler. With this
> approach the code can be implemented entirely in the PCIe AER driver
> and is independent of the CXL subsystem. The CXL driver only provides
> the handler.
>
> [1] CXL 3.0 spec, 12.2.1.1 RCH Downstream Port-detected Errors
> [2] CXL 3.0 spec, 8.1.3 PCIe DVSEC for CXL Devices
>
> Co-developed-by: Terry Bowman <terry.bowman@....com>
> Signed-off-by: Terry Bowman <terry.bowman@....com>
> Signed-off-by: Robert Richter <rrichter@....com>
> Cc: "Oliver O'Halloran" <oohall@...il.com>
> Cc: Bjorn Helgaas <bhelgaas@...gle.com>
> Cc: linuxppc-dev@...ts.ozlabs.org
> Cc: linux-pci@...r.kernel.org
> ---
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@...wei.com>