[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <67abe412e153e_2d1e29469@dwillia2-xfh.jf.intel.com.notmuch>
Date: Tue, 11 Feb 2025 15:58:10 -0800
From: Dan Williams <dan.j.williams@...el.com>
To: Terry Bowman <terry.bowman@....com>, <linux-cxl@...r.kernel.org>,
<linux-kernel@...r.kernel.org>, <linux-pci@...r.kernel.org>,
<nifan.cxl@...il.com>, <dave@...olabs.net>, <jonathan.cameron@...wei.com>,
<dave.jiang@...el.com>, <alison.schofield@...el.com>,
<vishal.l.verma@...el.com>, <dan.j.williams@...el.com>,
<bhelgaas@...gle.com>, <mahesh@...ux.ibm.com>, <ira.weiny@...el.com>,
<oohall@...il.com>, <Benjamin.Cheatham@....com>, <rrichter@....com>,
<nathan.fontenot@....com>, <Smita.KoralahalliChannabasappa@....com>,
<lukas@...ner.de>, <ming.li@...omail.com>,
<PradeepVineshReddy.Kodamati@....com>
Subject: Re: [PATCH v7 05/17] PCI/AER: Add CXL PCIe Port correctable error
support in AER service driver
Terry Bowman wrote:
> The AER service driver supports handling Downstream Port Protocol Errors in
> Restricted CXL host (RCH) mode also known as CXL1.1. It needs the same
> functionality for CXL PCIe Ports operating in Virtual Hierarchy (VH)
> mode.[1]
>
> CXL and PCIe Protocol Error handling have different requirements that
> necessitate a separate handling path. The AER service driver may try to
> recover PCIe uncorrectable non-fatal errors (UCE). The same recovery is not
> suitable for CXL PCIe Port devices because of potential for system memory
> corruption. Instead, CXL Protocol Error handling must use a kernel panic
> in the case of a fatal or non-fatal UCE. The AER driver's PCIe Protocol
> Error handling does not panic the kernel in response to a UCE.
>
> Introduce a separate path for CXL Protocol Error handling in the AER
> service driver. This will allow CXL Protocol Errors to use CXL specific
> handling instead of PCIe handling. Add the CXL specific changes without
> affecting or adding functionality in the PCIe handling.
>
> Make this update alongside the existing Downstream Port RCH error handling
> logic, extending support to CXL PCIe Ports in VH mode.
>
> Remove is_internal_error(). is_internal_error() was used to determine if
> an AER error was a CXL error. Instead, now rely on pcie_is_cxl_port() to
> indicate the error is a CXL error.
Wait, pcie_is_cxl_port() in isolation is insufficient, right? In other
words, I would expect that when the response may escalate to panic()
that the code should be reasonably certain that this *is* a CXL error.
At a minimum that is:
pcie_is_cxl_port() && is_internal_error()
...or am I missing something that it makes it unlikely that a standard
PCIe error or other internal error type will not be thrown by a
pcie_is_cxl_port() device?
Powered by blists - more mailing lists