linux-kernel - Re: [PATCH v12 23/25] CXL/PCI: Introduce CXL uncorrectable protocol error recovery

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <d3d3ab84-8cdd-4386-82dd-de8149159985@intel.com>
Date: Mon, 29 Sep 2025 17:26:57 -0700
From: Dave Jiang <dave.jiang@...el.com>
To: Terry Bowman <terry.bowman@....com>, dave@...olabs.net,
 jonathan.cameron@...wei.com, alison.schofield@...el.com,
 dan.j.williams@...el.com, bhelgaas@...gle.com, shiju.jose@...wei.com,
 ming.li@...omail.com, Smita.KoralahalliChannabasappa@....com,
 rrichter@....com, dan.carpenter@...aro.org,
 PradeepVineshReddy.Kodamati@....com, lukas@...ner.de,
 Benjamin.Cheatham@....com, sathyanarayanan.kuppuswamy@...ux.intel.com,
 linux-cxl@...r.kernel.org, alucerop@....com, ira.weiny@...el.com
Cc: linux-kernel@...r.kernel.org, linux-pci@...r.kernel.org
Subject: Re: [PATCH v12 23/25] CXL/PCI: Introduce CXL uncorrectable protocol
 error recovery



On 9/25/25 3:34 PM, Terry Bowman wrote:
> Populate the cxl_do_recovery() function with uncorrectable protocol error (UCE)
> handling. Follow similar design as found in PCIe error driver,
> pcie_do_recovery(). One difference is cxl_do_recovery() will treat all UCEs
> as fatal with a kernel panic. This is to prevent corruption on CXL memory.
> 
> Introduce cxl_walk_port(). Make this analogous to pci_walk_bridge() but walking
> CXL ports instead. This will iterate through the CXL topology from the
> erroring device through the downstream CXL Ports and Endpoints.
> 
> Export pci_aer_clear_fatal_status() for CXL to use if a UCE is not found.
> 
> Signed-off-by: Terry Bowman <terry.bowman@....com>
> 
> ---
> 
> Changes in v11->v12:
> - Cleaned up port discovery in cxl_do_recovery() (Dave)
> - Added PCI_EXP_TYPE_RC_END to type check in cxl_report_error_detected()
> 
> Changes in v10->v11:
> - pci_ers_merge_results() - Move to earlier patch
> ---
>  drivers/cxl/core/ras.c | 111 +++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 111 insertions(+)
> 
> diff --git a/drivers/cxl/core/ras.c b/drivers/cxl/core/ras.c
> index 7e8d63c32d72..45f92defca64 100644
> --- a/drivers/cxl/core/ras.c
> +++ b/drivers/cxl/core/ras.c
> @@ -443,8 +443,119 @@ void cxl_endpoint_port_init_ras(struct cxl_port *ep)
>  }
>  EXPORT_SYMBOL_NS_GPL(cxl_endpoint_port_init_ras, "CXL");
>  
> +static int cxl_report_error_detected(struct device *dev, void *data)
> +{
> +	struct pci_dev *pdev = to_pci_dev(dev);
> +	pci_ers_result_t vote, *result = data;
> +
> +	guard(device)(dev);
> +
> +	if ((pci_pcie_type(pdev) == PCI_EXP_TYPE_ENDPOINT) ||
> +	    (pci_pcie_type(pdev) == PCI_EXP_TYPE_RC_END)) {
> +		if (!cxl_pci_drv_bound(pdev))
> +			return 0;
> +
> +		vote = cxl_error_detected(dev);
> +	} else {
> +		vote = cxl_port_error_detected(dev);
> +	}
> +
> +	*result = pci_ers_merge_result(*result, vote);
> +
> +	return 0;
> +}
> +
> +static int match_port_by_parent_dport(struct device *dev, const void *dport_dev)
> +{
> +	struct cxl_port *port;
> +
> +	if (!is_cxl_port(dev))
> +		return 0;
> +
> +	port = to_cxl_port(dev);
> +
> +	return port->parent_dport->dport_dev == dport_dev;
> +}
> +
> +static void cxl_walk_port(struct device *port_dev,
> +			  int (*cb)(struct device *, void *),
> +			  void *userdata)
> +{
> +	struct cxl_dport *dport = NULL;
> +	struct cxl_port *port;
> +	unsigned long index;
> +
> +	if (!port_dev)
> +		return;
> +
> +	port = to_cxl_port(port_dev);
> +	if (port->uport_dev && dev_is_pci(port->uport_dev))
> +		cb(port->uport_dev, userdata);

Could use some comments on what is being walked. Also an explanation of what is happening here would be good.

If this is an endpoint port, this would be the PCI endpoint device.
If it's a switch port, then this is the upstream port.
If it's a root port, this is skipped.

> +
> +	xa_for_each(&port->dports, index, dport)
> +	{
> +		struct device *child_port_dev __free(put_device) =
> +			bus_find_device(&cxl_bus_type, &port->dev, dport->dport_dev,
> +					match_port_by_parent_dport);
> +
> +		cb(dport->dport_dev, userdata);

This is going through all the downstream ports
> +
> +		cxl_walk_port(child_port_dev, cxl_report_error_detected, userdata);
> +	}
> +
> +	if (is_cxl_endpoint(port))
> +		cb(port->uport_dev->parent, userdata);

And this is the downstream parent port of the endpoint device

Why not move this before the xa_for_each() and return early? endpoint ports don't have dports, no need to even try to run that block above.

So in the current implementation,
1. Endpoint. It checks the device, and then it checks the downstream parent port for errors. Is checking the parent dport necessary?
2. Switch. It checks the upstream port, then it checks all the downstream ports for errors.
3. Root port. It checks all the downstream ports for errors.
Is this the correct understanding of what this function does?

> +}
> +
>  static void cxl_do_recovery(struct device *dev)
>  {
> +	pci_ers_result_t status = PCI_ERS_RESULT_CAN_RECOVER;
> +	struct pci_dev *pdev = to_pci_dev(dev);
> +	struct cxl_port *port = NULL;
> +
> +	if ((pci_pcie_type(pdev) == PCI_EXP_TYPE_ROOT_PORT) ||
> +	    (pci_pcie_type(pdev) == PCI_EXP_TYPE_DOWNSTREAM)) {
> +		struct cxl_dport *dport;
> +		struct cxl_port *rp_port __free(put_cxl_port) = find_cxl_port(&pdev->dev, &dport);
> +
> +		port = rp_port;
> +
> +	} else	if (pci_pcie_type(pdev) == PCI_EXP_TYPE_UPSTREAM) {
> +		struct cxl_port *us_port __free(put_cxl_port) = find_cxl_port_by_uport(&pdev->dev);
> +
> +		port = us_port;
> +
> +	} else	if ((pci_pcie_type(pdev) == PCI_EXP_TYPE_ENDPOINT) ||
> +		    (pci_pcie_type(pdev) == PCI_EXP_TYPE_RC_END)) {
> +		struct cxl_dev_state *cxlds;
> +
> +		if (!cxl_pci_drv_bound(pdev))
> +			return;

Need to have the pci dev lock before checking driver bound.

DJ
> +
> +		cxlds = pci_get_drvdata(pdev);
> +		port = cxlds->cxlmd->endpoint;
> +	}
> +
> +	if (!port) {
> +		dev_err(dev, "Failed to find the CXL device\n");
> +		return;
> +	}
> +
> +	cxl_walk_port(&port->dev, cxl_report_error_detected, &status);
> +	if (status == PCI_ERS_RESULT_PANIC)
> +		panic("CXL cachemem error.");
> +
> +	/*
> +	 * If we have native control of AER, clear error status in the device
> +	 * that detected the error.  If the platform retained control of AER,
> +	 * it is responsible for clearing this status.  In that case, the
> +	 * signaling device may not even be visible to the OS.
> +	 */
> +	if (cxl_error_is_native(pdev)) {
> +		pcie_clear_device_status(pdev);
> +		pci_aer_clear_nonfatal_status(pdev);
> +		pci_aer_clear_fatal_status(pdev);
> +	}
>  }
>  
>  static void cxl_handle_cor_ras(struct device *dev, u64 serial, void __iomem *ras_base)