linux-kernel - Re: [RESEND v13 23/25] CXL/PCI: Introduce CXL uncorrectable protocol error recovery

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <10115294-8be9-42af-a466-40a194cfa4e8@intel.com>
Date: Tue, 4 Nov 2025 16:43:47 -0700
From: Dave Jiang <dave.jiang@...el.com>
To: Jonathan Cameron <jonathan.cameron@...wei.com>,
 Terry Bowman <terry.bowman@....com>
Cc: dave@...olabs.net, alison.schofield@...el.com, dan.j.williams@...el.com,
 bhelgaas@...gle.com, shiju.jose@...wei.com, ming.li@...omail.com,
 Smita.KoralahalliChannabasappa@....com, rrichter@....com,
 dan.carpenter@...aro.org, PradeepVineshReddy.Kodamati@....com,
 lukas@...ner.de, Benjamin.Cheatham@....com,
 sathyanarayanan.kuppuswamy@...ux.intel.com, linux-cxl@...r.kernel.org,
 alucerop@....com, ira.weiny@...el.com, linux-kernel@...r.kernel.org,
 linux-pci@...r.kernel.org
Subject: Re: [RESEND v13 23/25] CXL/PCI: Introduce CXL uncorrectable protocol
 error recovery



On 11/4/25 11:47 AM, Jonathan Cameron wrote:
> On Tue, 4 Nov 2025 11:03:03 -0600
> Terry Bowman <terry.bowman@....com> wrote:
> 
>> Implement cxl_do_recovery() to handle uncorrectable protocol
>> errors (UCE), following the design of pcie_do_recovery(). Unlike PCIe,
>> all CXL UCEs are treated as fatal and trigger a kernel panic to avoid
>> potential CXL memory corruption.
>>
>> Add cxl_walk_port(), analogous to pci_walk_bridge(), to traverse the
>> CXL topology from the error source through downstream CXL ports and
>> endpoints.
>>
>> Introduce cxl_report_error_detected(), mirroring PCI's
>> report_error_detected(), and implement device locking for the affected
>> subtree. Endpoints require locking the PCI device (pdev->dev) and the
>> CXL memdev (cxlmd->dev). CXL ports require locking the PCI
>> device (pdev->dev) and the parent CXL port.
>>
>> The device locks should be taken early where possible. The initially
>> reporting device will be locked after kfifo dequeue. Iterated devices
>> will be locked in cxl_report_error_detected() and must lock the
>> iterated devices except for the first device as it has already been
>> locked.
>>
>> Export pci_aer_clear_fatal_status() for use when a UCE is not present.
>>
>> Signed-off-by: Terry Bowman <terry.bowman@....com>
> 
> Follow on comments around the locking stuff. If that has been there
> a while and I didn't notice before, sorry!
> 
>>
>> ---
>>
>> Changes in v12->v13:
>> - Add guard() before calling cxl_pci_drv_bound() (Dave Jiang)
>> - Add guard() calls for EP (cxlds->cxlmd->dev & pdev->dev) and ports
>>   (pdev->dev & parent cxl_port) in cxl_report_error_detected() and
>>   cxl_handle_proto_error() (Terry)
>> - Remove unnecessary check for endpoint port. (Dave Jiang)
>> - Remove check for RCIEP EP in cxl_report_error_detected(). (Terry)
>>
>> Changes in v11->v12:
>> - Clean up port discovery in cxl_do_recovery() (Dave)
>> - Add PCI_EXP_TYPE_RC_END to type check in cxl_report_error_detected()
>>
>> Changes in v10->v11:
>> - pci_ers_merge_results() - Move to earlier patch
>> ---
>>  drivers/cxl/core/ras.c | 135 ++++++++++++++++++++++++++++++++++++++++-
>>  drivers/pci/pci.h      |   1 -
>>  drivers/pci/pcie/aer.c |   1 +
>>  include/linux/aer.h    |   2 +
>>  4 files changed, 135 insertions(+), 4 deletions(-)
>>
>> diff --git a/drivers/cxl/core/ras.c b/drivers/cxl/core/ras.c
>> index 5bc144cde0ee..52c6f19564b6 100644
>> --- a/drivers/cxl/core/ras.c
>> +++ b/drivers/cxl/core/ras.c
>> @@ -259,8 +259,138 @@ static void device_unlock_if(struct device *dev, bool take)
>>  		device_unlock(dev);
>>  }
>>  
>> +/**
>> + * cxl_report_error_detected
>> + * @dev: Device being reported
>> + * @data: Result
>> + * @err_pdev: Device with initial detected error. Is locked immediately
>> + *            after KFIFO dequeue.
>> + */
>> +static int cxl_report_error_detected(struct device *dev, void *data, struct pci_dev *err_pdev)
>> +{
>> +	bool need_lock = (dev != &err_pdev->dev);
> 
> Add a comment on why this controls need for locking.
> The resulting code is complex enough I'd be tempted to split the whole
> thing into locked and unlocked variants.

May not be a bad idea. Terry, can you see if this would reduce the complexity?

DJ 

> 
>> +	pci_ers_result_t vote, *result = data;
>> +	struct pci_dev *pdev;
>> +
>> +	if (!dev || !dev_is_pci(dev))
>> +		return 0;
>> +	pdev = to_pci_dev(dev);
>> +
>> +	device_lock_if(&pdev->dev, need_lock);
>> +	if (is_pcie_endpoint(pdev) && !cxl_pci_drv_bound(pdev)) {
>> +		device_unlock_if(&pdev->dev, need_lock);
>> +		return PCI_ERS_RESULT_NONE;
>> +	}
>> +
>> +	if (pdev->aer_cap)
>> +		pci_clear_and_set_config_dword(pdev,
>> +					       pdev->aer_cap + PCI_ERR_COR_STATUS,
>> +					       0, PCI_ERR_COR_INTERNAL);
>> +
>> +	if (is_pcie_endpoint(pdev)) {
>> +		struct cxl_dev_state *cxlds = pci_get_drvdata(pdev);
>> +
>> +		device_lock_if(&cxlds->cxlmd->dev, need_lock);
>> +		vote = cxl_error_detected(&cxlds->cxlmd->dev);
>> +		device_unlock_if(&cxlds->cxlmd->dev, need_lock);
>> +	} else {
>> +		vote = cxl_port_error_detected(dev);
>> +	}
>> +
>> +	pcie_clear_device_status(pdev);
>> +	*result = pcie_ers_merge_result(*result, vote);
>> +	device_unlock_if(&pdev->dev, need_lock);
>> +
>> +	return 0;
>> +}
> 
>> +
>> +/**
>> + * cxl_walk_port
> Needs a short description I think to count as valid kernel-doc and
> stop the tool moaning if anyone runs it on this.
> 
>> + *
>> + * @port: Port be traversed into
>> + * @cb: Callback for handling the CXL Ports
>> + * @userdata: Result
>> + * @err_pdev: Device with initial detected error. Is locked immediately
>> + *            after KFIFO dequeue.
>> + */
>> +static void cxl_walk_port(struct cxl_port *port,
>> +			  int (*cb)(struct device *, void *, struct pci_dev *),
>> +			  void *userdata,
>> +			  struct pci_dev *err_pdev)
>> +{
>> +	struct cxl_port *err_port __free(put_cxl_port) = get_cxl_port(err_pdev);
>> +	bool need_lock = (port != err_port);
>> +	struct cxl_dport *dport = NULL;
>> +	unsigned long index;
>> +
>> +	device_lock_if(&port->dev, need_lock);
>> +	if (is_cxl_endpoint(port)) {
>> +		cb(port->uport_dev->parent, userdata, err_pdev);
>> +		device_unlock_if(&port->dev, need_lock);
>> +		return;
>> +	}
>> +
>> +	if (port->uport_dev && dev_is_pci(port->uport_dev))
>> +		cb(port->uport_dev, userdata, err_pdev);
>> +
>> +	/*
>> +	 * Iterate over the set of Downstream Ports recorded in port->dports (XArray):
>> +	 *  - For each dport, attempt to find a child CXL Port whose parent dport
>> +	 *    match.
>> +	 *  - Invoke the provided callback on the dport's device.
>> +	 *  - If a matching child CXL Port device is found, recurse into that port to
>> +	 *    continue the walk.
>> +	 */
>> +	xa_for_each(&port->dports, index, dport)
>> +	{
> 
> Move that to line above for normal kernel loop formatting.
> 
> 	xa_for_each(&port->dports, index, dport) {
> 
>> +		struct device *child_port_dev __free(put_device) =
>> +			bus_find_device(&cxl_bus_type, &port->dev, dport->dport_dev,
>> +					match_port_by_parent_dport);
>> +
>> +		cb(dport->dport_dev, userdata, err_pdev);
>> +		if (child_port_dev)
>> +			cxl_walk_port(to_cxl_port(child_port_dev), cb, userdata, err_pdev);
>> +	}
>> +	device_unlock_if(&port->dev, need_lock);
>> +}
>> +
> 
>>  
>>  void cxl_handle_cor_ras(struct device *dev, u64 serial, void __iomem *ras_base)
>> @@ -483,16 +613,15 @@ static void cxl_proto_err_work_fn(struct work_struct *work)
>>  			if (!cxl_pci_drv_bound(pdev))
>>  				return;
>>  			cxlmd_dev = &cxlds->cxlmd->dev;
>> -			device_lock_if(cxlmd_dev, cxlmd_dev);
>>  		} else {
>>  			cxlmd_dev = NULL;
>>  		}
>>  
>> +		/* Lock the CXL parent Port */
>>  		struct cxl_port *port __free(put_cxl_port) = get_cxl_port(pdev);
>> -		if (!port)
>> -			return;
>>  		guard(device)(&port->dev);
>>  
>> +		device_lock_if(cxlmd_dev, cxlmd_dev);
>>  		cxl_handle_proto_error(&wd);
>>  		device_unlock_if(cxlmd_dev, cxlmd_dev);
> Same issue on these helpers, but I'm also not sure why moving them in this
> patch makes sense. I'm not sure what changed.
> 
> Perhaps this is stuff that ended up in wrong patch?
>>  	}
> 
>