Message-ID: <6882d212cefff_134cc710099@dwillia2-xfh.jf.intel.com.notmuch>
Date: Thu, 24 Jul 2025 17:38:42 -0700
From: <dan.j.williams@...el.com>
To: Terry Bowman <terry.bowman@....com>, <dave@...olabs.net>,
	<jonathan.cameron@...wei.com>, <dave.jiang@...el.com>,
	<alison.schofield@...el.com>, <dan.j.williams@...el.com>,
	<bhelgaas@...gle.com>, <shiju.jose@...wei.com>, <ming.li@...omail.com>,
	<Smita.KoralahalliChannabasappa@....com>, <rrichter@....com>,
	<dan.carpenter@...aro.org>, <PradeepVineshReddy.Kodamati@....com>,
	<lukas@...ner.de>, <Benjamin.Cheatham@....com>,
	<sathyanarayanan.kuppuswamy@...ux.intel.com>, <terry.bowman@....com>,
	<linux-cxl@...r.kernel.org>
CC: <linux-kernel@...r.kernel.org>, <linux-pci@...r.kernel.org>
Subject: Re: [PATCH v10 06/17] PCI/AER: Dequeue forwarded CXL error

Terry Bowman wrote:
> The AER driver is now designed to forward CXL protocol errors to the CXL
> driver. Update the CXL driver with functionality to dequeue the forwarded
> CXL error from the kfifo. Also, update the CXL driver to begin the protocol
> error handling processing using the work received from the FIFO.
> 
> Introduce function cxl_proto_err_work_fn() to dequeue work forwarded by the
> AER service driver. This will begin the CXL protocol error processing with
> a call to cxl_handle_proto_error().
> 
> Update cxl/core/native_ras.c by adding cxl_rch_handle_error_iter() that was
> previously in the AER driver. Add check that Endpoint is bound to a CXL
> driver.
[..]
> +static void cxl_handle_proto_error(struct cxl_proto_error_info *err_info)
> +{
> +	struct pci_dev *pdev __free(pci_dev_put) =
> +		pci_get_domain_bus_and_slot(err_info->segment,
> +					    err_info->bus,
> +					    err_info->devfn);

So this patch in its current form is about restoring the RCH error
handling code, which, as we already discussed, should probably stay as a
special case in drivers/pci/pcie/.

For v11, where this code can focus 100% on VH error handling, my
expectation is to not see any PCI topology walking, i.e. no
pci_get_domain_bus_and_slot(), no pci_walk_bridge(), etc. If all we cared
about were PCI details, this code could have remained in the PCI core.

Instead, my expectation is that the motive for a kfifo and calling back
into the cxl_core is that cxl_core has a parallel universe of software
objects ('struct cxl_port') that can experience errors independent of the
errors the PCIe core cares about. It also has a cxl_port driver model that
knows the lifetime of the RAS register mappings, which the PCIe AER core
cannot know about.

So, the PCIe core has already done the device lookup before this point.
Just pass that device to the cxl_core directly, and then use it to look
up a cxl_port and/or cxl_dport.

A useful property of passing a 'struct device *' to identify the error
source device is that it supports cxl_test emulation of CXL port
protocol error injection.
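
To illustrate the shape I have in mind, here is a rough userspace sketch,
not kernel code. Everything in it is a hypothetical stand-in invented for
illustration: 'struct cxl_port_model', 'struct proto_err_fifo',
find_port_by_dev() and drain_proto_errors() are not existing cxl_core or
kfifo interfaces. The producer queues the device pointer it already
holds, and the consumer maps that pointer to a port object and checks the
port's own state, with no segment/bus/devfn lookup anywhere.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace stand-ins only; none of these are the kernel definitions. */
struct device { const char *name; };

struct cxl_port_model {
	struct device *uport_dev;	/* device this port object represents */
	int ras_mapped;			/* nonzero while RAS registers are mapped */
	int errors_handled;
};

#define FIFO_DEPTH 4	/* power of two so the index wraps cleanly */

struct proto_err_fifo {
	struct device *slot[FIFO_DEPTH];
	unsigned int head, tail;
};

/* Producer side: queue the device pointer that was already looked up. */
static int fifo_put(struct proto_err_fifo *f, struct device *dev)
{
	assert(dev);
	if (f->head - f->tail == FIFO_DEPTH)
		return 0;	/* full, error is dropped */
	f->slot[f->head++ % FIFO_DEPTH] = dev;
	return 1;
}

static struct device *fifo_get(struct proto_err_fifo *f)
{
	if (f->head == f->tail)
		return NULL;	/* empty */
	return f->slot[f->tail++ % FIFO_DEPTH];
}

/* Consumer side: map a device pointer to a port, no bus walking. */
static struct cxl_port_model *find_port_by_dev(struct cxl_port_model *ports,
						size_t nr_ports,
						struct device *dev)
{
	for (size_t i = 0; i < nr_ports; i++)
		if (ports[i].uport_dev == dev)
			return &ports[i];
	return NULL;
}

/* Drain the fifo; returns how many errors reached a port handler. */
static int drain_proto_errors(struct proto_err_fifo *f,
			      struct cxl_port_model *ports, size_t nr_ports)
{
	struct device *dev;
	int handled = 0;

	while ((dev = fifo_get(f))) {
		struct cxl_port_model *port =
			find_port_by_dev(ports, nr_ports, dev);

		/* Skip devices with no port or with RAS registers unmapped. */
		if (!port || !port->ras_mapped)
			continue;
		port->errors_handled++;
		handled++;
	}
	return handled;
}
```

Because the key is just a device pointer, an emulated device that has no
PCI address at all can be queued and resolved the same way as a real one,
which is the property that makes cxl_test style injection possible.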
