lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <691e646357da5_1a37510026@dwillia2-mobl4.notmuch>
Date: Wed, 19 Nov 2025 16:44:19 -0800
From: <dan.j.williams@...el.com>
To: Terry Bowman <terry.bowman@....com>, <dave@...olabs.net>,
	<jonathan.cameron@...wei.com>, <dave.jiang@...el.com>,
	<alison.schofield@...el.com>, <dan.j.williams@...el.com>,
	<bhelgaas@...gle.com>, <shiju.jose@...wei.com>, <ming.li@...omail.com>,
	<Smita.KoralahalliChannabasappa@....com>, <rrichter@....com>,
	<dan.carpenter@...aro.org>, <PradeepVineshReddy.Kodamati@....com>,
	<lukas@...ner.de>, <Benjamin.Cheatham@....com>,
	<sathyanarayanan.kuppuswamy@...ux.intel.com>, <linux-cxl@...r.kernel.org>,
	<alucerop@....com>, <ira.weiny@...el.com>
CC: <linux-kernel@...r.kernel.org>, <linux-pci@...r.kernel.org>,
	<terry.bowman@....com>
Subject: Re: [RESEND v13 16/25] CXL/AER: Introduce pcie/aer_cxl_vh.c in AER
 driver for forwarding CXL errors

Terry Bowman wrote:
> CXL virtual hierarchy (VH) RAS handling for CXL Port devices will be added
> soon. This requires a notification mechanism for the AER driver to share
> the AER interrupt with the CXL driver. The notification will be used as an
> indication for the CXL drivers to handle and log the CXL RAS errors.
> 
> Note, 'CXL protocol error' terminology will refer to CXL VH and not
> CXL RCH errors unless specifically noted going forward.
> 
> Introduce a new file in the AER driver to handle the CXL protocol errors
> named pci/pcie/aer_cxl_vh.c.
> 
> Add a kfifo work queue to be used by the AER and CXL drivers. The AER
> driver will be the sole kfifo producer adding work and the cxl_core will be
> the sole kfifo consumer removing work. Add the boilerplate kfifo support.
> Encapsulate the kfifo, RW semaphore, and work pointer in a single structure.
> 
> Add CXL work queue handler registration functions in the AER driver. Export
> the functions allowing CXL driver to access. Implement registration
> functions for the CXL driver to assign or clear the work handler function.
> Synchronize accesses using the RW semaphore.
> 
> Introduce 'struct cxl_proto_err_work_data' to serve as the kfifo work data.
> This will contain a reference to the erring PCI device and the error
> severity. This will be used when the work is dequeued by the cxl_core driver.
> 
> Signed-off-by: Terry Bowman <terry.bowman@....com>
> Reviewed-by: Jonathan Cameron <jonathan.cameron@...wei.com>
> Reviewed-by: Dave Jiang <dave.jiang@...el.com>

Some small things to fixup.

> diff --git a/drivers/pci/pcie/aer_cxl_vh.c b/drivers/pci/pcie/aer_cxl_vh.c
> new file mode 100644
> index 000000000000..5dbc81341dc4
> --- /dev/null
> +++ b/drivers/pci/pcie/aer_cxl_vh.c
> @@ -0,0 +1,95 @@
> +// SPDX-License-Identifier: GPL-2.0-only
> +/* Copyright(c) 2025 AMD Corporation. All rights reserved. */
> +
> +#include <linux/pci.h>
> +#include <linux/aer.h>
> +#include <linux/pci.h>
> +#include <linux/bitfield.h>
> +#include <linux/kfifo.h>
> +#include "../pci.h"
> +
> +#define CXL_ERROR_SOURCES_MAX          128
> +
> +struct cxl_proto_err_kfifo {
> +	struct work_struct *work;
> +	struct rw_semaphore rw_sema;
> +	DECLARE_KFIFO(fifo, struct cxl_proto_err_work_data,
> +		      CXL_ERROR_SOURCES_MAX);
> +};
> +
> +static struct cxl_proto_err_kfifo cxl_proto_err_kfifo = {
> +	.rw_sema = __RWSEM_INITIALIZER(cxl_proto_err_kfifo.rw_sema)
> +};
> +
> +bool cxl_error_is_native(struct pci_dev *dev)
> +{
> +	struct pci_host_bridge *host = pci_find_host_bridge(dev->bus);
> +
> +	return (pcie_ports_native || host->native_aer);

This function always confuses me because there is zero "cxl" inside this
function. Something to comment on later so I am not scratching my head
the next time this function is touched.

> +}
> +EXPORT_SYMBOL_NS_GPL(cxl_error_is_native, "CXL");

Why is this exported? All of the consumers are local to
drivers/pci/pcie/built-in.a.

> +
> +bool is_internal_error(struct aer_err_info *info)
> +{
> +	if (info->severity == AER_CORRECTABLE)
> +		return info->status & PCI_ERR_COR_INTERNAL;
> +
> +	return info->status & PCI_ERR_UNC_INTN;
> +}
> +EXPORT_SYMBOL_NS_GPL(is_internal_error, "CXL");

Ditto on the export, and I do not see it getting used anywhere later in
the series.

Also, this is so tiny that if anything else wanted to use it just make
it a static inline.

> +
> +bool is_cxl_error(struct pci_dev *pdev, struct aer_err_info *info)
> +{
> +	if (!info || !info->is_cxl)
> +		return false;
> +
> +	if (pci_pcie_type(pdev) != PCI_EXP_TYPE_ENDPOINT)
> +		return false;
> +
> +	return is_internal_error(info);
> +}
> +EXPORT_SYMBOL_NS_GPL(is_cxl_error, "CXL");

No consumers for this exported symbol.

> +
> +void cxl_forward_error(struct pci_dev *pdev, struct aer_err_info *info)
> +{
> +	struct cxl_proto_err_work_data wd = (struct cxl_proto_err_work_data) {
> +		.severity = info->severity,
> +		.pdev = pdev
> +	};
> +
> +	guard(rwsem_write)(&cxl_proto_err_kfifo.rw_sema);

This guard can be downgraded to rwsem_read. This only needs to make sure
that the kifo remain registered for the duration of the function.

> +
> +	if (!cxl_proto_err_kfifo.work) {
> +		dev_warn_once(&pdev->dev, "CXL driver is unregistered. Unable to forward error.");

I would combine this with the following ratelimited message because they
are effectively the same thing. "Hey admin, I see some errors but the
driver to handle them is gone, or out to lunch." The reason to combine
them is that you probably want this message to catch dropped errors
without failure, and this dev_warn_once() starts failing after the first
invocation.

> +		return;
> +	}
> +
> +	if (!kfifo_put(&cxl_proto_err_kfifo.fifo, wd)) {
> +		dev_err_ratelimited(&pdev->dev, "AER-CXL kfifo overflow\n");
> +		return;
> +	}
> +
> +	schedule_work(cxl_proto_err_kfifo.work);
> +}
> +EXPORT_SYMBOL_NS_GPL(cxl_forward_error, "CXL");

No consumer for this export.

> +
> +void cxl_register_proto_err_work(struct work_struct *work)
> +{
> +	guard(rwsem_write)(&cxl_proto_err_kfifo.rw_sema);
> +	cxl_proto_err_kfifo.work = work;
> +}
> +EXPORT_SYMBOL_NS_GPL(cxl_register_proto_err_work, "CXL");

Oh hey, the rest of these exports make sense.

...but I do think you can go back and remove

 bool is_internal_error(struct aer_err_info *info);
 bool is_cxl_error(struct pci_dev *pdev, struct aer_err_info *info);
 void cxl_forward_error(struct pci_dev *pdev, struct aer_err_info *info);

...from pci.h, and move them to an aer internal header like
drivers/pci/pcie/portdrv.h.

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ