[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20250612120436.00005af5@huawei.com>
Date: Thu, 12 Jun 2025 12:04:36 +0100
From: Jonathan Cameron <Jonathan.Cameron@...wei.com>
To: Terry Bowman <terry.bowman@....com>
CC: <PradeepVineshReddy.Kodamati@....com>, <dave@...olabs.net>,
<dave.jiang@...el.com>, <alison.schofield@...el.com>,
<vishal.l.verma@...el.com>, <ira.weiny@...el.com>,
<dan.j.williams@...el.com>, <bhelgaas@...gle.com>, <bp@...en8.de>,
<ming.li@...omail.com>, <shiju.jose@...wei.com>, <dan.carpenter@...aro.org>,
<Smita.KoralahalliChannabasappa@....com>, <kobayashi.da-06@...itsu.com>,
<yanfei.xu@...el.com>, <rrichter@....com>, <peterz@...radead.org>,
<colyli@...e.de>, <uaisheng.ye@...el.com>,
<fabio.m.de.francesco@...ux.intel.com>, <ilpo.jarvinen@...ux.intel.com>,
<yazen.ghannam@....com>, <linux-cxl@...r.kernel.org>,
<linux-kernel@...r.kernel.org>, <linux-pci@...r.kernel.org>
Subject: Re: [PATCH v9 03/16] CXL/AER: Introduce kfifo for forwarding CXL
errors
On Tue, 3 Jun 2025 12:22:26 -0500
Terry Bowman <terry.bowman@....com> wrote:
> CXL error handling will soon be moved from the AER driver into the CXL
> driver. This requires a notification mechanism for the AER driver to share
> the AER interrupt with the CXL driver. The notification will be used
> as an indication for the CXL drivers to handle and log the CXL RAS errors.
>
> Add a kfifo work queue to be used by the AER driver and CXL driver. The AER
> driver will be the sole kfifo producer adding work and the cxl_core will be
> the sole kfifo consumer removing work. Add the boilerplate kfifo support.
>
> Add CXL work queue handler registration functions in the AER driver. Export
> the functions allowing CXL driver to access. Implement registration
> functions for the CXL driver to assign or clear the work handler function.
>
> Introduce function cxl_create_prot_err_info() and 'struct cxl_prot_err_info'.
> Implement cxl_create_prot_err_info() to populate a 'struct cxl_prot_err_info'
> instance with the AER severity and the erring device's PCI SBDF. The SBDF
> details will be used to rediscover the erring device after the CXL driver
> dequeues the kfifo work. The device rediscovery will be introduced along
> with the CXL handling in future patches.
>
> Signed-off-by: Terry Bowman <terry.bowman@....com>
Hi Terry,
A few trivial bits of feedback. I didn't find anything substantial that
hasn't already been covered by others.
Jonathan
> ---
> drivers/cxl/core/ras.c | 31 +++++++++-
> drivers/cxl/cxlpci.h | 1 +
> drivers/pci/pcie/aer.c | 132 ++++++++++++++++++++++++++++-------------
> include/linux/aer.h | 36 +++++++++++
> 4 files changed, 157 insertions(+), 43 deletions(-)
>
> diff --git a/drivers/cxl/core/ras.c b/drivers/cxl/core/ras.c
> index 485a831695c7..d35525e79e04 100644
> --- a/drivers/cxl/core/ras.c
> +++ b/drivers/cxl/core/ras.c
> @@ -5,6 +5,7 @@
>
> void cxl_ras_exit(void)
> {
> cxl_cper_unregister_prot_err_work(&cxl_cper_prot_err_work);
> cancel_work_sync(&cxl_cper_prot_err_work);
> +
> + cxl_unregister_prot_err_work();
> + cancel_work_sync(&cxl_prot_err_work);
> }
> diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c
> index adb4b1123b9b..5350fa5be784 100644
> --- a/drivers/pci/pcie/aer.c
> +++ b/drivers/pci/pcie/aer.c
> +static int cxl_create_prot_error_info(struct pci_dev *pdev,
> + struct aer_err_info *aer_err_info,
> + struct cxl_prot_error_info *cxl_err_info)
> +{
> + cxl_err_info->severity = aer_err_info->severity;
> +
> + cxl_err_info->function = PCI_FUNC(pdev->devfn);
> + cxl_err_info->device = PCI_SLOT(pdev->devfn);
> + cxl_err_info->bus = pdev->bus->number;
> + cxl_err_info->segment = pci_domain_nr(pdev->bus);
Maybe
*cxl_err_info = (struct cxl_prot_error_info) {
.severity = aer_err_info->severity,
.function = PCI_FUNC(pdev->devfn),
.device = PCI_SLOT(pdev->devfn),
.bus = pdev->bus_number,
.segment = pci_domain_nr(dev->nbus),
};
Or if it isn't going to get more use later, just put that assignment in
forward_cxl_error as this helper doesn't seem to add a huge amount of
value wrt to code reability.
> +
> + return 0;
> +}
> +
> +static void forward_cxl_error(struct pci_dev *pdev, struct aer_err_info *aer_err_info)
> +{
> + struct cxl_prot_err_work_data wd;
> + struct cxl_prot_error_info *cxl_err_info = &wd.err_info;
> +
> + cxl_create_prot_error_info(pdev, aer_err_info, cxl_err_info);
cxl_create_prot_error_info(pdev, aer_err_info, &wd.err_info);
Ignore if this gets more complicated in a later patch but for now I don't
see the local variable adding any value. If anything it slightly obscures
how this gets used via wd in the next line. However given the code is short
that is fairly obvious either way.
Or as above
wd.error_info = (struct cxl_prot_error_info) {
.severity = ...
};
> +
> + if (!kfifo_put(&cxl_prot_err_fifo, wd)) {
> + dev_err_ratelimited(&pdev->dev, "CXL kfifo overflow\n");
> + return;
> + }
> +
> + schedule_work(cxl_prot_err_work);
> +}
> +
> #else
> static inline void cxl_rch_enable_rcec(struct pci_dev *dev) { }
> static inline void cxl_rch_handle_error(struct pci_dev *dev,
> struct aer_err_info *info) { }
> +static inline void forward_cxl_error(struct pci_dev *dev,
> + struct aer_err_info *info) { }
Is this aligned right - looks one space short?
> +static inline bool handles_cxl_errors(struct pci_dev *dev)
> +{
> + return false;
> +}
> +static bool is_cxl_error(struct pci_dev *pdev, struct aer_err_info *info) { return 0; };
wrap choices for the two stubs are a bit odd. The first one is actually shorter
if not wrapped than the second one is that you chose not to wrap.
My slightly preference would be to wrap them both as the fist one.
> #endif
Powered by blists - more mailing lists