[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <110a5a5c-6842-4d1d-ae7e-22ffd0111641@amd.com>
Date: Thu, 12 Jun 2025 09:29:19 -0500
From: "Bowman, Terry" <terry.bowman@....com>
To: Jonathan Cameron <Jonathan.Cameron@...wei.com>
Cc: PradeepVineshReddy.Kodamati@....com, dave@...olabs.net,
dave.jiang@...el.com, alison.schofield@...el.com, vishal.l.verma@...el.com,
ira.weiny@...el.com, dan.j.williams@...el.com, bhelgaas@...gle.com,
bp@...en8.de, ming.li@...omail.com, shiju.jose@...wei.com,
dan.carpenter@...aro.org, Smita.KoralahalliChannabasappa@....com,
kobayashi.da-06@...itsu.com, rrichter@....com, peterz@...radead.org,
fabio.m.de.francesco@...ux.intel.com, ilpo.jarvinen@...ux.intel.com,
yazen.ghannam@....com, linux-cxl@...r.kernel.org,
linux-kernel@...r.kernel.org, linux-pci@...r.kernel.org
Subject: Re: [PATCH v9 03/16] CXL/AER: Introduce kfifo for forwarding CXL
errors
On 6/12/2025 6:04 AM, Jonathan Cameron wrote:
> On Tue, 3 Jun 2025 12:22:26 -0500
> Terry Bowman <terry.bowman@....com> wrote:
>
>> CXL error handling will soon be moved from the AER driver into the CXL
>> driver. This requires a notification mechanism for the AER driver to share
>> the AER interrupt with the CXL driver. The notification will be used
>> as an indication for the CXL drivers to handle and log the CXL RAS errors.
>>
>> Add a kfifo work queue to be used by the AER driver and CXL driver. The AER
>> driver will be the sole kfifo producer adding work and the cxl_core will be
>> the sole kfifo consumer removing work. Add the boilerplate kfifo support.
>>
>> Add CXL work queue handler registration functions in the AER driver. Export
>> the functions allowing CXL driver to access. Implement registration
>> functions for the CXL driver to assign or clear the work handler function.
>>
>> Introduce function cxl_create_prot_err_info() and 'struct cxl_prot_err_info'.
>> Implement cxl_create_prot_err_info() to populate a 'struct cxl_prot_err_info'
>> instance with the AER severity and the erring device's PCI SBDF. The SBDF
>> details will be used to rediscover the erring device after the CXL driver
>> dequeues the kfifo work. The device rediscovery will be introduced along
>> with the CXL handling in future patches.
>>
>> Signed-off-by: Terry Bowman <terry.bowman@....com>
> Hi Terry,
>
> A few trivial bits of feedback. I didn't find anything substantial that
> hasn't already been covered by others.
>
> Jonathan
>
>> ---
>> drivers/cxl/core/ras.c | 31 +++++++++-
>> drivers/cxl/cxlpci.h | 1 +
>> drivers/pci/pcie/aer.c | 132 ++++++++++++++++++++++++++++-------------
>> include/linux/aer.h | 36 +++++++++++
>> 4 files changed, 157 insertions(+), 43 deletions(-)
>>
>> diff --git a/drivers/cxl/core/ras.c b/drivers/cxl/core/ras.c
>> index 485a831695c7..d35525e79e04 100644
>> --- a/drivers/cxl/core/ras.c
>> +++ b/drivers/cxl/core/ras.c
>> @@ -5,6 +5,7 @@
>>
>> void cxl_ras_exit(void)
>> {
>> cxl_cper_unregister_prot_err_work(&cxl_cper_prot_err_work);
>> cancel_work_sync(&cxl_cper_prot_err_work);
>> +
>> + cxl_unregister_prot_err_work();
>> + cancel_work_sync(&cxl_prot_err_work);
>> }
>> diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c
>> index adb4b1123b9b..5350fa5be784 100644
>> --- a/drivers/pci/pcie/aer.c
>> +++ b/drivers/pci/pcie/aer.c
>> +static int cxl_create_prot_error_info(struct pci_dev *pdev,
>> + struct aer_err_info *aer_err_info,
>> + struct cxl_prot_error_info *cxl_err_info)
>> +{
>> + cxl_err_info->severity = aer_err_info->severity;
>> +
>> + cxl_err_info->function = PCI_FUNC(pdev->devfn);
>> + cxl_err_info->device = PCI_SLOT(pdev->devfn);
>> + cxl_err_info->bus = pdev->bus->number;
>> + cxl_err_info->segment = pci_domain_nr(pdev->bus);
> Maybe
> *cxl_err_info = (struct cxl_prot_error_info) {
> .severity = aer_err_info->severity,
> .function = PCI_FUNC(pdev->devfn),
> .device = PCI_SLOT(pdev->devfn),
> .bus = pdev->bus_number,
> .segment = pci_domain_nr(dev->nbus),
> };
>
Ok. And I'll take your other recommendation from patch [4/16] to not split devfn.
> Or if it isn't going to get more use later, just put that assignment in
> forward_cxl_error as this helper doesn't seem to add a huge amount of
> value wrt to code reability.
>
Good idea. This function started out doing more work but has been simplified that
it no longer needs be in a helper function.
>> +
>> + return 0;
>> +}
>> +
>> +static void forward_cxl_error(struct pci_dev *pdev, struct aer_err_info *aer_err_info)
>> +{
>> + struct cxl_prot_err_work_data wd;
>> + struct cxl_prot_error_info *cxl_err_info = &wd.err_info;
>> +
>> + cxl_create_prot_error_info(pdev, aer_err_info, cxl_err_info);
> cxl_create_prot_error_info(pdev, aer_err_info, &wd.err_info);
>
> Ignore if this gets more complicated in a later patch but for now I don't
> see the local variable adding any value. If anything it slightly obscures
> how this gets used via wd in the next line. However given the code is short
> that is fairly obvious either way.
>
> Or as above
> wd.error_info = (struct cxl_prot_error_info) {
> .severity = ...
> };
Ok. I'll let you know if it doesn't work. ;)
>> +
>> + if (!kfifo_put(&cxl_prot_err_fifo, wd)) {
>> + dev_err_ratelimited(&pdev->dev, "CXL kfifo overflow\n");
>> + return;
>> + }
>> +
>> + schedule_work(cxl_prot_err_work);
>> +}
>> +
>> #else
>> static inline void cxl_rch_enable_rcec(struct pci_dev *dev) { }
>> static inline void cxl_rch_handle_error(struct pci_dev *dev,
>> struct aer_err_info *info) { }
>> +static inline void forward_cxl_error(struct pci_dev *dev,
>> + struct aer_err_info *info) { }
> Is this aligned right - looks one space short?
>
>> +static inline bool handles_cxl_errors(struct pci_dev *dev)
>> +{
>> + return false;
>> +}
>> +static bool is_cxl_error(struct pci_dev *pdev, struct aer_err_info *info) { return 0; };
> wrap choices for the two stubs are a bit odd. The first one is actually shorter
> if not wrapped than the second one is that you chose not to wrap.
> My slightly preference would be to wrap them both as the fist one.
>
Thanks for pointing out. Fortunately, this (and above alignment note) is moved into a new file
pci/pcie/cxl_aer.c in the next revision and no longer requires this source file #ifdef.
-Terry
>> #endif
>
Powered by blists - more mailing lists