lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <110a5a5c-6842-4d1d-ae7e-22ffd0111641@amd.com>
Date: Thu, 12 Jun 2025 09:29:19 -0500
From: "Bowman, Terry" <terry.bowman@....com>
To: Jonathan Cameron <Jonathan.Cameron@...wei.com>
Cc: PradeepVineshReddy.Kodamati@....com, dave@...olabs.net,
 dave.jiang@...el.com, alison.schofield@...el.com, vishal.l.verma@...el.com,
 ira.weiny@...el.com, dan.j.williams@...el.com, bhelgaas@...gle.com,
 bp@...en8.de, ming.li@...omail.com, shiju.jose@...wei.com,
 dan.carpenter@...aro.org, Smita.KoralahalliChannabasappa@....com,
 kobayashi.da-06@...itsu.com, rrichter@....com, peterz@...radead.org,
 fabio.m.de.francesco@...ux.intel.com, ilpo.jarvinen@...ux.intel.com,
 yazen.ghannam@....com, linux-cxl@...r.kernel.org,
 linux-kernel@...r.kernel.org, linux-pci@...r.kernel.org
Subject: Re: [PATCH v9 03/16] CXL/AER: Introduce kfifo for forwarding CXL
 errors



On 6/12/2025 6:04 AM, Jonathan Cameron wrote:
> On Tue, 3 Jun 2025 12:22:26 -0500
> Terry Bowman <terry.bowman@....com> wrote:
>
>> CXL error handling will soon be moved from the AER driver into the CXL
>> driver. This requires a notification mechanism for the AER driver to share
>> the AER interrupt with the CXL driver. The notification will be used
>> as an indication for the CXL drivers to handle and log the CXL RAS errors.
>>
>> Add a kfifo work queue to be used by the AER driver and CXL driver. The AER
>> driver will be the sole kfifo producer adding work and the cxl_core will be
>> the sole kfifo consumer removing work. Add the boilerplate kfifo support.
>>
>> Add CXL work queue handler registration functions in the AER driver. Export
>> the functions allowing CXL driver to access. Implement registration
>> functions for the CXL driver to assign or clear the work handler function.
>>
>> Introduce function cxl_create_prot_err_info() and 'struct cxl_prot_err_info'.
>> Implement cxl_create_prot_err_info() to populate a 'struct cxl_prot_err_info'
>> instance with the AER severity and the erring device's PCI SBDF. The SBDF
>> details will be used to rediscover the erring device after the CXL driver
>> dequeues the kfifo work. The device rediscovery will be introduced along
>> with the CXL handling in future patches.
>>
>> Signed-off-by: Terry Bowman <terry.bowman@....com>
> Hi Terry,
>
> A few trivial bits of feedback.  I didn't find anything substantial that
> hasn't already been covered by others.
>
> Jonathan
>
>> ---
>>  drivers/cxl/core/ras.c |  31 +++++++++-
>>  drivers/cxl/cxlpci.h   |   1 +
>>  drivers/pci/pcie/aer.c | 132 ++++++++++++++++++++++++++++-------------
>>  include/linux/aer.h    |  36 +++++++++++
>>  4 files changed, 157 insertions(+), 43 deletions(-)
>>
>> diff --git a/drivers/cxl/core/ras.c b/drivers/cxl/core/ras.c
>> index 485a831695c7..d35525e79e04 100644
>> --- a/drivers/cxl/core/ras.c
>> +++ b/drivers/cxl/core/ras.c
>> @@ -5,6 +5,7 @@
>>  
>>  void cxl_ras_exit(void)
>>  {
>>  	cxl_cper_unregister_prot_err_work(&cxl_cper_prot_err_work);
>>  	cancel_work_sync(&cxl_cper_prot_err_work);
>> +
>> +	cxl_unregister_prot_err_work();
>> +	cancel_work_sync(&cxl_prot_err_work);
>>  }
>> diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c
>> index adb4b1123b9b..5350fa5be784 100644
>> --- a/drivers/pci/pcie/aer.c
>> +++ b/drivers/pci/pcie/aer.c
>> +static int cxl_create_prot_error_info(struct pci_dev *pdev,
>> +				      struct aer_err_info *aer_err_info,
>> +				      struct cxl_prot_error_info *cxl_err_info)
>> +{
>> +	cxl_err_info->severity = aer_err_info->severity;
>> +
>> +	cxl_err_info->function = PCI_FUNC(pdev->devfn);
>> +	cxl_err_info->device = PCI_SLOT(pdev->devfn);
>> +	cxl_err_info->bus = pdev->bus->number;
>> +	cxl_err_info->segment = pci_domain_nr(pdev->bus);
> Maybe 
> 	*cxl_err_info = (struct cxl_prot_error_info) {
> 		.severity = aer_err_info->severity,
> 		.function = PCI_FUNC(pdev->devfn),
> 		.device = PCI_SLOT(pdev->devfn),
> 		.bus = pdev->bus_number,
> 		.segment = pci_domain_nr(dev->nbus),
> 	};
>
Ok. And I'll take your other recommendation from patch [4/16] to not split devfn.
> Or if it isn't going to get more use later, just put that assignment in
> forward_cxl_error as this helper doesn't seem to add a huge amount of
> value wrt to code reability.
>
Good idea. This function started out doing more work but has been simplified that
it no longer needs be in a helper function.

>> +
>> +	return 0;
>> +}
>> +
>> +static void forward_cxl_error(struct pci_dev *pdev, struct aer_err_info *aer_err_info)
>> +{
>> +	struct cxl_prot_err_work_data wd;
>> +	struct cxl_prot_error_info *cxl_err_info = &wd.err_info;
>> +
>> +	cxl_create_prot_error_info(pdev, aer_err_info, cxl_err_info);
> 	cxl_create_prot_error_info(pdev, aer_err_info, &wd.err_info);
>
> Ignore if this gets more complicated in a later patch but for now I don't
> see the local variable adding any value. If anything it slightly obscures
> how this gets used via wd in the next line.  However given the code is short
> that is fairly obvious either way.
>
> Or as above
> 	wd.error_info = (struct cxl_prot_error_info) {
> 		.severity = ...
> 	};
Ok. I'll let you know if it doesn't work. ;)
>> +
>> +	if (!kfifo_put(&cxl_prot_err_fifo, wd)) {
>> +		dev_err_ratelimited(&pdev->dev, "CXL kfifo overflow\n");
>> +		return;
>> +	}
>> +
>> +	schedule_work(cxl_prot_err_work);
>> +}
>> +
>>  #else
>>  static inline void cxl_rch_enable_rcec(struct pci_dev *dev) { }
>>  static inline void cxl_rch_handle_error(struct pci_dev *dev,
>>  					struct aer_err_info *info) { }
>> +static inline void forward_cxl_error(struct pci_dev *dev,
>> +				    struct aer_err_info *info) { }
> Is this aligned right - looks one space short?
>
>> +static inline bool handles_cxl_errors(struct pci_dev *dev)
>> +{
>> +	return false;
>> +}
>> +static bool is_cxl_error(struct pci_dev *pdev, struct aer_err_info *info) { return 0; };
> wrap choices for the two stubs are a bit odd.   The first one is actually shorter
> if not wrapped than the second one is that you chose not to wrap.
> My slightly preference would be to wrap them both as the fist one.
>
Thanks for pointing out. Fortunately, this (and above alignment note) is moved into a new file
pci/pcie/cxl_aer.c in the next revision and no longer requires this source file #ifdef.

-Terry

>>  #endif
>


Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ