Message-ID: <0cceca3d-f69e-4277-bc9f-2556fd212ebb@amd.com>
Date: Tue, 22 Oct 2024 08:50:19 -0500
From: Terry Bowman <Terry.Bowman@....com>
To: Dan Williams <dan.j.williams@...el.com>, ming4.li@...el.com,
 linux-cxl@...r.kernel.org, linux-kernel@...r.kernel.org,
 linux-pci@...r.kernel.org, dave@...olabs.net, jonathan.cameron@...wei.com,
 dave.jiang@...el.com, alison.schofield@...el.com, vishal.l.verma@...el.com,
 bhelgaas@...gle.com, mahesh@...ux.ibm.com, oohall@...il.com,
 Benjamin.Cheatham@....com, rrichter@....com, nathan.fontenot@....com,
 smita.koralahallichannabasappa@....com
Subject: Re: [PATCH 01/15] cxl/aer/pci: Add CXL PCIe port error handler
 callbacks in AER service driver

Hi Dan,

On 10/21/24 20:53, Dan Williams wrote:
> Terry Bowman wrote:
>> CXL protocol errors are reported to the OS through PCIe correctable and
>> uncorrectable internal errors. However, since CXL PCIe port devices
>> are currently bound to the portdrv driver, there is no mechanism to
>> notify the CXL driver, which is necessary for proper logging and
>> handling.
>>
>> To address this, introduce CXL PCIe port error callbacks along with
>> register/unregister and accessor functions. The callbacks will be
>> invoked by the AER driver in the case protocol errors are reported by
>> a CXL port device.
>>
>> The AER driver callbacks will be used in future patches implementing
>> CXL PCIe port error handling.
>>
>> Signed-off-by: Terry Bowman <terry.bowman@....com>
>> ---
>>  drivers/pci/pcie/aer.c | 22 ++++++++++++++++++++++
>>  include/linux/aer.h    | 14 ++++++++++++++
>>  2 files changed, 36 insertions(+)
>>
>> diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c
>> index 13b8586924ea..a9792b9576b4 100644
>> --- a/drivers/pci/pcie/aer.c
>> +++ b/drivers/pci/pcie/aer.c
>> @@ -50,6 +50,8 @@ struct aer_rpc {
>>  	DECLARE_KFIFO(aer_fifo, struct aer_err_source, AER_ERROR_SOURCES_MAX);
>>  };
>>  
>> +static struct cxl_port_err_hndlrs cxl_port_hndlrs;
> 
> I think this can afford to splurge on a few more letters and make this
> 
> static struct cxl_port_error_handlers cxl_port_error_handlers;
> 
> 

Ok.

>> +
>>  /* AER stats for the device */
>>  struct aer_stats {
>>  
>> @@ -1078,6 +1080,26 @@ static inline void cxl_rch_handle_error(struct pci_dev *dev,
>>  					struct aer_err_info *info) { }
>>  #endif
>>  
>> +void register_cxl_port_hndlrs(struct cxl_port_err_hndlrs *_cxl_port_hndlrs)
>> +{
>> +	cxl_port_hndlrs.error_detected = _cxl_port_hndlrs->error_detected;
>> +	cxl_port_hndlrs.cor_error_detected = _cxl_port_hndlrs->cor_error_detected;
>> +}
>> +EXPORT_SYMBOL_NS_GPL(register_cxl_port_hndlrs, CXL);
>> +
>> +void unregister_cxl_port_hndlrs(void)
>> +{
>> +	cxl_port_hndlrs.error_detected = NULL;
>> +	cxl_port_hndlrs.cor_error_detected = NULL;
>> +}
>> +EXPORT_SYMBOL_NS_GPL(unregister_cxl_port_hndlrs, CXL);
>> +
>> +struct cxl_port_err_hndlrs *find_cxl_port_hndlrs(void)
>> +{
>> +	return &cxl_port_hndlrs;
>> +}
>> +EXPORT_SYMBOL_NS_GPL(find_cxl_port_hndlrs, CXL);
> 
> I guess I will need to go deeper into the code, but I would not have
> expected that new registration interfaces are needed. Each 'struct
> pci_driver' could optionally include CXL error handlers alongside their
> PCIe error handlers and when CXL AER errors are broadcast only the CXL
> handlers are invoked. I.e. the registration is something like:
> 
> diff --git a/drivers/pci/pcie/portdrv.c b/drivers/pci/pcie/portdrv.c
> index 6af5e0425872..42db26195bda 100644
> --- a/drivers/pci/pcie/portdrv.c
> +++ b/drivers/pci/pcie/portdrv.c
> @@ -793,6 +793,7 @@ static struct pci_driver pcie_portdriver = {
>         .shutdown       = pcie_portdrv_shutdown,
>  
>         .err_handler    = &pcie_portdrv_err_handler,
> +       .cxl_err_handler = &cxl_portdrv_err_handler,
>  
>         .driver_managed_dma = true,

Ok. I'm thinking of adding a 'pci_driver::cxl_err_handler' member of type
'struct pci_error_handlers'.

'struct pci_error_handlers' contains slot_reset(), resume(), and mmio_enabled()
function pointers that PCIe recovery uses when they are present. The plan is for
CXL devices to panic on both fatal and non-fatal UCEs, but using the
'struct pci_error_handlers' type leaves room for the other handlers if they are
needed in the future. It also makes the logic to look up and call the error
handlers common, which requires less code.
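
To illustrate the "common logic" point, here is a rough userspace sketch, not
kernel code. The structs are trimmed mocks that only borrow the kernel names,
the callback signatures are simplified, and 'enum mock_ers_result',
'is the error CXL' flag, select_err_handlers(), report_uncorrectable() and the
mock_* objects are all hypothetical names made up for this example. The only
idea taken from this thread is a 'cxl_err_handler' member sitting next to
'err_handler' and sharing its type.

```c
#include <assert.h>
#include <stddef.h>

struct pci_dev;

/* Mock result codes; MOCK_ERS_PANIC stands in for the planned panic on UCE. */
enum mock_ers_result {
	MOCK_ERS_NONE,
	MOCK_ERS_RECOVERED,
	MOCK_ERS_PANIC,
};

/* Trimmed mock of the handler table; signatures are simplified. */
struct pci_error_handlers {
	enum mock_ers_result (*error_detected)(struct pci_dev *dev);
	enum mock_ers_result (*slot_reset)(struct pci_dev *dev);
	void (*resume)(struct pci_dev *dev);
	void (*cor_error_detected)(struct pci_dev *dev);
};

/* Trimmed mock of the driver; cxl_err_handler is the proposed member. */
struct pci_driver {
	const char *name;
	const struct pci_error_handlers *err_handler;
	const struct pci_error_handlers *cxl_err_handler;
};

struct pci_dev {
	const struct pci_driver *driver;
};

/* One lookup for both handler sets, possible because they share a type. */
static const struct pci_error_handlers *
select_err_handlers(const struct pci_dev *dev, int cxl_error)
{
	if (!dev->driver)
		return NULL;
	return cxl_error ? dev->driver->cxl_err_handler
			 : dev->driver->err_handler;
}

/* One dispatch path: a missing table or callback yields MOCK_ERS_NONE. */
static enum mock_ers_result
report_uncorrectable(struct pci_dev *dev, int cxl_error)
{
	const struct pci_error_handlers *h;

	assert(dev != NULL);
	h = select_err_handlers(dev, cxl_error);
	if (!h || !h->error_detected)
		return MOCK_ERS_NONE;
	return h->error_detected(dev);
}

static enum mock_ers_result mock_pcie_error_detected(struct pci_dev *dev)
{
	(void)dev;
	return MOCK_ERS_RECOVERED;
}

static enum mock_ers_result mock_cxl_error_detected(struct pci_dev *dev)
{
	(void)dev;
	return MOCK_ERS_PANIC; /* real code would panic here instead */
}

static const struct pci_error_handlers mock_pcie_handlers = {
	.error_detected = mock_pcie_error_detected,
};

static const struct pci_error_handlers mock_cxl_handlers = {
	.error_detected = mock_cxl_error_detected,
};

static const struct pci_driver mock_portdriver = {
	.name = "mock_portdrv",
	.err_handler = &mock_pcie_handlers,
	.cxl_err_handler = &mock_cxl_handlers,
};
```

With a device bound to mock_portdriver, report_uncorrectable(&dev, 0) goes
through the PCIe table and report_uncorrectable(&dev, 1) goes through the CXL
table. A driver that leaves 'cxl_err_handler' unset simply gets no CXL
callback, so no separate register/unregister interface is needed in this
sketch.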

Regards,
Terry
