Message-ID: <6af9d0a2-97fe-4296-9ceb-bf9e7fc91d5a@amd.com>
Date: Thu, 24 Oct 2024 10:20:03 -0500
From: "Bowman, Terry" <terry.bowman@....com>
To: Dan Williams <dan.j.williams@...el.com>, ming4.li@...el.com,
 linux-cxl@...r.kernel.org, linux-kernel@...r.kernel.org,
 linux-pci@...r.kernel.org, dave@...olabs.net, jonathan.cameron@...wei.com,
 dave.jiang@...el.com, alison.schofield@...el.com, vishal.l.verma@...el.com,
 bhelgaas@...gle.com, mahesh@...ux.ibm.com, oohall@...il.com,
 Benjamin.Cheatham@....com, rrichter@....com, nathan.fontenot@....com,
 smita.koralahallichannabasappa@....com
Subject: Re: [PATCH 01/15] cxl/aer/pci: Add CXL PCIe port error handler
 callbacks in AER service driver

Hi Dan,

I added a question below.

On 10/22/2024 6:43 PM, Dan Williams wrote:
> Terry Bowman wrote:
> [..]
>> I was referring to reusing a separate instance of 'struct pci_error_handlers' for CXL
>> UCE-CE errors.
>>
>> One example where it can be reused in infrastructure is in err.c's
>> report_error_detected(). If both PCIe and CXL errors use 'struct pci_error_handlers'
>> then the updated report_error_detected() becomes a bit simpler with less helper
>> function logic.
> report_error_detected() is concerned with link and i/o state
> (pci_dev_is_disconnected() and pci_dev_set_io_state()). For device
> disconnects, CXL recovery potentially needs to span multiple devices.
> For i/o state, CXL.io could be fully operational while CXL.cache and
> CXL.mem are in fatal state.
>
> CXL considerations do not feel welcome in that function.
>
> Ideally a PCIe developer never needs to see or understand the CXL error
> model because it is off in its own path. In other words, if someone
> maintaining pcie_do_recovery=>report_error_detected() for the PCIe case
> needs to go find a CXL expert each time they want to touch that path,
> that feels like a regression in PCIe error handling maintainability.
>
>> But, it's not a reason by itself to choose to reuse 'struct
>> pci_error_handlers' for CXL errors.
>>
>> Looking closer at aer.c shows there is no advantage in this file for using 'struct
>> pci_error_handlers' for CXL errors.
>>
>> If I understand correctly you want a new type introduced, 'struct cxl_error_handlers'.
> Yes, mainly because the bus state and the result of the recovery tend to
> be a different operational model. If a CXL error fits the PCIe model
> then it can be sent via pcie_do_recovery(), but I expect that only
> applies to a handful of correctable errors like CRC_Threshold,
> Retry_Threshold, or Physical_Layer_Error. Almost everything else *seems*
> like it has a CXL specific response that would confuse
> pcie_do_recovery(). 
>
> So, in general new operational models == new data structures and types.

Would you like to keep using pci_error_handlers for the CXL PCIe
endpoint device driver, or should we change the CXL PCIe endpoint driver
to use cxl_error_handlers?
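
To make the second option concrete, here is a minimal user-space sketch,
assuming a hypothetical cxl_err_handler pointer that sits next to the
existing err_handler in the driver. Every sketch_* name is made up for
illustration; none of it is taken from the posted patches or from the
kernel headers. It only shows the routing idea: errors that fit the PCIe
model go to one table, errors that need a CXL-specific response go to the
other.

```c
#include <assert.h>

/* All names below are hypothetical user-space stand-ins, not kernel APIs. */

struct sketch_dev {
	const char *name;
	int pcie_calls;	/* times the PCIe-model callback ran */
	int cxl_calls;	/* times the CXL-model callback ran */
};

/* Stand-in for the PCIe-model callbacks (cf. struct pci_error_handlers). */
struct sketch_pci_error_handlers {
	void (*error_detected)(struct sketch_dev *dev);
};

/* Hypothetical separate type for the CXL operational model. */
struct sketch_cxl_error_handlers {
	void (*error_detected)(struct sketch_dev *dev);
};

struct sketch_driver {
	const struct sketch_pci_error_handlers *err_handler;
	const struct sketch_cxl_error_handlers *cxl_err_handler;
};

enum sketch_err_class {
	SKETCH_ERR_PCIE_MODEL,	/* fits the PCIe recovery model */
	SKETCH_ERR_CXL_MODEL,	/* needs a CXL-specific response */
};

enum sketch_path { SKETCH_PATH_NONE, SKETCH_PATH_PCIE, SKETCH_PATH_CXL };

static void sketch_pcie_detected(struct sketch_dev *dev) { dev->pcie_calls++; }
static void sketch_cxl_detected(struct sketch_dev *dev) { dev->cxl_calls++; }

static const struct sketch_pci_error_handlers sketch_pci_handlers = {
	.error_detected = sketch_pcie_detected,
};

static const struct sketch_cxl_error_handlers sketch_cxl_handlers = {
	.error_detected = sketch_cxl_detected,
};

/* An endpoint driver that registers both tables. */
static const struct sketch_driver sketch_endpoint_driver = {
	.err_handler = &sketch_pci_handlers,
	.cxl_err_handler = &sketch_cxl_handlers,
};

/* Route each error class to its own table; the two paths never mix. */
static enum sketch_path sketch_dispatch(const struct sketch_driver *drv,
					struct sketch_dev *dev,
					enum sketch_err_class cls)
{
	assert(drv && dev);

	if (cls == SKETCH_ERR_CXL_MODEL) {
		if (!drv->cxl_err_handler || !drv->cxl_err_handler->error_detected)
			return SKETCH_PATH_NONE;
		drv->cxl_err_handler->error_detected(dev);
		return SKETCH_PATH_CXL;
	}

	if (!drv->err_handler || !drv->err_handler->error_detected)
		return SKETCH_PATH_NONE;
	drv->err_handler->error_detected(dev);
	return SKETCH_PATH_PCIE;
}
```

In this sketch a driver that only fills in err_handler is never called
for a CXL-model error; sketch_dispatch() returns SKETCH_PATH_NONE
instead of falling back to the PCIe callbacks.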
Regards,
Terry

