lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <447e3e84-af3b-415b-8f01-f3c60fe31ce3@amd.com>
Date: Wed, 2 Jul 2025 16:06:53 -0500
From: "Bowman, Terry" <terry.bowman@....com>
To: Jonathan Cameron <Jonathan.Cameron@...wei.com>
Cc: dave@...olabs.net, dave.jiang@...el.com, alison.schofield@...el.com,
 dan.j.williams@...el.com, bhelgaas@...gle.com, shiju.jose@...wei.com,
 ming.li@...omail.com, Smita.KoralahalliChannabasappa@....com,
 rrichter@....com, dan.carpenter@...aro.org,
 PradeepVineshReddy.Kodamati@....com, lukas@...ner.de,
 Benjamin.Cheatham@....com, sathyanarayanan.kuppuswamy@...ux.intel.com,
 linux-cxl@...r.kernel.org, linux-kernel@...r.kernel.org,
 linux-pci@...r.kernel.org
Subject: Re: [PATCH v10 07/17] CXL/PCI: Introduce CXL uncorrectable protocol
 error recovery



On 6/27/2025 6:05 AM, Jonathan Cameron wrote:
> On Thu, 26 Jun 2025 17:42:42 -0500
> Terry Bowman <terry.bowman@....com> wrote:
>
>> Create cxl_do_recovery() to provide uncorrectable protocol error (UCE)
>> handling. Follow similar design as found in PCIe error driver,
>> pcie_do_recovery(). One difference is cxl_do_recovery() will treat all UCEs
>> as fatal with a kernel panic. This is to prevent corruption on CXL memory.
>>
>> Export the PCI error driver's merge_result() to CXL namespace.
> I think this may be a confusion from earlier review.  Anyhow, it should
> be namespaced in the sense of not exporting something the vague name of
> merge_result but it's PCI code, not CXL code and we don't have the dangerous
> interface argument to justify putting it in the CXL namespace so I think
> a namespaced EXPORT makes little sense for this one.
>
> Jonathan
>
>
>> Introduce
>> PCI_ERS_RESULT_PANIC and add support in merge_result() routine. This will
>> be used by CXL to panic the system in the case of uncorrectable protocol
>> errors. PCI error handling is not currently expected to use the
>> PCI_ERS_RESULT_PANIC.
>>
>> Copy pci_walk_bridge() to cxl_walk_bridge(). Make a change to walk the
>> first device in all cases.
>>
>> Copy the PCI error driver's report_error_detected() to cxl_report_error_detected().
>> Note, only CXL Endpoints and RCH Downstream Ports(RCH DSP) are currently
>> supported. Add locking for PCI device as done in PCI's report_error_detected().
>> This is necessary to prevent the RAS registers from disappearing before
>> logging is completed.
>>
>> Call panic() to halt the system in the case of uncorrectable errors (UCE)
>> in cxl_do_recovery(). Export pci_aer_clear_fatal_status() for CXL to use
>> if a UCE is not found. In this case the AER status must be cleared and
>> uses pci_aer_clear_fatal_status().
>>
>> Signed-off-by: Terry Bowman <terry.bowman@....com>
>
>> diff --git a/drivers/pci/pcie/err.c b/drivers/pci/pcie/err.c
>> index de6381c690f5..63fceb3e8613 100644
>> --- a/drivers/pci/pcie/err.c
>> +++ b/drivers/pci/pcie/err.c
>> @@ -21,9 +21,12 @@
>>  #include "portdrv.h"
>>  #include "../pci.h"
>>  
>> -static pci_ers_result_t merge_result(enum pci_ers_result orig,
>> -				  enum pci_ers_result new)
>> +pci_ers_result_t merge_result(enum pci_ers_result orig,
>> +			      enum pci_ers_result new)
>>  {
>> +	if (new == PCI_ERS_RESULT_PANIC)
>> +		return PCI_ERS_RESULT_PANIC;
>> +
>>  	if (new == PCI_ERS_RESULT_NO_AER_DRIVER)
>>  		return PCI_ERS_RESULT_NO_AER_DRIVER;
>>  
>> @@ -45,6 +48,7 @@ static pci_ers_result_t merge_result(enum pci_ers_result orig,
>>  
>>  	return orig;
>>  }
>> +EXPORT_SYMBOL_NS_GPL(merge_result, "CXL"); 
> Do we care about namespacing this?  I think not given it is PCIe code
> and hardly destructive for other drivers to mess with it if they like.
>
> I would namespace it in the sense of renaming it to make it clear
> it's about pci errors though.
>
> pci_ers_merge_result() perhaps?
>
> Do that as a percursor patch.
>

Good idea. There is a lot of changes related to just exporting this and changing
the name. I've changed the namespace export to be:

EXPORT_SYMBOL(pci_ers_merge_result);

I moved this and its related required changes into an earlier patch.

-Terry

>>  
>>  static int report_error_detected(struct pci_dev *dev,
>>  				 pci_channel_state_t state,


Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ