Message-ID: <c7c9d417-5c32-4354-825e-58f736726114@amd.com>
Date: Thu, 21 Nov 2024 14:24:17 -0600
From: "Bowman, Terry" <terry.bowman@....com>
To: Lukas Wunner <lukas@...ner.de>
Cc: linux-cxl@...r.kernel.org, linux-kernel@...r.kernel.org,
 linux-pci@...r.kernel.org, nifan.cxl@...il.com, ming4.li@...el.com,
 dave@...olabs.net, jonathan.cameron@...wei.com, dave.jiang@...el.com,
 alison.schofield@...el.com, vishal.l.verma@...el.com,
 dan.j.williams@...el.com, bhelgaas@...gle.com, mahesh@...ux.ibm.com,
 ira.weiny@...el.com, oohall@...il.com, Benjamin.Cheatham@....com,
 rrichter@....com, nathan.fontenot@....com,
 Smita.KoralahalliChannabasappa@....com,
 Shuai Xue <xueshuai@...ux.alibaba.com>, Keith Busch <kbusch@...nel.org>
Subject: Re: [PATCH v3 06/15] PCI/AER: Change AER driver to read UCE fatal
 status for all CXL PCIe port devices



On 11/15/2024 3:35 AM, Lukas Wunner wrote:
> On Wed, Nov 13, 2024 at 03:54:20PM -0600, Terry Bowman wrote:
>> The AER service driver's aer_get_device_error_info() function doesn't read
>> uncorrectable (UCE) fatal error status from PCIe upstream port devices,
>> including CXL upstream switch ports. As a result, fatal errors are not
>> logged or handled as needed for CXL PCIe upstream switch port devices.
>>
>> Update the aer_get_device_error_info() function to read the UCE fatal
>> status for all CXL PCIe port devices. Make the change to not affect
>> non-CXL PCIe devices.
>>
>> The fatal error status will be used in future patches implementing
>> CXL PCIe port uncorrectable error handling and logging.
> [...]
>> --- a/drivers/pci/pcie/aer.c
>> +++ b/drivers/pci/pcie/aer.c
>> @@ -1250,7 +1250,8 @@ int aer_get_device_error_info(struct pci_dev *dev, struct aer_err_info *info)
>>  	} else if (type == PCI_EXP_TYPE_ROOT_PORT ||
>>  		   type == PCI_EXP_TYPE_RC_EC ||
>>  		   type == PCI_EXP_TYPE_DOWNSTREAM ||
>> -		   info->severity == AER_NONFATAL) {
>> +		   info->severity == AER_NONFATAL ||
>> +		   (pcie_is_cxl(dev) && type == PCI_EXP_TYPE_UPSTREAM)) {
>>  
>>  		/* Link is still healthy for IO reads */
>>  		pci_read_config_dword(dev, aer + PCI_ERR_UNCOR_STATUS,
> Just a heads-up, there's another patch pending by Shuai Xue (+cc)
> which touches the same code lines.  It re-enables error reporting
> for PCIe Upstream Ports (as well as Endpoints) under certain
> conditions:
>
> https://lore.kernel.org/all/20241112135419.59491-3-xueshuai@linux.alibaba.com/
>
> That was originally disabled by Keith Busch (+cc) with commit
> 9d938ea53b26 ("PCI/AER: Don't read upstream ports below fatal errors").
>
> There's some merge conflict potential here if your series goes into
> the cxl tree and Shuai's patch into the pci tree in the next cycle.
>
> Thanks,
>
> Lukas
Thanks, Lukas. I took a look at the patchset and reached out to Shuai
(you're CC'd). Sorry, I thought I had responded here earlier.

Regards,
Terry
