Message-ID: <645adbb6-096f-4af3-9609-ddc5a6f5239a@linux.alibaba.com>
Date: Mon, 20 Oct 2025 22:45:31 +0800
From: Shuai Xue <xueshuai@...ux.alibaba.com>
To: Lukas Wunner <lukas@...ner.de>
Cc: linux-pci@...r.kernel.org, linux-kernel@...r.kernel.org,
 linuxppc-dev@...ts.ozlabs.org, bhelgaas@...gle.com, kbusch@...nel.org,
 sathyanarayanan.kuppuswamy@...ux.intel.com, mahesh@...ux.ibm.com,
 oohall@...il.com, Jonathan.Cameron@...wei.com, terry.bowman@....com,
 tianruidong@...ux.alibaba.com
Subject: Re: [PATCH v6 4/5] PCI/ERR: Use pcie_aer_is_native() to check for
 native AER control



On 2025/10/20 21:58, Lukas Wunner wrote:
> On Mon, Oct 20, 2025 at 09:09:41PM +0800, Shuai Xue wrote:
>> On 2025/10/20 18:17, Lukas Wunner wrote:
>>> On Wed, Oct 15, 2025 at 10:41:58AM +0800, Shuai Xue wrote:
>>>> Replace the manual checks for native AER control with the
>>>> pcie_aer_is_native() helper, which provides a more robust way
>>>> to determine if we have native control of AER.
>>>
>>> Why is it more robust?
>>
>> IMHO, the pcie_aer_is_native() helper is more robust because it includes
>> additional safety checks that the manual approach lacks:
> [...]
>> Specifically, it performs a sanity check for dev->aer_cap before
>> evaluating native AER control.
> 
> I'm under the impression that aer_cap must be set, otherwise the
> error wouldn't have been reported and we wouldn't be in this code path?
> 
> If we can end up in this code path without aer_cap set, your patch
> would regress devices which are not AER-capable because it would
> now skip clearing of errors in the Device Status register via
> pcie_clear_device_status().

Hi Lukas,

You raise an excellent point about the potential regression.

The original code is:

	if (host->native_aer || pcie_ports_native) {
		pcie_clear_device_status(bridge);
		pci_aer_clear_nonfatal_status(bridge);
	}

This code clears both the PCIe Device Status register and AER status
registers when in native AER mode.

pcie_clear_device_status() was renamed from
pci_aer_clear_device_status(). Is it intended to clear only the AER
error status bits?

- BIT 0: Correctable Error Detected
- BIT 1: Non-Fatal Error Detected
- BIT 2: Fatal Error Detected
- BIT 3: Unsupported Request Detected

According to the PCIe spec, BIT 0-2 are logged for functions supporting
Advanced Error Handling.

I am not sure whether we should also clear BIT 3, or BIT 6 (Emergency
Power Reduction Detected), in the case of an AER error.
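
To make the question concrete, here is a minimal standalone sketch (not
kernel code) of the two options, assuming the Device Status bits in
question are write-1-to-clear. The DEVSTA_* macros and both helper
functions are hypothetical names made up for illustration; only the bit
positions come from the list above.

```c
#include <assert.h>
#include <stdint.h>

/* Device Status bits discussed above (hypothetical macro names). */
#define DEVSTA_CED   0x0001u /* BIT 0: Correctable Error Detected */
#define DEVSTA_NFED  0x0002u /* BIT 1: Non-Fatal Error Detected */
#define DEVSTA_FED   0x0004u /* BIT 2: Fatal Error Detected */
#define DEVSTA_URD   0x0008u /* BIT 3: Unsupported Request Detected */
#define DEVSTA_EPRD  0x0040u /* BIT 6: Emergency Power Reduction Detected */

/* Option A: only the three error-detected bits. */
#define DEVSTA_ERR_BITS  (DEVSTA_CED | DEVSTA_NFED | DEVSTA_FED)
/* Option B: every bit assumed to be write-1-to-clear. */
#define DEVSTA_RW1C_BITS (DEVSTA_ERR_BITS | DEVSTA_URD | DEVSTA_EPRD)

/* Value to write back so that only the bits selected by mask are cleared. */
static uint16_t devsta_clear_value(uint16_t status, uint16_t mask)
{
	return (uint16_t)(status & mask & DEVSTA_RW1C_BITS);
}

/* Model of the register after a write-1-to-clear write. */
static uint16_t devsta_apply_rw1c(uint16_t status, uint16_t written)
{
	/* The caller must not write bits outside the assumed RW1C set. */
	assert((written & ~DEVSTA_RW1C_BITS) == 0);
	return (uint16_t)(status & ~written);
}
```

With option A, a status of 0x004b (BIT 0, 1, 3 and 6 set) would be left
at 0x0048, i.e. BIT 3 and BIT 6 stay set. With option B, all four bits
are cleared.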

> 
> Thanks,
> 
> Lukas

Thanks.
Shuai
