Message-ID: <645adbb6-096f-4af3-9609-ddc5a6f5239a@linux.alibaba.com>
Date: Mon, 20 Oct 2025 22:45:31 +0800
From: Shuai Xue <xueshuai@...ux.alibaba.com>
To: Lukas Wunner <lukas@...ner.de>
Cc: linux-pci@...r.kernel.org, linux-kernel@...r.kernel.org,
linuxppc-dev@...ts.ozlabs.org, bhelgaas@...gle.com, kbusch@...nel.org,
sathyanarayanan.kuppuswamy@...ux.intel.com, mahesh@...ux.ibm.com,
oohall@...il.com, Jonathan.Cameron@...wei.com, terry.bowman@....com,
tianruidong@...ux.alibaba.com
Subject: Re: [PATCH v6 4/5] PCI/ERR: Use pcie_aer_is_native() to check for
native AER control
On 2025/10/20 21:58, Lukas Wunner wrote:
> On Mon, Oct 20, 2025 at 09:09:41PM +0800, Shuai Xue wrote:
>> On 2025/10/20 18:17, Lukas Wunner wrote:
>>> On Wed, Oct 15, 2025 at 10:41:58AM +0800, Shuai Xue wrote:
>>>> Replace the manual checks for native AER control with the
>>>> pcie_aer_is_native() helper, which provides a more robust way
>>>> to determine if we have native control of AER.
>>>
>>> Why is it more robust?
>>
>> IMHO, the pcie_aer_is_native() helper is more robust because it includes
>> additional safety checks that the manual approach lacks:
> [...]
>> Specifically, it performs a sanity check for dev->aer_cap before
>> evaluating native AER control.
>
> I'm under the impression that aer_cap must be set, otherwise the
> error wouldn't have been reported and we wouldn't be in this code path?
>
> If we can end up in this code path without aer_cap set, your patch
> would regress devices which are not AER-capable because it would
> now skip clearing of errors in the Device Status register via
> pcie_clear_device_status().
Hi Lukas,
You raise an excellent point about the potential regression.
The original code is:

	if (host->native_aer || pcie_ports_native) {
		pcie_clear_device_status(bridge);
		pci_aer_clear_nonfatal_status(bridge);
	}

This code clears both the PCIe Device Status register and AER status
registers when in native AER mode.
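
To make the regression concern concrete, here is a minimal userspace sketch
(not kernel code) contrasting the open-coded check with a check shaped like
pcie_aer_is_native(). The model_* structs, fields, and function names are
simplified stand-ins made up for illustration; the only thing taken from the
discussion above is that the helper tests aer_cap before testing for native
AER control.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified stand-ins for the kernel structures. */
struct model_host {
	bool native_aer;	/* OS was granted native AER control */
};

struct model_dev {
	uint16_t aer_cap;	/* offset of the AER capability, 0 if absent */
	struct model_host *host;
};

/* Stand-in for the pcie_ports_native setting. */
static bool model_ports_native;

/* Open-coded check, as in the original code quoted above. */
static bool manual_check(const struct model_dev *dev)
{
	return dev->host->native_aer || model_ports_native;
}

/* Helper-style check: bail out first if the device has no AER capability. */
static bool helper_check(const struct model_dev *dev)
{
	if (!dev->aer_cap)
		return false;

	return model_ports_native || dev->host->native_aer;
}
```

For a device with aer_cap == 0 below a host bridge with native_aer set,
manual_check() returns true while helper_check() returns false, so a block
guarded by the helper (including the pcie_clear_device_status() call) would be
skipped for that device.
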
pcie_clear_device_status() was renamed from
pci_aer_clear_device_status(). Was it intended to clear only the
AER-related error status bits? The error bits in the Device Status
register are:
- BIT 0: Correctable Error Detected
- BIT 1: Non-Fatal Error Detected
- BIT 2: Fatal Error Detected
- BIT 3: Unsupported Request Detected
According to the PCIe spec, bits 0-2 are logged for functions supporting
Advanced Error Handling.
I am not sure whether we should also clear bit 3, as well as bit 6
(Emergency Power Reduction Detected), when handling an AER error.
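
For reference, a small userspace model of the Device Status bits in question.
The PCI_EXP_DEVSTA_* values for bits 0-5 are meant to match
include/uapi/linux/pci_regs.h; the name used for bit 6 and all model_*
helpers are made up for this sketch. It assumes that
pcie_clear_device_status() writes back whatever it read from the register, so
that every set write-1-to-clear bit is cleared, and compares that with a
write restricted to bits 0-2.

```c
#include <stdint.h>

#define PCI_EXP_DEVSTA_CED	0x0001	/* Correctable Error Detected */
#define PCI_EXP_DEVSTA_NFED	0x0002	/* Non-Fatal Error Detected */
#define PCI_EXP_DEVSTA_FED	0x0004	/* Fatal Error Detected */
#define PCI_EXP_DEVSTA_URD	0x0008	/* Unsupported Request Detected */
#define PCI_EXP_DEVSTA_AUXPD	0x0010	/* AUX Power Detected (read-only) */
#define PCI_EXP_DEVSTA_TRPND	0x0020	/* Transactions Pending (read-only) */
#define MODEL_DEVSTA_EPRD	0x0040	/* Emergency Power Reduction Detected; hypothetical name */

/* Bits treated as write-1-to-clear in this model (bits 0-3 and 6). */
#define MODEL_DEVSTA_RW1C	(PCI_EXP_DEVSTA_CED | PCI_EXP_DEVSTA_NFED | \
				 PCI_EXP_DEVSTA_FED | PCI_EXP_DEVSTA_URD | \
				 MODEL_DEVSTA_EPRD)

/* Bits 0-2 only: the error-detected bits logged for AER-capable functions. */
#define MODEL_DEVSTA_ERR	(PCI_EXP_DEVSTA_CED | PCI_EXP_DEVSTA_NFED | \
				 PCI_EXP_DEVSTA_FED)

/* Model a register write: set RW1C bits in 'val' clear the same bits in 'reg'. */
static uint16_t model_devsta_write(uint16_t reg, uint16_t val)
{
	return reg & ~(val & MODEL_DEVSTA_RW1C);
}

/* Write back everything that was read: clears all set RW1C bits. */
static uint16_t model_clear_all(uint16_t reg)
{
	return model_devsta_write(reg, reg);
}

/* Write back only bits 0-2: leaves bit 3 and bit 6 untouched. */
static uint16_t model_clear_err_only(uint16_t reg)
{
	return model_devsta_write(reg, reg & MODEL_DEVSTA_ERR);
}
```

In this model, a write-back of the full value also clears bit 3 and bit 6,
whereas masking the write with MODEL_DEVSTA_ERR leaves them set. The
read-only bits 4 and 5 are unaffected either way.
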
>
> Thanks,
>
> Lukas
Thanks.
Shuai