Message-ID: <91cf33b4-7f67-4f3a-b095-e8f04d8c18e9@linux.alibaba.com>
Date: Fri, 24 Oct 2025 11:38:10 +0800
From: Shuai Xue <xueshuai@...ux.alibaba.com>
To: Lukas Wunner <lukas@...ner.de>
Cc: Bjorn Helgaas <bhelgaas@...gle.com>, linux-pci@...r.kernel.org,
linux-kernel@...r.kernel.org, linuxppc-dev@...ts.ozlabs.org,
kbusch@...nel.org, sathyanarayanan.kuppuswamy@...ux.intel.com,
mahesh@...ux.ibm.com, oohall@...il.com, Jonathan.Cameron@...wei.com,
terry.bowman@....com, tianruidong@...ux.alibaba.com
Subject: Re: [PATCH v6 4/5] PCI/ERR: Use pcie_aer_is_native() to check for
native AER control
On 2025/10/24 11:14, Lukas Wunner wrote:
> On Fri, Oct 24, 2025 at 11:09:25AM +0800, Shuai Xue wrote:
>> On 2025/10/23 18:29, Lukas Wunner wrote:
>>> On Mon, Oct 20, 2025 at 10:45:31PM +0800, Shuai Xue wrote:
>>>> From PCIe spec, BIT 0-2 are logged for functions supporting Advanced
>>>> Error Handling.
>>>>
>>>> I am not sure if we should clear BIT 3, and also BIT 6 (Emergency Power
>>>> Reduction Detected), in case of an AER error.
>>>
>>> AFAIUI, bits 0 to 3 are what the PCIe r7.0 sec 6.2.1 calls
>>> "baseline capability" error reporting. They're supported
>>> even if AER is not supported.
>>>
>>> Bit 6 has nothing to do with this AFAICS.
>>
>> Per PCIe r7.0 section 7.5.3.5:
>>
>> **For Functions supporting Advanced Error Handling**, errors are logged
>> in this register regardless of the settings of the Uncorrectable Error
>> Mask register. Default value of this bit is 0b.
>>
>> From this, it's clear that bits 0 to 2 are not logged unless AER is supported.
>
> No. It just means that if AER is supported, the Uncorrectable Error Mask
> register has no bearing on whether the bits in the Device Status register
> are set. It does not mean that the bits are only set if AER is supported.
>
Thank you for pointing that out. I now understand that my interpretation
was incorrect.
As such, I will drop this patch that introduced the dev->aer_cap check.
The remaining question is whether it would make more sense to rename
pcie_clear_device_status() to pci_clear_device_error_status() and refine
its behavior by adding a mask specifically for bits 0 to 3. Here’s an
example of the proposed change:
-void pcie_clear_device_status(struct pci_dev *dev)
+void pci_clear_device_error_status(struct pci_dev *dev)
 {
 	u16 sta;
 	pcie_capability_read_word(dev, PCI_EXP_DEVSTA, &sta);
-	pcie_capability_write_word(dev, PCI_EXP_DEVSTA, sta);
+	/* clear error-related bits: 0-3 */
+	pcie_capability_write_word(dev, PCI_EXP_DEVSTA, sta & 0xF);
 }
Renaming the function to pci_clear_device_error_status() better
reflects its current focus on clearing error-related bits, and
introducing the mask ensures that only those relevant bits (0-3) are
cleared, rather than every write-1-to-clear bit that happens to be set
in the register (bit 6, for example). What do you think about these
changes?
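To illustrate the difference, here is a minimal userspace sketch of the
write-1-to-clear behaviour. It is only a model, not kernel code: the
DEVSTA_* names, rw1c_write(), clear_all() and clear_errors_only() are
hypothetical helpers made up for this example, and the model treats every
bit as write-1-to-clear, which is a simplification of the real register.

```c
#include <assert.h>
#include <stdint.h>

/* Device Status bits 0-3 (baseline error reporting) and bit 6.
 * The names are local to this sketch, not kernel macros. */
#define DEVSTA_CED   0x0001 /* Correctable Error Detected */
#define DEVSTA_NFED  0x0002 /* Non-Fatal Error Detected */
#define DEVSTA_FED   0x0004 /* Fatal Error Detected */
#define DEVSTA_URD   0x0008 /* Unsupported Request Detected */
#define DEVSTA_EPRD  0x0040 /* Emergency Power Reduction Detected */

#define DEVSTA_ERR_MASK \
	(DEVSTA_CED | DEVSTA_NFED | DEVSTA_FED | DEVSTA_URD)

/* Simplified model of a write-1-to-clear register: every 1 written
 * clears the corresponding bit, every 0 leaves it untouched. */
static inline uint16_t rw1c_write(uint16_t reg, uint16_t val)
{
	return reg & (uint16_t)~val;
}

/* Models the current behaviour: write back everything that was read. */
static inline uint16_t clear_all(uint16_t reg)
{
	return rw1c_write(reg, reg);
}

/* Models the proposed behaviour: write back only bits 0-3. */
static inline uint16_t clear_errors_only(uint16_t reg)
{
	return rw1c_write(reg, reg & DEVSTA_ERR_MASK);
}
```

In this model, starting from a register value with bits 0, 2 and 6 set
(0x0045), clear_all() leaves 0x0000, while clear_errors_only() leaves
0x0040, i.e. bit 6 stays set.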
Thanks.
Shuai