Date:   Wed, 8 Nov 2023 12:47:08 +0530
From:   Vidya Sagar <vidyas@...dia.com>
To:     Bjorn Helgaas <helgaas@...nel.org>
Cc:     Lorenzo Pieralisi <lpieralisi@...nel.org>,
        Vikram Sethi <vsethi@...dia.com>,
        Thierry Reding <treding@...dia.com>,
        Jonathan Hunter <jonathanh@...dia.com>,
        Krishna Thota <kthota@...dia.com>,
        "linux-pci@...r.kernel.org" <linux-pci@...r.kernel.org>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>
Subject: Re: Question: Clearing error bits in the root port post enumeration



On 11/7/2023 8:59 PM, Bjorn Helgaas wrote:
> On Tue, Nov 07, 2023 at 08:44:53AM +0530, Vidya Sagar wrote:
>> On 11/3/2023 11:50 PM, Bjorn Helgaas wrote:
>>> On Tue, Oct 31, 2023 at 12:26:31PM +0000, Vidya Sagar wrote:
>>>> Hi folks,
>>>>
>>>> I would like to know your comments on the following scenario where
>>>> we are observing the root port logging errors because of the
>>>> enumeration flow being followed.
>>>>
>>>> DUT information:
>>>> - Has a root port and an endpoint connected to it
>>>> - Uses ECAM mechanism to access the configuration space
>>>> - Booted through ACPI flow
>>>> - Has a Firmware-First approach for handling the errors
>>>> - System is configured to treat Unsupported Requests as
>>>>     Advisory Non-Fatal errors
>>>>
>>>> As we all know, when a configuration read request comes in for a
>>>> device number that is not implemented, a UR would be returned as per
>>>> the PCIe spec.
>>>>
>>>> As part of the enumeration flow on the DUT, when the kernel reads
>>>> offset 0x0 of B:D:F=0:0:0, the root port responds with its valid
>>>> Vendor-ID and Device-ID values.  But when B:D:F=0:1:0 is probed,
>>>> since there is no device present there, the root port responds with
>>>> an Unsupported Request and simultaneously logs it in its Device
>>>> Status register (bit 3).  As a result, there is a UR logged in the
>>>> Device Status register of the RP by the time enumeration is
>>>> complete.
>>>>
>>>> In the case of AER capability natively owned by the kernel, the AER
>>>> driver's init call would clear all such pending bits.
>>>>
>>>> Since we are going with the Firmware-First approach, and the system
>>>> is configured to treat Unsupported Requests as Advisory Non-Fatal
>>>> errors, only a correctable error interrupt can be raised to the
>>>> firmware, which takes care of clearing the corresponding status
>>>> registers.  The firmware can't know about the UnsupReq bit being
>>>> set, because the interrupt it received is for a correctable error,
>>>> so it clears only the bits related to the correctable error.
>>>>
>>>> All these events leave a freshly booted system with the following
>>>> bits set.
>>>>
>>>> Secondary status: 66MHz- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort+ <SERR- <PERR-  (MAbort)
>>>> DevSta: CorrErr- NonFatalErr- FatalErr- UnsupReq+ AuxPwr- TransPend-  (UnsupReq)
>>>> UESta:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq+ ACSViol-  (UnsupReq)
>>>>
>>>> Since the reason for the UR is well understood at this point, I
>>>> would like to propose clearing the aforementioned bits in the root
>>>> port once enumeration is done, particularly to cater to
>>>> configurations where the Firmware-First approach is in place.
>>>> Please let me know your comments on this approach.
>>>
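For reference, the three flagged bits in the lspci output quoted above can be checked with something like the sketch below. The macro names and values match what I believe is in include/uapi/linux/pci_regs.h; the struct and the helper are hypothetical and exist only for illustration, they are not kernel or firmware code.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit masks; names mirror include/uapi/linux/pci_regs.h. */
#define PCI_STATUS_REC_MASTER_ABORT 0x2000u     /* Secondary Status: <MAbort */
#define PCI_EXP_DEVSTA_URD          0x0008u     /* Device Status: UnsupReq (bit 3) */
#define PCI_ERR_UNC_UNSUP           0x00100000u /* UESta: UnsupReq (bit 20) */

/* Hypothetical snapshot of the three root port registers shown by lspci. */
struct rp_err_snapshot {
    uint16_t sec_status;   /* Secondary Status */
    uint16_t devsta;       /* PCIe Device Status */
    uint32_t uncor_status; /* AER Uncorrectable Error Status */
};

/*
 * Returns true when all three bits described in the report are set,
 * i.e. MAbort+ in Secondary Status and UnsupReq+ in both DevSta and UESta.
 */
static bool rp_has_enum_ur_signature(const struct rp_err_snapshot *s)
{
    return (s->sec_status & PCI_STATUS_REC_MASTER_ABORT) &&
           (s->devsta & PCI_EXP_DEVSTA_URD) &&
           (s->uncor_status & PCI_ERR_UNC_UNSUP);
}
```

Filling the snapshot with the values from a freshly booted DUT (0x2000, 0x0008, 0x00100000 in the relevant bits) makes the helper return true.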
>>> I think Secondary status (PCI_SEC_STATUS) is always owned by the OS
>>> and is not affected by _OSC negotiation, right?  Linux does basically
>>> nothing with that today, but I think it *could* clear the "Received
>>> Master Abort" bit.
>>
>> Yes. PCI_SEC_STATUS is always owned by the OS and _OSC negotiation doesn't
>> really affect that.
>>
>>> I'm not very familiar with Advisory Non-Fatal errors.  I'm curious
>>> about the UESta situation: why can't firmware know about UnsupReq
>>> being set?  I assume PCI_ERR_COR_ADV_NFAT is the Correctable Error
>>> Status bit the firmware *does* see and clear.
>>
>> Yes, PCI_ERR_COR_ADV_NFAT is indeed cleared by the firmware.
>>>
>>> But isn't the whole point of Advisory Non-Fatal errors that an error
>>> that is logged as an Uncorrectable Error and that normally would be
>>> signaled with ERR_NONFATAL is signaled with ERR_COR instead?  So
>>> doesn't PCI_ERR_COR_ADV_NFAT being set imply that some
>>> PCI_ERR_UNCOR_STATUS must be set as well?  If so, I would think
>>> firmware *could* figure that out and clear the PCI_ERR_UNCOR_STATUS
>>> bit.
>>
>> So, are you suggesting that the firmware itself should also clear
>> PCI_ERR_UNCOR_STATUS?
> 
> In this firmware-first scenario, I'm assuming the platform retained
> ownership of the AER capability, so I would think firmware certainly
> should be allowed to clear PCI_ERR_UNCOR_STATUS.
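A minimal sketch of that firmware-side logic, assuming the firmware owns the AER capability: when the Advisory Non-Fatal bit is set in the Correctable Error Status, it also clears the Uncorrectable Error Status bits whose severity is non-fatal. The register model (struct aer_regs), rw1c_write32() and fw_handle_correctable() are hypothetical names made up for this illustration. A real handler would also have to deal with logging, masks and Device Status, which are left out here.

```c
#include <stdint.h>

/* Bit masks; names mirror include/uapi/linux/pci_regs.h. */
#define PCI_ERR_COR_ADV_NFAT 0x00002000u /* Correctable Status: Advisory Non-Fatal */
#define PCI_ERR_UNC_UNSUP    0x00100000u /* Uncorrectable Status: Unsupported Request */

/* Hypothetical in-memory model of the relevant AER registers. */
struct aer_regs {
    uint32_t uncor_status;   /* PCI_ERR_UNCOR_STATUS */
    uint32_t uncor_severity; /* PCI_ERR_UNCOR_SEVER: bit set means fatal */
    uint32_t cor_status;     /* PCI_ERR_COR_STATUS */
};

/* Models a write to an RW1C register: writing 1 clears the bit. */
static void rw1c_write32(uint32_t *reg, uint32_t val)
{
    *reg &= ~val;
}

/*
 * Handles a correctable error interrupt. If Advisory Non-Fatal is set,
 * the non-fatal bits logged in the uncorrectable status are cleared too.
 * Returns the uncorrectable status bits that were cleared.
 */
static uint32_t fw_handle_correctable(struct aer_regs *aer)
{
    uint32_t cleared = 0;

    if (aer->cor_status & PCI_ERR_COR_ADV_NFAT) {
        cleared = aer->uncor_status & ~aer->uncor_severity;
        rw1c_write32(&aer->uncor_status, cleared);
    }

    /* Clear everything that was logged as correctable. */
    rw1c_write32(&aer->cor_status, aer->cor_status);
    return cleared;
}
```

Bits configured as fatal in the severity register are left alone on purpose, since those would not have been downgraded to an advisory correctable error in the first place.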
> 
>> If so, I can also make the firmware clear PCI_SEC_STATUS, thereby
>> leaving the firmware responsible for clearing all the error bits.
>> Does that sound OK?
> 
> It doesn't sound quite right to me for firmware to clear
> PCI_SEC_STATUS because it doesn't own that register.  I suspect we
> would probably see the "Received Master Abort" bit set after
> enumeration even on Conventional PCI systems, so I doubt this is
> anything specific to PCIe or AER, and maybe Linux should clear it
> after enumerating devices below the bridge.
Agreed.
I'll send a patch to clear the "Received Master Abort" bit in
PCI_SEC_STATUS after enumeration.
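Roughly, the idea would be something like the sketch below. This is not the actual patch: it is a user-space model with a fake config space, and struct fake_bridge, cfg_read16(), cfg_write16_rw1c() and clear_sec_master_abort() are hypothetical names. In the kernel the accesses would go through pci_read_config_word() and pci_write_config_word() instead. The model also treats every bit of the register as RW1C, which is a simplification.

```c
#include <stdint.h>

/* Offsets and masks; names mirror include/uapi/linux/pci_regs.h. */
#define PCI_SEC_STATUS              0x1e    /* Secondary Status register */
#define PCI_STATUS_REC_MASTER_ABORT 0x2000u /* Received Master Abort */

/* Hypothetical stand-in for a bridge's config space. */
struct fake_bridge {
    uint8_t cfg[256];
};

static uint16_t cfg_read16(const struct fake_bridge *b, int off)
{
    return (uint16_t)(b->cfg[off] | (b->cfg[off + 1] << 8));
}

/* Models an RW1C write: every bit written as 1 gets cleared. */
static void cfg_write16_rw1c(struct fake_bridge *b, int off, uint16_t val)
{
    uint16_t cur = (uint16_t)(cfg_read16(b, off) & (uint16_t)~val);

    b->cfg[off] = (uint8_t)(cur & 0xff);
    b->cfg[off + 1] = (uint8_t)(cur >> 8);
}

/* To be called once the devices below the bridge have been enumerated. */
static void clear_sec_master_abort(struct fake_bridge *b)
{
    uint16_t status = cfg_read16(b, PCI_SEC_STATUS);

    /* Write back only this bit so other RW1C status bits are preserved. */
    if (status & PCI_STATUS_REC_MASTER_ABORT)
        cfg_write16_rw1c(b, PCI_SEC_STATUS, PCI_STATUS_REC_MASTER_ABORT);
}
```

Writing only the one bit, rather than writing back the value that was read, keeps any other error bits in the register intact for whoever wants to look at them later.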

Thanks for your input, Bjorn.

> 
> Bjorn
