lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <85e2a07d-6b7b-d443-fb9a-c15b8e9849d2@v0yd.nl>
Date:   Wed, 13 Oct 2021 23:35:28 +0200
From:   Jonas Dreßler <verdre@...d.nl>
To:     Bjorn Helgaas <helgaas@...nel.org>
Cc:     Amitkumar Karwar <amitkarwar@...il.com>,
        Ganapathi Bhat <ganapathi017@...il.com>,
        Xinming Hu <huxinming820@...il.com>,
        Kalle Valo <kvalo@...eaurora.org>,
        "David S. Miller" <davem@...emloft.net>,
        Jakub Kicinski <kuba@...nel.org>,
        Tsuchiya Yuto <kitakar@...il.com>,
        linux-wireless@...r.kernel.org, netdev@...r.kernel.org,
        linux-kernel@...r.kernel.org, linux-pci@...r.kernel.org,
        Maximilian Luz <luzmaximilian@...il.com>,
        Andy Shevchenko <andriy.shevchenko@...ux.intel.com>,
        Bjorn Helgaas <bhelgaas@...gle.com>,
        Pali Rohár <pali@...nel.org>,
        Heiner Kallweit <hkallweit1@...il.com>,
        Johannes Berg <johannes@...solutions.net>,
        Brian Norris <briannorris@...omium.org>,
        David Laight <David.Laight@...LAB.COM>,
        Alex Williamson <alex.williamson@...hat.com>
Subject: Re: [PATCH] mwifiex: Add quirk resetting the PCI bridge on MS Surface
 devices

On 10/12/21 13:29, Bjorn Helgaas wrote:
> On Tue, Oct 12, 2021 at 10:48:49AM +0200, Jonas Dreßler wrote:
>> On 10/11/21 18:53, Bjorn Helgaas wrote:
>>> On Mon, Oct 11, 2021 at 03:42:38PM +0200, Jonas Dreßler wrote:
>>>> The most recent firmware (15.68.19.p21) of the 88W8897 PCIe+USB card
>>>> reports a hardcoded LTR value to the system during initialization,
>>>> probably as an (unsuccessful) attempt of the developers to fix firmware
>>>> crashes. This LTR value prevents most of the Microsoft Surface devices
>>>> from entering deep powersaving states (either platform C-State 10 or
>>>> S0ix state), because the exit latency of that state would be higher than
>>>> what the card can tolerate.
>>>
>>> S0ix and C-State 10 are ACPI concepts that don't mean anything in a
>>> PCIe context.
>>>
>>> I think LTR is only involved in deciding whether to enter the ASPM
>>> L1.2 substate.  Maybe the system will only enter C-State 10 or S0ix
>>> when the link is in L1.2?
>>
>> Yup, this is indeed the case, see https://01.org/blogs/qwang59/2020/linux-s0ix-troubleshooting
>> (ctrl+f "IP LINK PM STATE").
> 
> I think it would be helpful if the commit log included this missing
> link, e.g., the LTR value prevents the link from going to L1.2, which
> in turn prevents use of C-State 10/S0ix.
> 
>> There's two alternatives I can think of to deal with this issue:
>>
>> 1) Revert the cards firmware in linux-firmware back to the second-latest
>> version. That firmware didn't report a fixed LTR value and also doesn't
>> have any other obvious issues I know of compared to the latest one.
> 
> You've mentioned "fixed LTR value" more than once.  My weak
> understanding of LTR and L1.2 is that the latencies a device reports
> via LTR messages are essentially a function of buffering in the device
> and electrical characteristics of the link.  I expect them to be set
> once and not changed.

I'm not an expert on PCI at all, but from my understanding the idea behind
the LTR mechanism is to be able to dynamically communicate latency
requirements depending on the current situation/workload. So for example
in case the wifi card is receiving a lot of data, it would report a lower
latency tolerance because its rx buffers are filling up fast.

Looking at ltr_show in the pmc_core debug driver, that also seems to be how
other devices handle it: For example moving the USB mouse makes the XHCI
controller report a non-null LTR for a few seconds.

So with "fixed LTR value" I meant to say that in my understanding this is
exactly the opposite of how LTR is supposed to be used: Instead of
dynamically reporting a new latency tolerance when the card is being used,
it reports a "fixed" tolerance once during firmware startup, maybe with
the intention of papering over bugs in the firmware...

> 
> But did the previous firmware report different latencies at different
> times?  Or did it just not advertise L1.2 support at all?  Or do you
> mean the new firmware reports a "corrected" LTR value that doesn't
> work as well?

Sorry, as mentioned in my other reply the previous firmware is actually
behaving in the exact same way, no clue why I remembered this wrong.

> 
>> 2) Somehow interact with the PMC Core driver to make it ignore the LTR
>> values reported by the card (I doubt that's possible from mwifiex).
>> It can be done manually via debugfs by writing to
>> /sys/kernel/debug/pmc_core/ltr_ignore.
> 
> Interesting; I wasn't aware of that, thanks.  This still feels like a
> configuration issue.  If we ignore the reported LTR values, I guess
> you mean the root port assumes it's *always* safe to enter L1.2, i.e.,
> the device has enough buffering to deal with the exit latency?

Not sure about that, in theory there's also the whole negotiation via the
CLKREQ# pin when entering ASPM L1.2. The card could use that to reject L1.2
entry I guess.

> 
> I would think there would be a way to program the LTR capability to
> have the device itself report that, so we wouldn't have to fiddle with
> the upstream end.

Well I mean the device does report LTR capabilities and a maximum snoop and
non-snoop latency via extended capabilities, so I guess that means it is
supported.

> 
>>>> +	 * We need to do it here because it must happen after firmware
>>>> +	 * initialization and this function is called right after that is done.
>>>> +	 */
> 
>>>> +	if (card->quirks & QUIRK_DO_FLR_ON_BRIDGE)
>>>> +		pci_reset_function(parent_pdev);
>>>
>>> PCIe r5.0, sec 7.5.3.3, says Function Level Reset can only be
>>> supported by endpoints, so I guess this will actually do some other
>>> kind of reset.
>>
>> Interesting, I briefly searched and it doesn't seem like think
>> there's public documentation available by Intel that goes into
>> the specifics here, maybe someone working at Intel knows more?
> 
> "lspci -vv" will tell you whether the root port advertises FLR
> support.  The spec says it shouldn't, but I think pci_reset_function()
> relies on what DevCap says.  You could instrument pci_reset_function()
> to see exactly what kind of reset we do.

Ahh indeed, I wasn't aware there's multiple kinds of resets. Looks like
what it uses is pci_pm_reset().

> 
> Bjorn
> 

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ