Message-ID: <20211012153921.GA1754629@bhelgaas>
Date: Tue, 12 Oct 2021 10:39:21 -0500
From: Bjorn Helgaas <helgaas@...nel.org>
To: Jonas Dreßler <verdre@...d.nl>
Cc: Pali Rohár <pali@...nel.org>,
Amitkumar Karwar <amitkarwar@...il.com>,
Ganapathi Bhat <ganapathi017@...il.com>,
Xinming Hu <huxinming820@...il.com>,
Kalle Valo <kvalo@...eaurora.org>,
"David S. Miller" <davem@...emloft.net>,
Jakub Kicinski <kuba@...nel.org>,
Tsuchiya Yuto <kitakar@...il.com>,
linux-wireless@...r.kernel.org, netdev@...r.kernel.org,
linux-kernel@...r.kernel.org, linux-pci@...r.kernel.org,
Maximilian Luz <luzmaximilian@...il.com>,
Andy Shevchenko <andriy.shevchenko@...ux.intel.com>,
Bjorn Helgaas <bhelgaas@...gle.com>,
Heiner Kallweit <hkallweit1@...il.com>,
Johannes Berg <johannes@...solutions.net>,
Brian Norris <briannorris@...omium.org>,
David Laight <David.Laight@...LAB.COM>,
Vidya Sagar <vidyas@...dia.com>,
Victor Ding <victording@...gle.com>
Subject: Re: [PATCH] mwifiex: Add quirk resetting the PCI bridge on MS
Surface devices
[+cc Vidya, Victor, ASPM L1.2 config issue; beginning of thread:
https://lore.kernel.org/all/20211011134238.16551-1-verdre@v0yd.nl/]
On Tue, Oct 12, 2021 at 10:55:03AM +0200, Jonas Dreßler wrote:
> On 10/11/21 19:02, Pali Rohár wrote:
> > On Monday 11 October 2021 15:42:38 Jonas Dreßler wrote:
> > > The most recent firmware (15.68.19.p21) of the 88W8897 PCIe+USB card
> > > reports a hardcoded LTR value to the system during initialization,
> > > probably as an (unsuccessful) attempt of the developers to fix firmware
> > > crashes. This LTR value prevents most of the Microsoft Surface devices
> > > from entering deep powersaving states (either platform C-State 10 or
> > > S0ix state), because the exit latency of that state would be higher than
> > > what the card can tolerate.
> >
> > This description looks like a generic issue in 88W8897 chip or its
> > firmware and not something to Surface PCIe controller or Surface HW. But
> > please correct me if I'm wrong here.
> >
> > Has somebody 88W8897-based PCIe card in non-Surface device and can check
> > or verify if this issue happens also outside of the Surface device?
> >
> > It would be really nice to know if this is issue in Surface or in 8897.
>
> Fairly sure the LTR value is something that's reported by the firmware
> and will be the same on all 8897 devices (as mentioned in my reply to Bjorn
> the second-latest firmware doesn't report that fixed LTR value).

I suggested earlier that the LTR values reported by the device might
depend on the electrical characteristics of the link and hence be
platform-dependent, but I think that might be wrong.

The spec (PCIe r5.0, sec 5.5.4) does say that some of the *other*
parameters related to L1.2 entry are platform-dependent:

  Prior to setting either or both of the enable bits for L1.2, the
  values for TPOWER_ON, Common_Mode_Restore_Time, and, if the ASPM
  L1.2 Enable bit is to be Set, the LTR_L1.2_THRESHOLD (both Value
  and Scale fields) must be programmed. The TPOWER_ON and
  Common_Mode_Restore_Time fields must be programmed to the
  appropriate values based on the components and AC coupling
  capacitors used in the connection linking the two components. The
  determination of these values is design implementation specific.

These T_POWER_ON, Common_Mode_Restore_Time, and LTR_L1.2_THRESHOLD
values are in the L1 PM Substates Control registers.
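
To make those three values concrete, here is a minimal Python sketch
that decodes them from raw L1 PM Substates Control 1 and Control 2
dwords. The bit positions are not taken from this mail; they follow my
reading of the PCI_L1SS_CTL1/PCI_L1SS_CTL2 definitions in Linux's
pci_regs.h, and the function names are made up for illustration, so
check both against the spec before relying on them.

```python
# Assumed layout (verify against the PCIe spec / Linux pci_regs.h):
#   Control 1: bits 3:0 enables, 15:8 Common_Mode_Restore_Time (us),
#              25:16 LTR_L1.2_THRESHOLD Value, 31:29 LTR_L1.2_THRESHOLD Scale
#   Control 2: bits 1:0 T_POWER_ON Scale, 7:3 T_POWER_ON Value
CTL1_ENABLES = {
    "pcipm_l1_2": 0x1,
    "pcipm_l1_1": 0x2,
    "aspm_l1_2": 0x4,
    "aspm_l1_1": 0x8,
}
# LTR scale encodings 0..5 map to these multipliers in nanoseconds.
LTR_SCALE_NS = (1, 32, 1024, 32768, 1048576, 33554432)
# T_POWER_ON scale encodings 0..2 map to these multipliers in microseconds.
T_POWER_ON_SCALE_US = (2, 10, 100)


def decode_ctl1(ctl1):
    """Split a Control 1 dword into enables, restore time and LTR threshold."""
    scale = (ctl1 >> 29) & 0x7
    value = (ctl1 >> 16) & 0x3FF
    # Scale encodings beyond the table are treated as invalid here.
    threshold_ns = value * LTR_SCALE_NS[scale] if scale < len(LTR_SCALE_NS) else None
    return {
        "enables": {name: bool(ctl1 & bit) for name, bit in CTL1_ENABLES.items()},
        "common_mode_restore_time_us": (ctl1 >> 8) & 0xFF,
        "ltr_l1_2_threshold_ns": threshold_ns,
    }


def decode_ctl2(ctl2):
    """Return T_POWER_ON in microseconds, or None for a reserved scale."""
    scale = ctl2 & 0x3
    value = (ctl2 >> 3) & 0x1F
    if scale >= len(T_POWER_ON_SCALE_US):
        return None
    return value * T_POWER_ON_SCALE_US[scale]


if __name__ == "__main__":
    # Made-up example dwords, not values read from any real device.
    print(decode_ctl1(0x40A03C0F))
    print(decode_ctl2(0x29))
```

Feeding it the dwords read before and after the reset would show
directly whether the reset wiped these fields.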

I don't know of a way for the kernel or the device firmware to learn
these circuit characteristics or the appropriate values, so I think
only system firmware can program the L1 PM Substates Control registers
(a corollary of this is that I don't see a way for hot-plugged devices
to *ever* use L1.2).

I wonder if this reset quirk works because pci_reset_function() saves
and restores much of config space, but it currently does *not* restore
the L1 PM Substates capability, so those T_POWER_ON,
Common_Mode_Restore_Time, and LTR_L1.2_THRESHOLD values probably get
cleared out by the reset. We did briefly save/restore it [1], but we
had to revert that because of a regression that AFAIK was never
resolved [2]. I expect we will eventually save/restore this, so if
the quirk depends on it *not* being restored, that would be a problem.

You should be able to test whether this is the critical thing by
clearing those registers with setpci instead of doing the reset. Per
spec, they can only be modified when L1.2 is disabled, so you would
have to disable L1.2 via sysfs (on the endpoint, I think) using
/sys/.../l1_2_aspm and /sys/.../l1_2_pcipm, run setpci on the root
port, then re-enable L1.2.
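
A rough sketch of that sequence, written as a Python helper that only
prints the commands and does not run them. Several things here are
assumptions rather than anything stated above: the ECAP1e+offset
register form (extended capability ID 0x1e, as described in the setpci
man page), the Control 1/Control 2 offsets and field masks (my reading
of Linux's pci_regs.h), and the value:mask write that clears only the
three fields while leaving the enable bits alone. The sysfs directory
and the root port address are placeholders you would have to fill in.

```python
L1SS_ECAP_ID = 0x1E   # assumed extended capability ID of L1 PM Substates
CTL1_OFFSET = 0x08    # assumed offset of Control 1 within the capability
CTL2_OFFSET = 0x0C    # assumed offset of Control 2 within the capability
# Common_Mode_Restore_Time | LTR_L1.2_THRESHOLD Value | LTR_L1.2_THRESHOLD Scale
CTL1_CLEAR_MASK = 0x0000FF00 | 0x03FF0000 | 0xE0000000
# T_POWER_ON Scale | T_POWER_ON Value
CTL2_CLEAR_MASK = 0x03 | 0xF8


def setpci_clear(bdf, offset, mask):
    """Build a setpci command that zeroes only the bits selected by mask."""
    reg = f"ECAP{L1SS_ECAP_ID:x}+{offset:x}.l"
    return f"setpci -s {bdf} {reg}=00000000:{mask:08x}"


def build_plan(endpoint_link_dir, root_port_bdf):
    """Return the shell commands for the test, in the order they should run.

    endpoint_link_dir is the sysfs directory that holds l1_2_aspm and
    l1_2_pcipm for the endpoint; root_port_bdf is the root port address.
    """
    knobs = [f"{endpoint_link_dir}/l1_2_aspm", f"{endpoint_link_dir}/l1_2_pcipm"]
    plan = [f"echo 0 > {knob}" for knob in knobs]          # disable L1.2 first
    plan.append(setpci_clear(root_port_bdf, CTL1_OFFSET, CTL1_CLEAR_MASK))
    plan.append(setpci_clear(root_port_bdf, CTL2_OFFSET, CTL2_CLEAR_MASK))
    plan += [f"echo 1 > {knob}" for knob in knobs]         # then re-enable it
    return plan


if __name__ == "__main__":
    # "LINK_DIR" and "00:1c.0" are hypothetical placeholders.
    for command in build_plan("LINK_DIR", "00:1c.0"):
        print(command)
```

Reading the two dwords back with setpci before and after, and running
`setpci --dumpregs` to confirm how your pciutils names the capability,
would be sensible before trusting any of this.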

[1] https://git.kernel.org/linus/4257f7e008ea
[2] https://lore.kernel.org/all/20210127160449.2990506-1-helgaas@kernel.org/