[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <dc6e677f-4c19-dd25-8878-8eae9154cff4@linux.intel.com>
Date: Wed, 11 Dec 2024 15:07:38 +0200 (EET)
From: Ilpo Järvinen <ilpo.jarvinen@...ux.intel.com>
To: Lukas Wunner <lukas@...ner.de>
cc: Niklas Schnelle <niks@...nel.org>, Bjorn Helgaas <bhelgaas@...gle.com>,
linux-pci@...r.kernel.org, Rob Herring <robh@...nel.org>,
Krzysztof Wilczy??ski <kw@...ux.com>,
"Maciej W . Rozycki" <macro@...am.me.uk>,
Jonathan Cameron <Jonathan.Cameron@...wei.com>,
Alexandru Gagniuc <mr.nuke.me@...il.com>,
Krishna chaitanya chundru <quic_krichai@...cinc.com>,
Srinivas Pandruvada <srinivas.pandruvada@...ux.intel.com>,
"Rafael J . Wysocki" <rafael@...nel.org>, linux-pm@...r.kernel.org,
Smita Koralahalli <Smita.KoralahalliChannabasappa@....com>,
LKML <linux-kernel@...r.kernel.org>,
Daniel Lezcano <daniel.lezcano@...aro.org>,
Amit Kucheria <amitk@...nel.org>, Zhang Rui <rui.zhang@...el.com>,
Christophe JAILLET <christophe.jaillet@...adoo.fr>,
Lorenzo Pieralisi <lpieralisi@...nel.org>,
Mika Westerberg <mika.westerberg@...ux.intel.com>
Subject: Re: [PATCH] PCI/portdrv: Disable bwctrl service if port is fixed at
2.5 GT/s
On Wed, 11 Dec 2024, Lukas Wunner wrote:
> On Tue, Dec 10, 2024 at 09:45:18PM +0100, Niklas Schnelle wrote:
> > On Tue, 2024-12-10 at 11:05 +0100, Lukas Wunner wrote:
> > > First of all, the Supported Link Speeds field in the Link Capabilities
> > > register (which you're querying here) was renamed to Max Link Speed in
> > > PCIe r3.1 and a new Link Capabilities 2 register was added which contains
> > > a new Supported Link Speeds field. Software is supposed to query the
> > > latter if the device implements the Link Capabilities 2 register
> > > (see the other Implementation Note at the end of PCIe r6.2 sec 7.5.3.18).
> >
> > Would it maybe make sense to update the comment for PCI_EXP_LNKCAP_SLS
> > in pci_regs.h to point out that in PCIe r3.1 and newer this is called
> > the Max Link Speed field? This would certainly helped me here.
>
> The macros for the individual speeds (e.g. PCI_EXP_LNKCAP_SLS_2_5GB)
> already have code comments which describe their new meaning.
>
> I guess the reason why the code comment for PCI_EXP_LNKCAP_SLS wasn't
> updated is that it seeks to document the meaning of the "SLS" acronym
> (Supported Link Speeds).
>
> But yes, amending that with something like...
>
> /* Max Link Speed (Supported Link Speeds before PCIe r3.1) */
>
> ...probably make sense, so feel free to propose that in a separate patch.
>
> > > So to make this future-proof what you could do is check whether only a
> > > *single* speed is supported (which could be something else than 2.5 GT/s
> > > if future spec versions allow that), i.e.:
> > >
> > > - if (linkcap & PCI_EXP_LNKCAP_LBNC)
> > > + if (linkcap & PCI_EXP_LNKCAP_LBNC &&
> > > + hweight8(dev->supported_speeds) > 1)
> >
> > This also makes sense to me in that the argument holds that if there is
> > only one supported speed bwctrl can't control it. That said it is
> > definitely more general than this patch.
> >
> > Sadly, I tried it and in my case it doesn't work. Taking a closer look
> > at lspci -vvv of the Thunderbolt port as well as a debug print reveals
> > why:
> >
> > 07:00.0 PCI bridge: Intel Corporation JHL7540 Thunderbolt 3 Bridge [Titan Ridge 4C 2018] (rev 06) (prog-if 00 [Normal decode])
> > ...
> > LnkCap: Port #0, Speed 2.5GT/s, Width x4, ASPM L1, Exit Latency L1 <1us
> > ClockPM- Surprise- LLActRep- BwNot+ ASPMOptComp+
> > LnkCtl: ASPM Disabled; LnkDisable- CommClk+
> > ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
> > LnkSta: Speed 2.5GT/s, Width x4
> > TrErr- Train- SlotClk+ DLActive- BWMgmt+ ABWMgmt-
> > ...
> > LnkCap2: Supported Link Speeds: 2.5-8GT/s, Crosslink- Retimer- 2Retimers- DRS-
> > LnkCtl2: Target Link Speed: 2.5GT/s, EnterCompliance- SpeedDis-, Selectable De-emphasis: -6dB
> > Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
> > Compliance Preset/De-emphasis: -6dB de-emphasis, 0dB preshoot
> > ...
> >
> > So it seems that on this Thunderbolt chip the LnkCap field
> > says 2.5 GT/s only as per the USB 4 spec you quoted but LnkCap2
> > is 0x0E i.e. 2.5-8 GT/s.
> >
> > I wonder if this is related to why the hang occurs. Could it be that
> > bwctrl tries to enable speeds above 2.5 GT/s and that causes links to
> > fail?
>
> Ilpo knows this code better than I do but yes, that's plausible.
> The bandwidth controller does't change the speed by itself,
> it only monitors speed changes. But it does provide a
> pcie_set_target_speed() API which is called by the thermal driver
> as well as the pcie_failed_link_retrain() quirk. I suspect the
> latter is the culprit here. If that suspicion is correct,
> you should be seeing messages such as...
>
> "removing 2.5GT/s downstream link speed restriction"
That block is conditioned by pci_match_id() so I don't think it would
execute.
The block that prints "retraining failed" is the one which attempts to
restore the old Target Link Speed. TLS seems to also be set to 8GT/s as
per the lspci from Niklas which is another inconsistency in the config
space.
Although, I'd tend to think if these trigger the quirk/retraining now,
kernel has done some retraining for these links prior to bwctrl was added,
so perhaps that 8GT/s TLS is not going to be found as the culprit.
What could be tried though is to not do the LBMIE enabled at all which is
one clearly new thing which comes with bwctrl. Commenting out the calls to
pcie_bwnotif_enable() should achieve that.
> ...in dmesg but I think you wrote that you're not getting any
> messages at all, right? Perhaps if you add "early_printk=efi"
> to the kernel command line you may see what's going on.
>
> One idea in this case would be to modify pcie_get_supported_speeds()
> such that it filters out any speeds in the Link Capabilities 2 register
> which exceed the Max Link Speed in the Link Capabilties register.
> However the spec says that software should look at the Link Capabilities 2
> register to determine supported speeds if that register is present.
> So I think we may not conform to the spec then.
>
> The better option is thus probably to add a DECLARE_PCI_FIXUP_EARLY()
> quirk for Titan Ridge which sets the supported_speeds to just 2.5 GT/s.
> *If* you want to go with the future-proof option which checks that
> just one speed is supported.
>
> Titan Ridge is an old chip. I'm not sure if newer discrete Thunderbolt
> controllers exhibit the same issue but likely not.
I recall taking note of this inconsistency in some lspci dumps I've from
Mika (but forgot it until now). So I'm afraid it might be more widespread
than just TR.
--
i.
Powered by blists - more mailing lists