lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20241211171403.00002fa3@huawei.com>
Date: Wed, 11 Dec 2024 17:14:03 +0000
From: Jonathan Cameron <Jonathan.Cameron@...wei.com>
To: Lukas Wunner <lukas@...ner.de>
CC: Niklas Schnelle <niks@...nel.org>, Ilpo Järvinen
	<ilpo.jarvinen@...ux.intel.com>, Bjorn Helgaas <bhelgaas@...gle.com>,
	<linux-pci@...r.kernel.org>, Rob Herring <robh@...nel.org>, "Krzysztof
 Wilczy??ski" <kw@...ux.com>, "Maciej W . Rozycki" <macro@...am.me.uk>,
	Alexandru Gagniuc <mr.nuke.me@...il.com>, Krishna chaitanya chundru
	<quic_krichai@...cinc.com>, Srinivas Pandruvada
	<srinivas.pandruvada@...ux.intel.com>, "Rafael J . Wysocki"
	<rafael@...nel.org>, <linux-pm@...r.kernel.org>, Smita Koralahalli
	<Smita.KoralahalliChannabasappa@....com>, <linux-kernel@...r.kernel.org>,
	Daniel Lezcano <daniel.lezcano@...aro.org>, Amit Kucheria <amitk@...nel.org>,
	Zhang Rui <rui.zhang@...el.com>, Christophe JAILLET
	<christophe.jaillet@...adoo.fr>, Mika Westerberg
	<mika.westerberg@...ux.intel.com>, Lorenzo Pieralisi <lpieralisi@...nel.org>
Subject: Re: [PATCH] PCI/portdrv: Disable bwctrl service if port is fixed at
 2.5 GT/s

On Wed, 11 Dec 2024 08:57:23 +0100
Lukas Wunner <lukas@...ner.de> wrote:

> On Tue, Dec 10, 2024 at 09:45:18PM +0100, Niklas Schnelle wrote:
> > On Tue, 2024-12-10 at 11:05 +0100, Lukas Wunner wrote:  
> > > First of all, the Supported Link Speeds field in the Link Capabilities
> > > register (which you're querying here) was renamed to Max Link Speed in
> > > PCIe r3.1 and a new Link Capabilities 2 register was added which contains
> > > a new Supported Link Speeds field.  Software is supposed to query the
> > > latter if the device implements the Link Capabilities 2 register
> > > (see the other Implementation Note at the end of PCIe r6.2 sec 7.5.3.18).  
> > 
> > Would it maybe make sense to update the comment for PCI_EXP_LNKCAP_SLS
> > in pci_regs.h to point out that in PCIe r3.1 and newer this is called
> > the Max Link Speed field? This would certainly helped me here.  
> 
> The macros for the individual speeds (e.g. PCI_EXP_LNKCAP_SLS_2_5GB)
> already have code comments which describe their new meaning.
> 
> I guess the reason why the code comment for PCI_EXP_LNKCAP_SLS wasn't
> updated is that it seeks to document the meaning of the "SLS" acronym
> (Supported Link Speeds).
> 
> But yes, amending that with something like...
> 
> /* Max Link Speed (Supported Link Speeds before PCIe r3.1) */
> 
> ...probably make sense, so feel free to propose that in a separate patch.
> 
> 
> > > So to make this future-proof what you could do is check whether only a
> > > *single* speed is supported (which could be something else than 2.5 GT/s
> > > if future spec versions allow that), i.e.:
> > > 
> > > -		if (linkcap & PCI_EXP_LNKCAP_LBNC)
> > > +		if (linkcap & PCI_EXP_LNKCAP_LBNC &&
> > > +		    hweight8(dev->supported_speeds) > 1)  
> > 
> > This also makes sense to me in that the argument holds that if there is
> > only one supported speed bwctrl can't control it. That said it is
> > definitely more general than this patch.
> > 
> > Sadly, I tried it and in my case it doesn't work. Taking a closer look
> > at lspci -vvv of the Thunderbolt port as well as a debug print reveals
> > why:
> > 
> > 07:00.0 PCI bridge: Intel Corporation JHL7540 Thunderbolt 3 Bridge [Titan Ridge 4C 2018] (rev 06) (prog-if 00 [Normal decode])
> >        ...
> >                 LnkCap: Port #0, Speed 2.5GT/s, Width x4, ASPM L1, Exit Latency L1 <1us
> >                         ClockPM- Surprise- LLActRep- BwNot+ ASPMOptComp+
> >                 LnkCtl: ASPM Disabled; LnkDisable- CommClk+
> >                         ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
> >                 LnkSta: Speed 2.5GT/s, Width x4
> >                         TrErr- Train- SlotClk+ DLActive- BWMgmt+ ABWMgmt-
> > 	...
> >                 LnkCap2: Supported Link Speeds: 2.5-8GT/s, Crosslink- Retimer- 2Retimers- DRS-
> >                 LnkCtl2: Target Link Speed: 2.5GT/s, EnterCompliance- SpeedDis-, Selectable De-emphasis: -6dB
> >                          Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
> >                          Compliance Preset/De-emphasis: -6dB de-emphasis, 0dB preshoot
> > 	...
> > 
> > So it seems that on this Thunderbolt chip the LnkCap field
> > says 2.5 GT/s only as per the USB 4 spec you quoted but LnkCap2
> > is 0x0E i.e. 2.5-8 GT/s.
> > 
> > I wonder if this is related to why the hang occurs. Could it be that
> > bwctrl tries to enable speeds above 2.5 GT/s and that causes links to
> > fail?  
> 
> Ilpo knows this code better than I do but yes, that's plausible.
> The bandwidth controller does't change the speed by itself,
> it only monitors speed changes.  But it does provide a
> pcie_set_target_speed() API which is called by the thermal driver
> as well as the pcie_failed_link_retrain() quirk.  I suspect the
> latter is the culprit here.  If that suspicion is correct,
> you should be seeing messages such as...
> 
> "removing 2.5GT/s downstream link speed restriction"
> 
> ...in dmesg but I think you wrote that you're not getting any
> messages at all, right?  Perhaps if you add "early_printk=efi"
> to the kernel command line you may see what's going on.
> 
> One idea in this case would be to modify pcie_get_supported_speeds()
> such that it filters out any speeds in the Link Capabilities 2 register
> which exceed the Max Link Speed in the Link Capabilties register.
> However the spec says that software should look at the Link Capabilities 2
> register to determine supported speeds if that register is present.
> So I think we may not conform to the spec then.
> 
> The better option is thus probably to add a DECLARE_PCI_FIXUP_EARLY()
> quirk for Titan Ridge which sets the supported_speeds to just 2.5 GT/s.
> *If* you want to go with the future-proof option which checks that
> just one speed is supported.

I'd definitely support going with the future proof solution here if
we can.

> 
> Titan Ridge is an old chip.  I'm not sure if newer discrete Thunderbolt
> controllers exhibit the same issue but likely not.
> 
> Thanks,
> 
> Lukas
> 


Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ