lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <Z1R4VNwCOlh9Sg9n@wunner.de>
Date: Sat, 7 Dec 2024 17:31:16 +0100
From: Lukas Wunner <lukas@...ner.de>
To: Niklas Schnelle <niks@...nel.org>
Cc: Ilpo Järvinen <ilpo.jarvinen@...ux.intel.com>,
	linux-pci@...r.kernel.org, Bjorn Helgaas <bhelgaas@...gle.com>,
	Lorenzo Pieralisi <lorenzo.pieralisi@....com>,
	Rob Herring <robh@...nel.org>, Krzysztof Wilczy??ski <kw@...ux.com>,
	"Maciej W . Rozycki" <macro@...am.me.uk>,
	Jonathan Cameron <Jonathan.Cameron@...wei.com>,
	Alexandru Gagniuc <mr.nuke.me@...il.com>,
	Krishna chaitanya chundru <quic_krichai@...cinc.com>,
	Srinivas Pandruvada <srinivas.pandruvada@...ux.intel.com>,
	"Rafael J . Wysocki" <rafael@...nel.org>, linux-pm@...r.kernel.org,
	Smita Koralahalli <Smita.KoralahalliChannabasappa@....com>,
	linux-kernel@...r.kernel.org,
	Daniel Lezcano <daniel.lezcano@...aro.org>,
	Amit Kucheria <amitk@...nel.org>, Zhang Rui <rui.zhang@...el.com>,
	Christophe JAILLET <christophe.jaillet@...adoo.fr>,
	Mika Westerberg <mika.westerberg@...ux.intel.com>
Subject: Re: [PATCH v9 6/9] PCI/bwctrl: Re-add BW notification portdrv as
 PCIe BW controller

[cc += Mika, start of thread:
https://lore.kernel.org/all/db8e457fcd155436449b035e8791a8241b0df400.camel@kernel.org/
]

On Sat, Dec 07, 2024 at 12:06:49AM +0100, Niklas Schnelle wrote:
> > > On Fri, 2024-12-06 at 19:12 +0100, Niklas Schnelle wrote:
> > > > I bisected a v6.13-rc1 boot hang on my personal workstation to this
> > > > patch. Sadly I don't have much details like a panic or so because the
> > > > boot hangs before any kernel messages, or at least they're not visible
> > > > long enough to see. I haven't yet looked into the code as I wanted to
> > > > raise awareness first. Since the commit doesn't revert cleanly on
> > > > v6.13-rc1 I also haven't tried that yet.
> > > > 
> > > > Here are some details on my system:
> > > > - AMD Ryzen 9 3900X 
> > > > - ASRock X570 Creator Motherboard
> > > > - Radeon RX 5600 XT
> > > > - Intel JHL7540 Thunderbolt 3 USB Controller (only USB 2 plugged)
> > > > - Intel 82599 10 Gigabit NIC with SR-IOV enabled with 2 VFs
> > > > - Intel n I211 Gigabit NIC
> > > > - Intel Wi-Fi 6 AX200
> > > > - Aquantia AQtion AQC107 NIC
> 
> Ok did some fiddeling and it's the thunderbolt ports. The below diff
> works around the issue. That said I guess for a proper fix this would
> should get filtered by the port service matching? Also as can be seen
> in lspci the port still claims to support bandwidth management so maybe
> other thunderbolt ports actually do.
[...]
> --- a/drivers/pci/pcie/bwctrl.c
> +++ b/drivers/pci/pcie/bwctrl.c
> @@ -294,6 +294,9 @@ static int pcie_bwnotif_probe(struct pcie_device *srv)
>         struct pci_dev *port = srv->port;
>         int ret;
> 
> +       if (srv->port->is_thunderbolt)
> +               return -EOPNOTSUPP;
> +

Thanks for reporting and analyzing this.

The PCIe bandwidth controller is only instantiated on Downstream Ports.
Per the spec, Thunderbolt PCIe Downstream Ports are just tunnel endpoints
with a fake Max Link Speed of 2.5 GT/s (USB4 v2 sec 11.2.1):

   "Max Link Speed field in the Link Capabilities Register set to 0001b
    (data rate of 2.5 GT/s only).
    Note: These settings do not represent actual throughput.
    Throughput is implementation specific and based on the USB4 Fabric
    performance."

So your patch does make sense in so far as the link speed of Thunderbolt
PCIe Downstream Ports is fixed to 2.5 GT/s and cannot be throttled because
that's already the lowest possible PCIe speed.  The actual speed is
determined by the Thunderbolt links.

The check for the is_thunderbolt bit should be moved to the if-clause in
get_port_device_capability() which sets the PCIE_PORT_SERVICE_BWCTRL bit
in the services mask.

Alternatively, it may be worth considering not to instantiate the
bandwidth controller if the only link speed supported is 2.5 GT/s.

We should try to find out what actually causes the boot hang
(some interrupt storm maybe?), but that can hopefully be done
internally at Intel if the boot hang is reproducible.

Thanks,

Lukas

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ