lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <5f27c14467aa728358ebfe1686517aabe7c1e878.camel@kernel.org>
Date: Sat, 07 Dec 2024 00:06:49 +0100
From: Niklas Schnelle <niks@...nel.org>
To: Ilpo Järvinen <ilpo.jarvinen@...ux.intel.com>, 
	linux-pci@...r.kernel.org, Bjorn Helgaas <bhelgaas@...gle.com>, Lorenzo
 Pieralisi <lorenzo.pieralisi@....com>, Rob Herring <robh@...nel.org>,
 Krzysztof Wilczyński	 <kw@...ux.com>, "Maciej W . Rozycki"
 <macro@...am.me.uk>, Jonathan Cameron	 <Jonathan.Cameron@...wei.com>, Lukas
 Wunner <lukas@...ner.de>, Alexandru Gagniuc <mr.nuke.me@...il.com>, Krishna
 chaitanya chundru <quic_krichai@...cinc.com>, Srinivas Pandruvada
 <srinivas.pandruvada@...ux.intel.com>, "Rafael J . Wysocki"
 <rafael@...nel.org>, 	linux-pm@...r.kernel.org, Smita Koralahalli	
 <Smita.KoralahalliChannabasappa@....com>, linux-kernel@...r.kernel.org
Cc: Daniel Lezcano <daniel.lezcano@...aro.org>, Amit Kucheria
 <amitk@...nel.org>,  Zhang Rui <rui.zhang@...el.com>, Christophe JAILLET
 <christophe.jaillet@...adoo.fr>, niks@...nel.org
Subject: Re: [PATCH v9 6/9] PCI/bwctrl: Re-add BW notification portdrv as
 PCIe BW controller

On Fri, 2024-12-06 at 21:07 +0100, Niklas Schnelle wrote:
> On Fri, 2024-12-06 at 20:31 +0100, Niklas Schnelle wrote:
> > On Fri, 2024-12-06 at 19:12 +0100, Niklas Schnelle wrote:
> > > On Fri, 2024-10-18 at 17:47 +0300, Ilpo Järvinen wrote:
> > > > This mostly reverts the commit b4c7d2076b4e ("PCI/LINK: Remove
> > > > bandwidth notification"). An upcoming commit extends this driver
> > > > building PCIe bandwidth controller on top of it.
> > > > 
> > > > The PCIe bandwidth notification were first added in the commit
> > > > e8303bb7a75c ("PCI/LINK: Report degraded links via link bandwidth
> > > > notification") but later had to be removed. The significant changes
> > > > compared with the old bandwidth notification driver include:
> > > > 
> > ---8<---
> > > > ---
> > > 
> > > Hi Ilpo,
> > > 
> > > I bisected a v6.13-rc1 boot hang on my personal workstation to this
> > > patch. Sadly I don't have much details like a panic or so because the
> > > boot hangs before any kernel messages, or at least they're not visible
> > > long enough to see. I haven't yet looked into the code as I wanted to
> > > raise awareness first. Since the commit doesn't revert cleanly on
> > > v6.13-rc1 I also haven't tried that yet.
> > > 
> > > Here are some details on my system:
> > > - AMD Ryzen 9 3900X 
> > > - ASRock X570 Creator Motherboard
> > > - Radeon RX 5600 XT
> > > - Intel JHL7540 Thunderbolt 3 USB Controller (only USB 2 plugged)
> > > - Intel 82599 10 Gigabit NIC with SR-IOV enabled with 2 VFs
> > > - Intel n I211 Gigabit NIC
> > > - Intel Wi-Fi 6 AX200
> > > - Aquantia AQtion AQC107 NIC
> > > 
> > > If you have patches or things to try just ask.
> > > 
> > > Thanks,
> > > Niklas
> > > 
> > 
> > Ok I can now at least confirm that bluntly disabling the new bwctrl
> > driver with the below diff on top of v6.13-rc1 circumvents the boot
> > hang I'm seeing. So it's definitely this.
> > 
> > diff --git a/drivers/pci/pcie/portdrv.c b/drivers/pci/pcie/portdrv.c
> > index 5e10306b6308..6fa54480444a 100644
> > --- a/drivers/pci/pcie/portdrv.c
> > +++ b/drivers/pci/pcie/portdrv.c
> > @@ -828,7 +828,7 @@ static void __init pcie_init_services(void)
> >         pcie_aer_init();
> >         pcie_pme_init();
> >         pcie_dpc_init();
> > -       pcie_bwctrl_init();
> > +       /* pcie_bwctrl_init(); */
> >         pcie_hp_init();
> >  }
> > 
> 
> Also here is the full lspci -vvv output running the above on v6.13-rc1:
> https://paste.js.org/9UwQIMp7eSgp
> 
> Also note that I have CONFIG_PCIE_THERMAL unset so it's also not the
> cooling device portion that's causing the issue. Next I guess I should
> narrow it down to the specific port where enabling the bandwidth
> monitoring is causing trouble, not yet sure how best to do this with
> this many devices.
> 
> Thanks,
> Niklas

Ok did some fiddeling and it's the thunderbolt ports. The below diff
works around the issue. That said I guess for a proper fix this would
should get filtered by the port service matching? Also as can be seen
in lspci the port still claims to support bandwidth management so maybe
other thunderbolt ports actually do.

Thanks,
Niklas

diff --git a/drivers/pci/pcie/bwctrl.c b/drivers/pci/pcie/bwctrl.c
index b59cacc740fa..76a14f959c7f 100644
--- a/drivers/pci/pcie/bwctrl.c
+++ b/drivers/pci/pcie/bwctrl.c
@@ -294,6 +294,9 @@ static int pcie_bwnotif_probe(struct pcie_device *srv)
        struct pci_dev *port = srv->port;
        int ret;

+       if (srv->port->is_thunderbolt)
+               return -EOPNOTSUPP;
+
        struct pcie_bwctrl_data *data = devm_kzalloc(&srv->device,
                                                     sizeof(*data), GFP_KERNEL);
        if (!data)


Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ