Message-ID: <7a4a9d51a9105bd5ca2c850c26fed6435b5e90e9.camel@kernel.org>
Date: Fri, 06 Dec 2024 21:07:20 +0100
From: Niklas Schnelle <niks@...nel.org>
To: Ilpo Järvinen <ilpo.jarvinen@...ux.intel.com>,
linux-pci@...r.kernel.org, Bjorn Helgaas <bhelgaas@...gle.com>, Lorenzo
Pieralisi <lorenzo.pieralisi@....com>, Rob Herring <robh@...nel.org>,
Krzysztof Wilczyński <kw@...ux.com>, "Maciej W . Rozycki"
<macro@...am.me.uk>, Jonathan Cameron <Jonathan.Cameron@...wei.com>, Lukas
Wunner <lukas@...ner.de>, Alexandru Gagniuc <mr.nuke.me@...il.com>, Krishna
chaitanya chundru <quic_krichai@...cinc.com>, Srinivas Pandruvada
<srinivas.pandruvada@...ux.intel.com>, "Rafael J . Wysocki"
<rafael@...nel.org>, linux-pm@...r.kernel.org, Smita Koralahalli
<Smita.KoralahalliChannabasappa@....com>, linux-kernel@...r.kernel.org
Cc: Daniel Lezcano <daniel.lezcano@...aro.org>, Amit Kucheria
<amitk@...nel.org>, Zhang Rui <rui.zhang@...el.com>, Christophe JAILLET
<christophe.jaillet@...adoo.fr>, niks@...nel.org
Subject: Re: [PATCH v9 6/9] PCI/bwctrl: Re-add BW notification portdrv as
PCIe BW controller
On Fri, 2024-12-06 at 20:31 +0100, Niklas Schnelle wrote:
> On Fri, 2024-12-06 at 19:12 +0100, Niklas Schnelle wrote:
> > On Fri, 2024-10-18 at 17:47 +0300, Ilpo Järvinen wrote:
> > > This mostly reverts the commit b4c7d2076b4e ("PCI/LINK: Remove
> > > bandwidth notification"). An upcoming commit extends this driver,
> > > building a PCIe bandwidth controller on top of it.
> > >
> > > The PCIe bandwidth notifications were first added in the commit
> > > e8303bb7a75c ("PCI/LINK: Report degraded links via link bandwidth
> > > notification") but later had to be removed. The significant changes
> > > compared with the old bandwidth notification driver include:
> > >
> ---8<---
> > > ---
> >
> > Hi Ilpo,
> >
> > I bisected a v6.13-rc1 boot hang on my personal workstation to this
> > patch. Sadly I don't have many details like a panic or so because the
> > boot hangs before any kernel messages, or at least they're not visible
> > long enough to see. I haven't yet looked into the code as I wanted to
> > raise awareness first. Since the commit doesn't revert cleanly on
> > v6.13-rc1 I also haven't tried that yet.
> >
> > Here are some details on my system:
> > - AMD Ryzen 9 3900X
> > - ASRock X570 Creator Motherboard
> > - Radeon RX 5600 XT
> > - Intel JHL7540 Thunderbolt 3 USB Controller (only USB 2 plugged)
> > - Intel 82599 10 Gigabit NIC with SR-IOV enabled with 2 VFs
> > - Intel I211 Gigabit NIC
> > - Intel Wi-Fi 6 AX200
> > - Aquantia AQtion AQC107 NIC
> >
> > If you have patches or things to try just ask.
> >
> > Thanks,
> > Niklas
> >
>
> Ok I can now at least confirm that bluntly disabling the new bwctrl
> driver with the below diff on top of v6.13-rc1 circumvents the boot
> hang I'm seeing. So it's definitely this.
>
> diff --git a/drivers/pci/pcie/portdrv.c b/drivers/pci/pcie/portdrv.c
> index 5e10306b6308..6fa54480444a 100644
> --- a/drivers/pci/pcie/portdrv.c
> +++ b/drivers/pci/pcie/portdrv.c
> @@ -828,7 +828,7 @@ static void __init pcie_init_services(void)
>  	pcie_aer_init();
>  	pcie_pme_init();
>  	pcie_dpc_init();
> -	pcie_bwctrl_init();
> +	/* pcie_bwctrl_init(); */
>  	pcie_hp_init();
>  }
>

Also here is the full lspci -vvv output from running the above on v6.13-rc1:
https://paste.js.org/9UwQIMp7eSgp

Also note that I have CONFIG_PCIE_THERMAL unset, so it's also not the
cooling device portion that's causing the issue. Next I guess I should
narrow it down to the specific port where enabling the bandwidth
monitoring is causing trouble. I'm not yet sure how best to do this with
this many devices.

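
One possible way to shrink the list of candidate ports, as a minimal
sketch rather than a tested procedure: check which devices advertise the
Link Bandwidth Notification Capability (LBNC, bit 21 of the PCIe Link
Capabilities register) in their config space. This assumes the bwctrl
service only attaches to ports that set that bit, which should be
verified against portdrv.c. The helper name port_has_lbnc() is
hypothetical; the register offsets and bit values are the standard ones.

```c
#include <stddef.h>
#include <stdint.h>

/* Standard PCI/PCIe config space offsets and bits. */
#define PCI_STATUS            0x06        /* low byte of the Status register */
#define PCI_STATUS_CAP_LIST   0x10        /* device has a capability list */
#define PCI_CAPABILITY_LIST   0x34        /* pointer to the first capability */
#define PCI_CAP_ID_EXP        0x10        /* PCI Express capability ID */
#define PCI_EXP_LNKCAP        0x0c        /* offset within the PCIe capability */
#define PCI_EXP_LNKCAP_LBNC   0x00200000u /* Link Bandwidth Notification Capability */

/* Read a little-endian 32-bit value from a config space dump. */
static uint32_t cfg_read32(const uint8_t *cfg, size_t off)
{
	return (uint32_t)cfg[off] | (uint32_t)cfg[off + 1] << 8 |
	       (uint32_t)cfg[off + 2] << 16 | (uint32_t)cfg[off + 3] << 24;
}

/*
 * Hypothetical helper: walk the capability list in a config space dump.
 * Returns 1 if the device advertises LBNC, 0 if it does not, and -1 if
 * the dump is too short to decide.
 */
int port_has_lbnc(const uint8_t *cfg, size_t len)
{
	size_t pos;
	int ttl = 48; /* bound the walk in case the list loops */

	if (len < 64)
		return -1;
	if (!(cfg[PCI_STATUS] & PCI_STATUS_CAP_LIST))
		return 0;

	pos = cfg[PCI_CAPABILITY_LIST] & 0xfc;
	while (pos >= 0x40 && ttl--) {
		if (pos + 2 > len)
			return -1;
		if (cfg[pos] == PCI_CAP_ID_EXP) {
			if (pos + PCI_EXP_LNKCAP + 4 > len)
				return -1;
			return !!(cfg_read32(cfg, pos + PCI_EXP_LNKCAP) &
				  PCI_EXP_LNKCAP_LBNC);
		}
		pos = cfg[pos + 1] & 0xfc; /* next capability pointer */
	}
	return 0;
}
```

The dump for each device can be read from the "config" file under
/sys/bus/pci/devices/; reading past the first 64 bytes generally needs
root. Ports for which this returns 1 would be the ones to test one at a
time.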
Thanks,
Niklas