Message-ID: <Z4eLh24IkDrAm6cm@wunner.de>
Date: Wed, 15 Jan 2025 11:18:47 +0100
From: Lukas Wunner <lukas@...ner.de>
To: Ilpo Järvinen <ilpo.jarvinen@...ux.intel.com>
Cc: Jiwei <jiwei.sun.bj@...com>, macro@...am.me.uk, bhelgaas@...gle.com,
linux-pci@...r.kernel.org, LKML <linux-kernel@...r.kernel.org>,
guojinhui.liam@...edance.com, helgaas@...nel.org,
ahuang12@...ovo.com, sunjw10@...ovo.com
Subject: Re: [PATCH 2/2] PCI: Fix the PCIe bridge decreasing to Gen 1 during
hotplug testing

On Tue, Jan 14, 2025 at 08:25:04PM +0200, Ilpo Järvinen wrote:
> On Tue, 14 Jan 2025, Jiwei wrote:
> > [ 539.362400] ==== pcie_bwnotif_irq 269(stop running),link_status:0x7841
> > [ 539.395720] ==== pcie_bwnotif_irq 247(start running),link_status:0x1041
>
> DLLLA=0
>
> But LBMS did not get reset.
>
> So is this perhaps because hotplug cannot keep up with the rapid
> remove/add going on, and thus will not always call the remove_board()
> even if the device went away?
>
> Lukas, do you know if there's a good way to resolve this within hotplug
> side?

I believe the pciehp code is fine and suspect this is an issue
in the quirk. We've been dealing with rapid add/remove in pciehp
for years without issues.

I don't understand the quirk well enough to guess what's going
wrong, but I'm wondering if there could be a race when accessing
lbms_count?

Maybe if lbms_count is replaced by a flag in pci_dev->priv_flags,
as we've discussed, with proper memory barriers where necessary,
this problem will solve itself?
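
Something along these lines is what I have in mind -- just a rough,
untested sketch; the PCI_DEV_LBMS_SEEN name and bit number are made
up here purely for illustration:

#include <linux/bitops.h>
#include <linux/pci.h>

/* Hypothetical bit in pci_dev->priv_flags (name and value made up) */
#define PCI_DEV_LBMS_SEEN	3

static inline void pci_dev_set_lbms_seen(struct pci_dev *dev)
{
	/* atomic, but set_bit() by itself does not imply ordering */
	set_bit(PCI_DEV_LBMS_SEEN, &dev->priv_flags);
}

static inline bool pci_dev_test_and_clear_lbms_seen(struct pci_dev *dev)
{
	/* atomic RMW returning a value, so it implies a full barrier */
	return test_and_clear_bit(PCI_DEV_LBMS_SEEN, &dev->priv_flags);
}

The bandwidth notification IRQ handler would set the bit and the
link retrain quirk would test-and-clear it, so there's no counter
left to get out of sync.
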
Thanks,
Lukas