Message-ID: <alpine.DEB.2.21.2407222117300.51207@angie.orcam.me.uk>
Date: Mon, 22 Jul 2024 21:40:29 +0100 (BST)
From: "Maciej W. Rozycki" <macro@...am.me.uk>
To: Matthew W Carlis <mattc@...estorage.com>
cc: Ilpo Järvinen <ilpo.jarvinen@...ux.intel.com>,
alex.williamson@...hat.com, Bjorn Helgaas <bhelgaas@...gle.com>,
christophe.leroy@...roup.eu, "David S. Miller" <davem@...emloft.net>,
david.abdurachmanov@...il.com, edumazet@...gle.com, kuba@...nel.org,
leon@...nel.org, linux-kernel@...r.kernel.org, linux-pci@...r.kernel.org,
linux-rdma@...r.kernel.org, linuxppc-dev@...ts.ozlabs.org, lukas@...ner.de,
mahesh@...ux.ibm.com, mika.westerberg@...ux.intel.com, mpe@...erman.id.au,
netdev@...r.kernel.org, npiggin@...il.com, oohall@...il.com,
pabeni@...hat.com, pali@...nel.org, saeedm@...dia.com, sr@...x.de,
Jim Wilson <wilson@...iptree.org>
Subject: Re: PCI: Work around PCIe link training failures
[+cc Ilpo for his previous involvement here]
On Mon, 22 Jul 2024, Matthew W Carlis wrote:
> Sorry to resurrect this one, but I was wondering why the
> PCI device ID in drivers/pci/quirks.c for the ASMedia ASM2824
> isn't checked before forcing the link down to Gen1... We have
> had to revert this patch during our kernel migration due to it
> interacting poorly with at least one older Gen3 PLX PCIe switch
> vendor/generation while using DPC. In another context we have
> found similar issues during system bringup without DPC while
> using a more legacy hot-plug model (BIOS defaults for us..).
> In both contexts our devices are stuck at Gen1 after physical
> hot-plug/insert, power-cycle.
Sorry to hear about your problems. However, the workaround is supposed
to trigger only if the link has already failed negotiation. Could you
please be more specific about the actual scenario in which it triggers?
A scenario was mentioned earlier on, where a downstream device was
removed from a slot and left the LBMS bit set in the corresponding
downstream port of the upstream device. The stale bit then triggered the
workaround when the port was rescanned with the slot still empty, which
left the link capped at 2.5GT/s for a device subsequently inserted. Is
that what happens for you?
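The stale-LBMS scenario above can be modelled with a few lines of Python.
This is only a sketch under stated assumptions: the two bit masks are the
Link Status register values from the PCIe specification, but
`workaround_triggers()` and `rescan()` are hypothetical helpers that
approximate the trigger condition (LBMS set while the link is not active);
they are not the kernel's code.

```python
# Link Status register bits (values as in the PCIe specification).
PCI_EXP_LNKSTA_DLLLA = 0x2000  # Data Link Layer Link Active
PCI_EXP_LNKSTA_LBMS = 0x4000   # Link Bandwidth Management Status

GEN1_SPEED = 2.5  # GT/s


def workaround_triggers(lnksta: int) -> bool:
    """Hypothetical model: LBMS is set but the link is not active."""
    mask = PCI_EXP_LNKSTA_LBMS | PCI_EXP_LNKSTA_DLLLA
    return (lnksta & mask) == PCI_EXP_LNKSTA_LBMS


def rescan(lnksta: int, target_speed: float) -> float:
    """Hypothetical model: return the target speed left after a rescan."""
    if workaround_triggers(lnksta):
        return GEN1_SPEED  # link gets capped at 2.5GT/s
    return target_speed


# Slot is empty (DLLLA clear) but LBMS was left set by the removed device.
stale = PCI_EXP_LNKSTA_LBMS
# Slot is empty and LBMS has been cleared.
clean = 0

capped = rescan(stale, 8.0)
uncapped = rescan(clean, 8.0)
print(capped, uncapped)
```

In this model the cap comes purely from the stale bit: with LBMS cleared
before the rescan, the empty slot does not trigger anything and a device
inserted later is free to train at the full speed.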
As I recall, Ilpo has been working on changes that, among other things,
should make sure no stale LBMS bit is left set, but I'm not sure what the
current state of that work is. I have been too swamped in recent months
to look into any improvements in this area (and unrelated issues
involving the system in question in my remote lab have further impeded
me).
> Tried reading through the patch history/review but it was still
> a little bit unclear to me. Can we add the device ID check as a
> precondition to forcing link to Gen1?
The main reason is that the downstream device is believed to be causing
the issue, and obviously you can't fetch its ID if you can't negotiate a
link to talk to it in the first place.
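To illustrate why an ID check on the downstream device cannot gate the
workaround, here is a small Python sketch. It assumes the common platform
behaviour that a configuration read with no responding device returns
all-ones; `read_vendor_id()`, `id_check_possible()` and the vendor ID
value are hypothetical and for illustration only.

```python
NO_RESPONSE = 0xFFFF  # all-ones, assumed result when nothing answers


def read_vendor_id(link_up: bool, device_vendor_id: int) -> int:
    """Hypothetical model of a Vendor ID config read across the link."""
    return device_vendor_id if link_up else NO_RESPONSE


def id_check_possible(link_up: bool, device_vendor_id: int) -> bool:
    """An ID match can only be attempted if the read returned real data."""
    return read_vendor_id(link_up, device_vendor_id) != NO_RESPONSE


HYPOTHETICAL_VENDOR_ID = 0x1234  # made-up value, not a real device

# Link failed to train: the ID is unreadable, so nothing can be matched.
failed_link = id_check_possible(False, HYPOTHETICAL_VENDOR_ID)
# Link trained: the ID is readable, but then no workaround is needed.
working_link = id_check_possible(True, HYPOTHETICAL_VENDOR_ID)
print(failed_link, working_link)
```

The ID only becomes readable in the case where the link already works,
which is exactly the case where the workaround is not wanted.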
Maciej