Message-ID: <alpine.DEB.2.21.2511290112150.36486@angie.orcam.me.uk>
Date: Mon, 1 Dec 2025 03:54:57 +0000 (GMT)
From: "Maciej W. Rozycki" <macro@...am.me.uk>
To: ALOK TIWARI <alok.a.tiwari@...cle.com>
cc: Lukas Wunner <lukas@...ner.de>,
Ilpo Järvinen <ilpo.jarvinen@...ux.intel.com>,
Bjorn Helgaas <bhelgaas@...gle.com>, Jiwei <jiwei.sun.bj@...com>,
linux-pci@...r.kernel.org, LKML <linux-kernel@...r.kernel.org>,
guojinhui.liam@...edance.com, Bjorn Helgaas <helgaas@...nel.org>,
ahuang12@...ovo.com, sunjw10@...ovo.com
Subject: Re: [External] : Re: [PATCH 2/2] PCI: Fix the PCIe bridge decreasing
to Gen 1 during hotplug testing
On Wed, 26 Nov 2025, ALOK TIWARI wrote:
> We are testing hot-add/hot-remove behavior and observed the same issue as
> mentioned, where the PCIe bridge link speed drops from 32 GT/s to 2.5 GT/s.
>
> My understanding is that pcie_failed_link_retrain should only apply to
> devices matched by PCI_VDEVICE(ASMEDIA, 0x2824), but the current
> implementation appears to affect all devices that take longer to establish
> a link.
Thank you for your report.
No, there seems to be nothing wrong with that device by itself; the problem
is either with the downstream device (which obviously cannot be discovered
until a link has actually been established) or with the particular device
pair or setup. I originally implemented matching for this particular device
out of an abundance of caution, in case removing the speed restriction for
other upstream devices (should the quirk trigger there) caused the link to
go back into an infinite retraining loop.
> We are unsure if this is intentional, but it effectively allows such
> devices to continue operating at a reduced speed.
It was intentional, but it didn't take into account noisy hot-plug
scenarios, which are not part of my lab setup.
> If we extend PCIE_LINK_RETRAIN_TIMEOUT_MS to 3000 ms, these slower devices are
> able to complete link training, and the problem is no longer observed in our
> testing. Therefore, increasing PCIE_LINK_RETRAIN_TIMEOUT_MS to 3000 ms seems
> to resolve the issue for us.
>
> Would it be acceptable to increase PCIE_LINK_RETRAIN_TIMEOUT_MS from 1000 ms
> to 3000 ms in this case?
FWIW, my understanding is that this actually goes beyond the spec.
However, given other reports, I have given more thought to the idea I
shared previously, which sadly received no feedback to motivate me further,
and implemented an even simpler approach: the 2.5GT/s speed clamp is always
removed regardless of the link state, and if that fails, whatever clamp was
in place at entry to the quirk is restored. I hope this will prove robust
enough not to cause further issues in hot-plug scenarios.
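For readers following along, the flow described above can be illustrated
with a minimal C sketch. It is a simplified model under stated assumptions,
not the code in the linked patch: `enum link_speed`, `struct port`, its
`partner` field, `set_target_and_retrain()` and `lift_speed_clamp()` are all
hypothetical; the Target Link Speed clamp is modelled as a plain field rather
than the Link Control 2 register; retraining is simulated; and the earlier
step where the quirk itself may clamp a failing link to 2.5GT/s is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical encoding of link speeds, ordered slowest to fastest. */
enum link_speed {
	SPEED_2_5GT = 1,
	SPEED_5GT,
	SPEED_8GT,
	SPEED_16GT,
	SPEED_32GT,
};

/* Hypothetical model of a downstream port and its link. */
struct port {
	enum link_speed target;  /* Target Link Speed clamp currently set */
	enum link_speed max;     /* highest speed the port supports */
	enum link_speed partner; /* simulation only: fastest speed that trains */
	bool link_up;            /* models Data Link Layer Link Active */
};

/* Simulated retrain: set the clamp; the link comes up iff target <= partner. */
static bool set_target_and_retrain(struct port *p, enum link_speed s)
{
	p->target = s;
	p->link_up = p->target <= p->partner;
	return p->link_up;
}

/*
 * Sketch (assumption, not the actual patch): lift a 2.5GT/s clamp without
 * checking the link state or matching a device ID, and if the link does
 * not come back, restore the clamp that was in force on entry.
 * Returns 0 on success, -1 if the entry clamp had to be restored.
 */
static int lift_speed_clamp(struct port *p)
{
	enum link_speed entry;

	assert(p != NULL);
	entry = p->target;

	/* Assumed trigger: only a 2.5GT/s clamp is lifted, others are kept. */
	if (entry != SPEED_2_5GT)
		return 0;

	/* Try the highest supported speed, whatever the current link state. */
	if (set_target_and_retrain(p, p->max))
		return 0;

	/* Retraining failed: put back the clamp found on entry. */
	set_target_and_retrain(p, entry);
	return -1;
}
```

With a healthy link the clamp is lifted to the port's maximum speed even if
the link was down on entry; with a link that only trains at 2.5GT/s the
sketch returns -1 and leaves the port clamped as it found it.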
Please give it a try and let me know if it fixes your issue:
<https://lore.kernel.org/r/alpine.DEB.2.21.2511290245460.36486@angie.orcam.me.uk/>
Maciej