netdev - Re: PCI: Work around PCIe link training failures

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <alpine.DEB.2.21.2407222117300.51207@angie.orcam.me.uk>
Date: Mon, 22 Jul 2024 21:40:29 +0100 (BST)
From: "Maciej W. Rozycki" <macro@...am.me.uk>
To: Matthew W Carlis <mattc@...estorage.com>
cc: Ilpo Järvinen <ilpo.jarvinen@...ux.intel.com>, 
    alex.williamson@...hat.com, Bjorn Helgaas <bhelgaas@...gle.com>, 
    christophe.leroy@...roup.eu, "David S. Miller" <davem@...emloft.net>, 
    david.abdurachmanov@...il.com, edumazet@...gle.com, kuba@...nel.org, 
    leon@...nel.org, linux-kernel@...r.kernel.org, linux-pci@...r.kernel.org, 
    linux-rdma@...r.kernel.org, linuxppc-dev@...ts.ozlabs.org, lukas@...ner.de, 
    mahesh@...ux.ibm.com, mika.westerberg@...ux.intel.com, mpe@...erman.id.au, 
    netdev@...r.kernel.org, npiggin@...il.com, oohall@...il.com, 
    pabeni@...hat.com, pali@...nel.org, saeedm@...dia.com, sr@...x.de, 
    Jim Wilson <wilson@...iptree.org>
Subject: Re: PCI: Work around PCIe link training failures

[+cc Ilpo for his previous involvement here]

On Mon, 22 Jul 2024, Matthew W Carlis wrote:

> Sorry to resurrect this one, but I was wondering why the
> PCI device ID in drivers/pci/quirks.c for the ASMedia ASM2824
> isn't checked before forcing the link down to Gen1... We have
> had to revert this patch during our kernel migration due to it
> interacting poorly with at least one older Gen3 PLX PCIe switch
> vendor/generation while using DPC. In another context we have
> found similar issues during system bringup without DPC while
> using a more legacy hot-plug model (BIOS defaults for us..).
> In both contexts our devices are stuck at Gen1 after physical
> hot-plug/insert, power-cycle.

 Sorry to hear about your problems.  However the workaround is supposed to 
only trigger if the link has already failed negotiation.  Could you please 
be more specific as to the actual scenario where it triggers?

 A scenario was mentioned earlier on, where a downstream device has been 
removed from a slot and left behind the LBMS bit set in the corresponding 
downstream port of the upstream device.  It then triggered the workaround 
where the port was rescanned with the slot still empty, which then left 
the link capped at 2.5GT/s for a device subsequently inserted.  Is it what 
happens for you?

 As I recall Ilpo has been working on changes that among others should 
make sure no stale LBMS bit has been left set, but I'm not sure what the 
state of affairs has been here.  Myself I've been too swamped in the 
recent months and consequently didn't look into any improvements in this 
area (and unrelated issues involving the system in question in my remote 
lab have further impeded me).

> Tried reading through the patch history/review but it was still
> a little bit unclear to me. Can we add the device ID check as a
> precondition to forcing link to Gen1?

 The main reason is it is believed that it is the downstream device 
causing the issue, and obviously you can't fetch its ID if you can't 
negotiate link so as to talk to it in the first place.

  Maciej