Message-ID: <alpine.DEB.2.21.2408251408160.30766@angie.orcam.me.uk>
Date: Sun, 25 Aug 2024 14:47:29 +0100 (BST)
From: "Maciej W. Rozycki" <macro@...am.me.uk>
To: Ilpo Järvinen <ilpo.jarvinen@...ux.intel.com>, 
    Matthew W Carlis <mattc@...estorage.com>, 
    Bjorn Helgaas <bhelgaas@...gle.com>
cc: Mika Westerberg <mika.westerberg@...ux.intel.com>, 
    Oliver O'Halloran <oohall@...il.com>, linux-pci@...r.kernel.org, 
    linux-kernel@...r.kernel.org
Subject: [PATCH v3 1/4] PCI: Clear the LBMS bit after a link retrain

The LBMS bit, where implemented, is set by hardware either in response 
to the completion of retraining caused by writing 1 to the Retrain Link 
bit or whenever hardware has changed the link speed or width in an 
attempt to correct unreliable link operation.  Hardware never clears it 
on its own: it is only cleared by software writing 1 to the bit position 
in the Link Status register, and we never do such a write.
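
For illustration only, the write-1-to-clear behaviour described above can 
be modelled in plain C.  This is a sketch, not kernel code: `struct 
lnksta_model' and the `lnksta_model_*' helpers are hypothetical names 
made up for the example, and only the two bit values are taken from 
include/uapi/linux/pci_regs.h.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bit values as defined in include/uapi/linux/pci_regs.h. */
#define PCI_EXP_LNKSTA_LBMS	0x4000	/* Link Bandwidth Management Status */
#define PCI_EXP_LNKSTA_LABS	0x8000	/* Link Autonomous Bandwidth Status */

/* Bits of the modelled register that are write-1-to-clear. */
#define LNKSTA_MODEL_RW1C	(PCI_EXP_LNKSTA_LBMS | PCI_EXP_LNKSTA_LABS)

/* Hypothetical stand-in for a Link Status register; not kernel code. */
struct lnksta_model {
	uint16_t value;
};

/* Hardware side: latch LBMS (retrain completed or speed/width changed). */
static inline void lnksta_model_hw_set_lbms(struct lnksta_model *reg)
{
	assert(reg != NULL);
	reg->value |= PCI_EXP_LNKSTA_LBMS;
}

/*
 * Software side: a written 1 clears a write-1-to-clear bit, a written 0
 * leaves it alone.  All other bits are treated as read-only here.
 */
static inline void lnksta_model_sw_write(struct lnksta_model *reg, uint16_t val)
{
	assert(reg != NULL);
	reg->value = (uint16_t)(reg->value & ~(val & LNKSTA_MODEL_RW1C));
}
```

In this model writing 0 leaves LBMS set, so the bit stays latched until 
something explicitly writes 1 to it.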

We currently have two places, namely `apply_bad_link_workaround' and 
`pcie_failed_link_retrain' in drivers/pci/controller/dwc/pcie-tegra194.c 
and drivers/pci/quirks.c respectively, where we check the state of the 
LBMS bit.  Neither is interested in the state of the bit resulting from 
the completion of retraining; both check for a link fault.

Consequently `pcie_failed_link_retrain' in particular causes issues by 
trying to retrain a link where there's no downstream device anymore, 
because the state of 1 in the LBMS bit has been retained from when there 
was a device downstream that has since been removed.

Therefore clear the LBMS bit at the conclusion of `pcie_retrain_link', so 
that we have a single place that controls it and our code can track link 
speed or width changes resulting from unreliable link operation.
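
As a rough sketch of the reasoning, and not of the actual kernel code 
paths, the effect of clearing LBMS after a retrain can be modelled as 
follows.  `struct port_model' and the `port_model_*' helpers are 
hypothetical names used only for this illustration; the model assumes 
the bit is latched on every completed retrain.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Bit value as defined in include/uapi/linux/pci_regs.h. */
#define PCI_EXP_LNKSTA_LBMS	0x4000	/* Link Bandwidth Management Status */

/* Hypothetical stand-in for a port's Link Status register; not kernel code. */
struct port_model {
	uint16_t lnksta;
};

/*
 * A software-requested retrain: hardware latches LBMS on completion.
 * With clear_lbms_after set, the model then clears the bit, standing in
 * for the write of 1 to the bit position in the Link Status register.
 */
static inline void port_model_retrain(struct port_model *port,
				      bool clear_lbms_after)
{
	assert(port != NULL);
	port->lnksta |= PCI_EXP_LNKSTA_LBMS;
	if (clear_lbms_after)
		port->lnksta = (uint16_t)(port->lnksta & ~PCI_EXP_LNKSTA_LBMS);
}

/* Hardware changes link speed or width on its own and latches LBMS. */
static inline void port_model_hw_change(struct port_model *port)
{
	assert(port != NULL);
	port->lnksta |= PCI_EXP_LNKSTA_LBMS;
}

/* What a later link fault check would observe. */
static inline bool port_model_lbms_seen(const struct port_model *port)
{
	assert(port != NULL);
	return (port->lnksta & PCI_EXP_LNKSTA_LBMS) != 0;
}
```

Without the clear, a later check sees LBMS set purely as a leftover of 
the retrain; with the clear, the bit only reads as set once hardware has 
changed the link speed or width on its own.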

Fixes: a89c82249c37 ("PCI: Work around PCIe link training failures")
Reported-by: Matthew W Carlis <mattc@...estorage.com>
Link: https://lore.kernel.org/r/20240806000659.30859-1-mattc@purestorage.com/
Link: https://lore.kernel.org/r/20240722193407.23255-1-mattc@purestorage.com/
Signed-off-by: Maciej W. Rozycki <macro@...am.me.uk>
Cc: stable@...r.kernel.org # v6.5+
---
No change from v2.

New change in v2.
---
 drivers/pci/pci.c |   10 +++++++++-
 1 file changed, 9 insertions(+), 1 deletion(-)

linux-pcie-retrain-link-lbms-clear.diff
Index: linux-macro/drivers/pci/pci.c
===================================================================
--- linux-macro.orig/drivers/pci/pci.c
+++ linux-macro/drivers/pci/pci.c
@@ -4718,7 +4718,15 @@ int pcie_retrain_link(struct pci_dev *pd
 		pcie_capability_clear_word(pdev, PCI_EXP_LNKCTL, PCI_EXP_LNKCTL_RL);
 	}
 
-	return pcie_wait_for_link_status(pdev, use_lt, !use_lt);
+	rc = pcie_wait_for_link_status(pdev, use_lt, !use_lt);
+
+	/*
+	 * Clear LBMS after a manual retrain so that the bit can be used
+	 * to track link speed or width changes made by hardware itself
+	 * in attempt to correct unreliable link operation.
+	 */
+	pcie_capability_write_word(pdev, PCI_EXP_LNKSTA, PCI_EXP_LNKSTA_LBMS);
+	return rc;
 }
 
 /**
