Message-Id: <20240208132205.4550-1-ilpo.jarvinen@linux.intel.com>
Date: Thu,  8 Feb 2024 15:22:04 +0200
From: Ilpo Järvinen <ilpo.jarvinen@...ux.intel.com>
To: Bjorn Helgaas <bhelgaas@...gle.com>,
	linux-pci@...r.kernel.org,
	"Maciej W . Rozycki" <macro@...am.me.uk>,
	linux-kernel@...r.kernel.org
Cc: Mika Westerberg <mika.westerberg@...ux.intel.com>,
	Ilpo Järvinen <ilpo.jarvinen@...ux.intel.com>
Subject: [PATCH v2 1/1] PCI: Fix link activation wait logic

If link retraining fails, pcie_failed_link_retrain() returns false, but
the inverted logic in pcie_wait_for_link_delay() translates this into
success by returning true after a delay.

As a result, instead of printing a message and returning failure,
pci_bridge_wait_for_secondary_bus() calls pci_dev_wait(), which spends
more than 60 s waiting for a device that will never come up.

This long resume delay has been observed when resuming devices that
were disconnected while suspended:

pcieport 0000:00:07.2: power state changed by ACPI to D3cold
..
thunderbolt 1-701: device disconnected
pcieport 0000:00:07.2: power state changed by ACPI to D0
pcieport 0000:00:07.2: waiting 100 ms for downstream link
pcieport 0000:57:03.0: waiting 100 ms for downstream link, after activation
pcieport 0000:57:03.0: broken device, retraining non-functional downstream link at 2.5GT/s
pcieport 0000:57:03.0: retraining failed
pcieport 0000:57:03.0: broken device, retraining non-functional downstream link at 2.5GT/s
pcieport 0000:57:03.0: retraining failed
pcieport 0000:73:00.0: not ready 1023ms after resume; waiting
pcieport 0000:73:00.0: not ready 2047ms after resume; waiting
pcieport 0000:73:00.0: not ready 4095ms after resume; waiting
pcieport 0000:73:00.0: not ready 8191ms after resume; waiting
pcieport 0000:73:00.0: not ready 16383ms after resume; waiting
pcieport 0000:73:00.0: not ready 32767ms after resume; waiting
pcieport 0000:73:00.0: not ready 65535ms after resume; giving up
pcieport 0000:57:03.0: pciehp: pciehp_check_link_active: lnk_status = 5041
pcieport 0000:73:00.0: Unable to change power state from D3cold to D0, device inaccessible
pcieport 0000:57:03.0: pciehp: Slot(3): Card not present

Fix the logic error by returning false immediately if
pcie_failed_link_retrain() fails.

Fixes: 1abb47390350 ("Merge branch 'pci/enumeration'")
Link: https://lore.kernel.org/linux-pci/a0b070b7-14ce-7cc5-4e6c-6e15f3fcab75@linux.intel.com/T/#t
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@...ux.intel.com>
---

I think this change should go into the same commit as the Target Speed
quirk fix (making the quirk return false when no retraining was
attempted), because otherwise the intermediate state has additional
logic problems.

v2:
- Removed the quirks part (still needed, but Maciej planned to test and
  send a separate patch for it)
- Improved commit message

---
 drivers/pci/pci.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index d8f11a078924..ca4159472a72 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -5068,9 +5068,7 @@ static bool pcie_wait_for_link_delay(struct pci_dev *pdev, bool active,
 		msleep(20);
 	rc = pcie_wait_for_link_status(pdev, false, active);
 	if (active) {
-		if (rc)
-			rc = pcie_failed_link_retrain(pdev);
-		if (rc)
+		if (rc < 0 && !pcie_failed_link_retrain(pdev))
 			return false;
 
 		msleep(delay);
-- 
2.39.2

