Message-Id: <20250519102808.4130271-1-raag.jadav@intel.com>
Date: Mon, 19 May 2025 15:58:08 +0530
From: Raag Jadav <raag.jadav@...el.com>
To: rafael@...nel.org,
	mahesh@...ux.ibm.com,
	oohall@...il.com,
	bhelgaas@...gle.com
Cc: linux-pci@...r.kernel.org,
	linux-pm@...r.kernel.org,
	linux-kernel@...r.kernel.org,
	ilpo.jarvinen@...ux.intel.com,
	lukas@...ner.de,
	aravind.iddamsetty@...ux.intel.com,
	superm1@...nel.org,
	benato.denis96@...il.com,
	Raag Jadav <raag.jadav@...el.com>
Subject: [PATCH v4] PCI: Prevent power state transition of erroneous device

If an error status bit is set on an AER-capable device, device recovery
is most likely either in progress or has already failed. Neither case is
well suited for a power state transition, since it can lead to
unpredictable consequences such as resume failure or, in the worst case,
loss of the device. Leave the device in its existing power state to
avoid such issues.

Signed-off-by: Raag Jadav <raag.jadav@...el.com>
---

v2: Synchronize AER handling with PCI PM (Rafael)
v3: Move pci_aer_in_progress() to pci_set_low_power_state() (Rafael)
    Elaborate "why" (Bjorn)
v4: Rely on error status instead of device status
    Condense comment (Lukas)

More discussion in [1].
[1] https://lore.kernel.org/all/CAJZ5v0g-aJXfVH+Uc=9eRPuW08t-6PwzdyMXsC6FZRKYJtY03Q@mail.gmail.com/
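
For readers who want to poke at the gating logic outside the kernel, here
is a minimal userspace sketch of the check this patch adds. It is only a
model under stated assumptions: struct mock_dev, mock_aer_in_progress()
and mock_set_low_power_state() are hypothetical stand-ins, not kernel
APIs, and the two status fields merely imitate what the patch reads from
PCI_ERR_COR_STATUS and PCI_ERR_UNCOR_STATUS.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the few bits of struct pci_dev used here. */
struct mock_dev {
	bool aer_native;       /* models the result of pcie_aer_is_native() */
	uint32_t cor_status;   /* models the PCI_ERR_COR_STATUS register */
	uint32_t uncor_status; /* models the PCI_ERR_UNCOR_STATUS register */
	int power_state;       /* 0 = D0 ... 3 = D3hot */
};

/* Mirrors the shape of pci_aer_in_progress(): any status bit set counts. */
static bool mock_aer_in_progress(const struct mock_dev *dev)
{
	/* Without native AER control the OS does not consult the status. */
	if (!dev->aer_native)
		return false;

	return dev->cor_status || dev->uncor_status;
}

/* Mirrors the new early return in pci_set_low_power_state(). */
static int mock_set_low_power_state(struct mock_dev *dev, int state)
{
	/* Error pending: refuse the transition and keep the current state. */
	if (mock_aer_in_progress(dev))
		return -EIO;

	dev->power_state = state;
	return 0;
}
```

In this model a device with any correctable or uncorrectable status bit
set stays in its current power state and the caller sees -EIO, while a
device without native AER control is never blocked by the check.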

 drivers/pci/pci.c      |  9 +++++++++
 drivers/pci/pcie/aer.c | 13 +++++++++++++
 include/linux/aer.h    |  2 ++
 3 files changed, 24 insertions(+)

diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index 4d7c9f64ea24..a20018692933 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -9,6 +9,7 @@
  */
 
 #include <linux/acpi.h>
+#include <linux/aer.h>
 #include <linux/kernel.h>
 #include <linux/delay.h>
 #include <linux/dmi.h>
@@ -1539,6 +1540,14 @@ static int pci_set_low_power_state(struct pci_dev *dev, pci_power_t state, bool
 	   || (state == PCI_D2 && !dev->d2_support))
 		return -EIO;
 
+	/*
+	 * If error status is set on an AER capable device, it is not well
+	 * suited for power state transition. Leave it in its existing power
+	 * state to avoid issues like unpredictable resume failure.
+	 */
+	if (pci_aer_in_progress(dev))
+		return -EIO;
+
 	pci_read_config_word(dev, dev->pm_cap + PCI_PM_CTRL, &pmcsr);
 	if (PCI_POSSIBLE_ERROR(pmcsr)) {
 		pci_err(dev, "Unable to change power state from %s to %s, device inaccessible\n",
diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c
index a1cf8c7ef628..617fbac0d38a 100644
--- a/drivers/pci/pcie/aer.c
+++ b/drivers/pci/pcie/aer.c
@@ -237,6 +237,19 @@ int pcie_aer_is_native(struct pci_dev *dev)
 }
 EXPORT_SYMBOL_NS_GPL(pcie_aer_is_native, "CXL");
 
+bool pci_aer_in_progress(struct pci_dev *dev)
+{
+	int aer = dev->aer_cap;
+	u32 cor, uncor;
+
+	if (!pcie_aer_is_native(dev))
+		return false;
+
+	pci_read_config_dword(dev, aer + PCI_ERR_COR_STATUS, &cor);
+	pci_read_config_dword(dev, aer + PCI_ERR_UNCOR_STATUS, &uncor);
+	return cor || uncor;
+}
+
 static int pci_enable_pcie_error_reporting(struct pci_dev *dev)
 {
 	int rc;
diff --git a/include/linux/aer.h b/include/linux/aer.h
index 02940be66324..e6a380bb2e68 100644
--- a/include/linux/aer.h
+++ b/include/linux/aer.h
@@ -56,12 +56,14 @@ struct aer_capability_regs {
 #if defined(CONFIG_PCIEAER)
 int pci_aer_clear_nonfatal_status(struct pci_dev *dev);
 int pcie_aer_is_native(struct pci_dev *dev);
+bool pci_aer_in_progress(struct pci_dev *dev);
 #else
 static inline int pci_aer_clear_nonfatal_status(struct pci_dev *dev)
 {
 	return -EINVAL;
 }
 static inline int pcie_aer_is_native(struct pci_dev *dev) { return 0; }
+static inline bool pci_aer_in_progress(struct pci_dev *dev) { return false; }
 #endif
 
 void pci_print_aer(struct pci_dev *dev, int aer_severity,
-- 
2.34.1

