lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:	Wed, 25 Jun 2014 13:22:06 -0300
From:	Kleber Sacilotto de Souza <klebers@...ux.vnet.ibm.com>
To:	linux-kernel@...r.kernel.org
Cc:	haver@...ux.vnet.ibm.com, gregkh@...uxfoundation.org,
	Kleber Sacilotto de Souza <klebers@...ux.vnet.ibm.com>
Subject: [PATCH 3/4] GenWQE: Improve hardware error recovery

Currently, in the event of a fatal hardware error, the driver tries a
recovery procedure that calls pci_reset_function() to reset the card.
This is not sufficient in some cases, needing a fundamental reset to
bring the card back.

This patch implements a call to the platform fundamental reset procedure
on the error recovery path if GENWQE_PLATFORM_ERROR_RECOVERY is enabled.
This is implemented by default only on PPC64, since this can cause
problems on other archs, e.g. zSeries, where the platform has its own
recovery procedures, leading to a potencial race conditition. For these
cases, the recovery is kept as it was before.

Signed-off-by: Kleber Sacilotto de Souza <klebers@...ux.vnet.ibm.com>
---
 drivers/misc/genwqe/card_base.c |   45 +++++++++++++++++++++++++++++++++++++++
 1 files changed, 45 insertions(+), 0 deletions(-)

diff --git a/drivers/misc/genwqe/card_base.c b/drivers/misc/genwqe/card_base.c
index 87ebaba..abb7961 100644
--- a/drivers/misc/genwqe/card_base.c
+++ b/drivers/misc/genwqe/card_base.c
@@ -797,6 +797,41 @@ static int genwqe_pci_fundamental_reset(struct pci_dev *pci_dev)
 	return rc;
 }
 
+
+static int genwqe_platform_recovery(struct genwqe_dev *cd)
+{
+	struct pci_dev *pci_dev = cd->pci_dev;
+	int rc;
+
+	dev_info(&pci_dev->dev,
+		 "[%s] resetting card for error recovery\n", __func__);
+
+	/* Clear out error injection flags */
+	cd->err_inject &= ~(GENWQE_INJECT_HARDWARE_FAILURE |
+			    GENWQE_INJECT_GFIR_FATAL |
+			    GENWQE_INJECT_GFIR_INFO);
+
+	genwqe_stop(cd);
+
+	/* Try recoverying the card with fundamental reset */
+	rc = genwqe_pci_fundamental_reset(pci_dev);
+	if (!rc) {
+		rc = genwqe_start(cd);
+		if (!rc)
+			dev_info(&pci_dev->dev,
+				 "[%s] card recovered\n", __func__);
+		else
+			dev_err(&pci_dev->dev,
+				"[%s] err: cannot start card services! (err=%d)\n",
+				__func__, rc);
+	} else {
+		dev_err(&pci_dev->dev,
+			"[%s] card reset failed\n", __func__);
+	}
+
+	return rc;
+}
+
 /*
  * genwqe_reload_bistream() - reload card bitstream
  *
@@ -875,6 +910,7 @@ static int genwqe_health_thread(void *data)
 	struct pci_dev *pci_dev = cd->pci_dev;
 	u64 gfir, gfir_masked, slu_unitcfg, app_unitcfg;
 
+ health_thread_begin:
 	while (!kthread_should_stop()) {
 		rc = wait_event_interruptible_timeout(cd->health_waitq,
 			 (genwqe_health_check_cond(cd, &gfir) ||
@@ -960,6 +996,15 @@ static int genwqe_health_thread(void *data)
 		/* We do nothing if the card is going over PCI recovery */
 		if (pci_channel_offline(pci_dev))
 			return -EIO;
+
+		/*
+		 * If it's supported by the platform, we try a fundamental reset
+		 * to recover from a fatal error. Otherwise, we continue to wait
+		 * for an external recovery procedure to take care of it.
+		 */
+		rc = genwqe_platform_recovery(cd);
+		if (!rc)
+			goto health_thread_begin;
 	}
 
 	dev_err(&pci_dev->dev,
-- 
1.7.1

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Powered by blists - more mailing lists