Date:   Thu, 28 Sep 2017 15:33:05 +0100
From:   Gabriele Paoloni <gabriele.paoloni@...wei.com>
To:     <bhelgaas@...gle.com>, <helgaas@...nel.org>
CC:     <gabriele.paoloni@...wei.com>, <linuxarm@...wei.com>,
        <linux-pci@...r.kernel.org>, <linux-kernel@...r.kernel.org>,
        <liudongdong3@...wei.com>
Subject: [PATCH v3] PCIe AER: report uncorrectable errors only to the functions that logged the errors

Currently, if an uncorrectable error is reported by an EP, the AER
driver walks over all the devices connected to the upstream port
bus and calls the report_error_detected() callback on each in turn.
If any of the devices connected to the bus does not implement
dev->driver->err_handler->error_detected(), do_recovery() fails,
leaving every device in the bus hierarchy unrecovered.

According to section "6.2.2.2.2. Non-Fatal Errors" of the PCIe specification:
<< Non-fatal errors are uncorrectable errors which cause a particular
transaction to be unreliable but the Link is otherwise fully functional.
Isolating Non-fatal from Fatal errors provides Requester/Receiver logic
in a device or system management software the opportunity to recover
from the error without resetting the components on the Link and
disturbing other transactions in progress. Devices not associated with
the transaction in error are not impacted by the error.>>
Therefore, for non-fatal errors the PCIe link should not be considered
compromised, and it makes sense to report the error only to the
functions that logged an error.

This patch implements this new behaviour for non-fatal errors.
It also fixes the bug filed at the link below.

Link: https://bugzilla.kernel.org/show_bug.cgi?id=197055
Fixes: 6c2b374d7485 ("PCI-Express AER implemetation: AER core and aerdriver")
Signed-off-by: Gabriele Paoloni <gabriele.paoloni@...wei.com>
Signed-off-by: Dongdong Liu <liudongdong3@...wei.com>
---
Changes from v2:
   - no functional changes
   - Added reference in the commit log to the bugzilla ticket
   - Added reference in the commit log to the commit that this patch fixes
   - Added reference in the commit log to the PCIe specs for Non-fatal
     error handling rules
 
Changes from v1:
   - now errors are reported only to the functions that logged the error
     instead of all the functions in the same device.
   - the patch subject has changed to match the new implementation
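
Illustration (not part of the patch): the behaviour change described in
the commit log can be modelled in user space. The sketch below is a
simplified model, not kernel code. All mock_* names are hypothetical
stand-ins, and it assumes, as the commit log describes, that a function
without an error_detected() handler vetoes recovery for everything the
callback reaches.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-ins for the kernel types; not the real definitions. */
enum mock_state { MOCK_IO_NORMAL, MOCK_IO_FROZEN };
enum mock_result { MOCK_NONE, MOCK_CAN_RECOVER, MOCK_NO_AER_DRIVER };

struct mock_dev {
	int has_error_detected;	/* models err_handler->error_detected != NULL */
	int notified;		/* set when the callback reaches this function */
};

/* Models report_error_detected(): a missing handler vetoes recovery. */
static void mock_report(struct mock_dev *dev, enum mock_result *result)
{
	dev->notified = 1;
	if (!dev->has_error_detected)
		*result = MOCK_NO_AER_DRIVER;	/* the veto is sticky */
	else if (*result == MOCK_NONE)
		*result = MOCK_CAN_RECOVER;
}

/*
 * Models the end point branch of broadcast_error_message().
 * bus[] holds every function on the upstream bus and src is the index
 * of the function that logged the error. With patched == 0 the whole
 * bus is walked regardless of the channel state.
 */
static enum mock_result mock_broadcast(struct mock_dev *bus, size_t n,
				       size_t src, enum mock_state state,
				       int patched)
{
	enum mock_result result = MOCK_NONE;
	size_t i;

	assert(src < n);
	for (i = 0; i < n; i++)
		bus[i].notified = 0;

	if (patched && state == MOCK_IO_NORMAL) {
		/* non-fatal: only the function that logged the error */
		mock_report(&bus[src], &result);
	} else {
		/* fatal, or old behaviour: every function on the bus */
		for (i = 0; i < n; i++)
			mock_report(&bus[i], &result);
	}
	return result;
}
```

With a two-function bus where only function 0 has a handler and
function 0 logs a non-fatal error, the unpatched walk ends in
MOCK_NO_AER_DRIVER, while the patched path notifies function 0 alone
and ends in MOCK_CAN_RECOVER. A fatal (frozen) error still walks the
whole bus in both cases.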
---
 drivers/pci/pcie/aer/aerdrv_core.c | 9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/drivers/pci/pcie/aer/aerdrv_core.c b/drivers/pci/pcie/aer/aerdrv_core.c
index 890efcc..7448052 100644
--- a/drivers/pci/pcie/aer/aerdrv_core.c
+++ b/drivers/pci/pcie/aer/aerdrv_core.c
@@ -390,7 +390,14 @@ static pci_ers_result_t broadcast_error_message(struct pci_dev *dev,
 		 * If the error is reported by an end point, we think this
 		 * error is related to the upstream link of the end point.
 		 */
-		pci_walk_bus(dev->bus, cb, &result_data);
+		if (state == pci_channel_io_normal)
+			/*
+			 * the error is non fatal so the bus is ok, just invoke
+			 * the callback for the function that logged the error.
+			 */
+			cb(dev, &result_data);
+		else
+			pci_walk_bus(dev->bus, cb, &result_data);
 	}
 
 	return result_data.result;
-- 
2.7.4

