lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:	Tue, 15 Dec 2009 16:30:26 +0100
From:	Borislav Petkov <borislav.petkov@....com>
To:	Johannes Hirte <johannes.hirte@....tu-ilmenau.de>
CC:	Borislav Petkov <petkovbb@...glemail.com>,
	linux-kernel@...r.kernel.org,
	osrc-patches <osrc-patches@...e.amd.com>
Subject: Re: K8 ECC error with linux-2.6.32

On Tue, Dec 15, 2009 at 08:08:04AM +0100, Johannes Hirte wrote:
>  Northbridge Error, node 0, core: -1
> amd_decode_nb_mce: NBSL: 0x0005001b, NBSL: 0xa4000000
> K8 ECC error.

Yep, this is a benign GART TLB error which is not being reported but
you're using the amd64_edac module and it trips since the error is still
being logged and the module sees it. There are two fixes:

1. If you have a BIOS option with a wording like:

"Gart Table Walk Error MC reporting: Disabled/Enabled."

which should disable it.

2. If no BIOS option, the patch below should fix it. Can you please
test (against v2.6.32).

Thanks.

---
diff --git a/drivers/edac/edac_mce_amd.c b/drivers/edac/edac_mce_amd.c
index 713ed7d..026f0cb 100644
--- a/drivers/edac/edac_mce_amd.c
+++ b/drivers/edac/edac_mce_amd.c
@@ -300,6 +300,12 @@ void amd_decode_nb_mce(int node_id, struct err_regs *regs, int handle_errors)
 	if (!handle_errors)
 		return;
 
+	/*
+	 * GART TLB error reporting is disabled by default. Bail out early.
+	 */
+	if (TLB_ERROR(ec) && !report_gart_errors)
+		return;
+
 	pr_emerg(" Northbridge Error, node %d", node_id);
 
 	/*
@@ -311,10 +317,9 @@ void amd_decode_nb_mce(int node_id, struct err_regs *regs, int handle_errors)
 		if (regs->nbsh & K8_NBSH_ERR_CPU_VAL)
 			pr_cont(", core: %u\n", (u8)(regs->nbsh & 0xf));
 	} else {
-		pr_cont(", core: %d\n", ilog2((regs->nbsh & 0xf)));
+		pr_cont(", core: %d\n", fls((regs->nbsh & 0xf) - 1));
 	}
 
-
 	pr_emerg("%s.\n", EXT_ERR_MSG(xec));
 
 	if (BUS_ERROR(ec) && nb_bus_decoder)
@@ -334,21 +339,6 @@ static void amd_decode_fr_mce(u64 mc5_status)
 static inline void amd_decode_err_code(unsigned int ec)
 {
 	if (TLB_ERROR(ec)) {
-		/*
-		 * GART errors are intended to help graphics driver developers
-		 * to detect bad GART PTEs. It is recommended by AMD to disable
-		 * GART table walk error reporting by default[1] (currently
-		 * being disabled in mce_cpu_quirks()) and according to the
-		 * comment in mce_cpu_quirks(), such GART errors can be
-		 * incorrectly triggered. We may see these errors anyway and
-		 * unless requested by the user, they won't be reported.
-		 *
-		 * [1] section 13.10.1 on BIOS and Kernel Developers Guide for
-		 *     AMD NPT family 0Fh processors
-		 */
-		if (!report_gart_errors)
-			return;
-
 		pr_emerg(" Transaction: %s, Cache Level %s\n",
 			 TT_MSG(ec), LL_MSG(ec));
 	} else if (MEM_ERROR(ec)) {

-- 
Regards/Gruss,
Boris.

Operating | Advanced Micro Devices GmbH
  System  | Karl-Hammerschmidt-Str. 34, 85609 Dornach b. München, Germany
 Research | Geschäftsführer: Andrew Bowd, Thomas M. McCoy, Giuliano Meroni
  Center  | Sitz: Dornach, Gemeinde Aschheim, Landkreis München
  (OSRC)  | Registergericht München, HRB Nr. 43632

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ