Date:	Tue, 1 Dec 2009 16:16:39 +0100
From:	Borislav Petkov <borislav.petkov@....com>
To:	Randy Dunlap <randy.dunlap@...cle.com>
CC:	Borislav Petkov <petkovbb@...glemail.com>,
	LKML <linux-kernel@...r.kernel.org>,
	Doug Thompson <dougthompson@...ssion.com>
Subject: Re: 2.6.32-rc8: amd64_edac slub error

> EDAC DEBUG: in drivers/edac/amd64_edac.c, line at 2367:     DRAM MEM-CTL PCI Bus ID:	0000:00:18.2
> EDAC DEBUG: in drivers/edac/amd64_edac.c, line at 2369:     Misc device PCI Bus ID:	0000:00:18.3
> calling  alsa_pcm_init+0x0/0x71 [snd_pcm] @ 1402
> initcall alsa_pcm_init+0x0/0x71 [snd_pcm] returned 0 after 17 usecs
> EDAC amd64: ECC is enabled by BIOS.
> get_cpus_on_this_dct_cpumask: nid: 0, cpu: 0
> get_cpus_on_this_dct_cpumask: nid: 0, cpu: 2
> amd64_nb_mce_bank_enabled_on_node: weight: 2
> EDAC DEBUG: in drivers/edac/amd64_edac.c, line at 2776: core: 0, MCG_CTL: 0x1f, NB MSR is enabled
> EDAC DEBUG: in drivers/edac/amd64_edac.c, line at 2776: core: 2, MCG_CTL: 0x0, NB MSR is disabled
> =============================================================================
> BUG kmalloc-16: Redzone overwritten
> -----------------------------------------------------------------------------

Hmm, I think I know what happens. This machine has non-contiguous
core enumeration on a node (e.g. 0,2 on node 0 instead of 0,1) but
rdmsr_on_cpus assumes contiguous numbering. Therefore we write outside
of the allocated msrs struct, hence the redzone overwrite. Here's a
simple fix that should take care of it. Please apply it on top of the
debugging patch and capture the output again so that we can verify it.
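
To illustrate the sizing problem outside the kernel, here is a rough
user-space sketch. It is not the driver code: cpu_bits_t, mask_weight(),
mask_first(), mask_last(), slots_needed() and slot_for_cpu() are
hypothetical stand-ins, and it assumes that each CPU's result is stored
at an offset from the first CPU in the mask, which is what the sizing in
the patch implies.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a cpumask: bit n set means CPU n is in the mask. */
typedef unsigned long cpu_bits_t;

#define MAX_CPUS ((int)(sizeof(cpu_bits_t) * 8))

/* Number of CPUs in the mask (what the old allocation was sized by). */
static size_t mask_weight(cpu_bits_t m)
{
	size_t n = 0;

	for (; m; m >>= 1)
		n += m & 1UL;
	return n;
}

/* Lowest CPU number in the mask, or -1 if the mask is empty. */
static int mask_first(cpu_bits_t m)
{
	int i;

	for (i = 0; i < MAX_CPUS; i++)
		if (m & (1UL << i))
			return i;
	return -1;
}

/* Highest CPU number in the mask, or -1 if the mask is empty. */
static int mask_last(cpu_bits_t m)
{
	int i, last = -1;

	for (i = 0; i < MAX_CPUS; i++)
		if (m & (1UL << i))
			last = i;
	return last;
}

/* Slots needed when results are indexed by (cpu - first cpu). */
static size_t slots_needed(cpu_bits_t m)
{
	if (!m)
		return 0;
	return (size_t)(mask_last(m) - mask_first(m) + 1);
}

/* Assumed indexing scheme: offset of the CPU from the first CPU in the mask. */
static size_t slot_for_cpu(cpu_bits_t m, int cpu)
{
	assert(cpu >= 0 && cpu < MAX_CPUS && (m & (1UL << cpu)));
	return (size_t)(cpu - mask_first(m));
}
```

With cores 0 and 2 in the mask, mask_weight() gives 2 but
slot_for_cpu(mask, 2) gives 2, i.e. one element past a two-element
array, while slots_needed() gives 3. With cores 0 and 1 both sizes
agree, which is why contiguous enumeration does not trigger the bug.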

I'll fix this properly when I get back and then maybe even backport it
depending on the intrusiveness of the changes.

Thanks.

---
 drivers/edac/amd64_edac.c |   15 +++++++++++++--
 1 files changed, 13 insertions(+), 2 deletions(-)

diff --git a/drivers/edac/amd64_edac.c b/drivers/edac/amd64_edac.c
index 139bc14..c013261 100644
--- a/drivers/edac/amd64_edac.c
+++ b/drivers/edac/amd64_edac.c
@@ -2750,7 +2750,8 @@ static bool amd64_nb_mce_bank_enabled_on_node(int nid)
 {
 	cpumask_t mask;
 	struct msr *msrs;
-	int cpu, nbe, idx = 0;
+	int cpu, nbe, i, idx = 0;
+	int first_cpu, last_cpu = 0;
 	bool ret = false;
 
 	cpumask_clear(&mask);
@@ -2759,7 +2760,17 @@ static bool amd64_nb_mce_bank_enabled_on_node(int nid)
 
 	pr_err("%s: weight: %d\n", __func__, cpumask_weight(&mask));
 
-	msrs = kzalloc(sizeof(struct msr) * cpumask_weight(&mask), GFP_KERNEL);
+	/*
+	 * calc. cores interval when non-contiguous core enumeration
+	 */
+	first_cpu = cpumask_first(&mask);
+
+	for (i = first_cpu; i < nr_cpu_ids; i++)
+		if (cpumask_test_cpu(i, &mask))
+			last_cpu = i;
+
+	msrs = kzalloc(sizeof(struct msr) * (last_cpu - first_cpu + 1),
+		       GFP_KERNEL);
 	if (!msrs) {
 		amd64_printk(KERN_WARNING, "%s: error allocating msrs\n",
 			      __func__);
-- 
1.6.4.3

-- 
Regards/Gruss,
Boris.

Operating | Advanced Micro Devices GmbH
  System  | Karl-Hammerschmidt-Str. 34, 85609 Dornach b. München, Germany
 Research | Geschäftsführer: Andrew Bowd, Thomas M. McCoy, Giuliano Meroni
  Center  | Sitz: Dornach, Gemeinde Aschheim, Landkreis München
  (OSRC)  | Registergericht München, HRB Nr. 43632

