lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:	Tue, 28 Aug 2007 13:13:44 -0700
From:	Daniel Walker <dwalker@...sta.com>
To:	eranian@....hp.com
Cc:	Björn Steinbrink <B.Steinbrink@....de>,
	ak@...e.de, linux-kernel@...r.kernel.org, akpm@...ux-foundation.org
Subject: Re: nmi_watchdog=2 regression in 2.6.21

On Tue, 2007-08-28 at 12:46 -0700, Stephane Eranian wrote:


> I think I found the problem. As I suspected, it seems there is an assymetry
> between the 1st end 2nd counter (just like what they have on P6 core). Yet
> for architectural perfmon v1, this restriction is supposed to be lifted.
> 
> Unfortunately, a quick look at the errata document at
> http://download.intel.com/design/mobile/SPECUPDT/30922212.pdf
> 
> for Core Duo shows bugs A49 as 'nofix':
>   Core Duo processor has a bug which renders the enable bit (22) of
>   PERFEVTSEL1 inoperative. The processor behaves like former P6 cores,
>   the enable bit of PERFEVTSEL0 controls the activation of both counters

Your patch switched the nmi from PERFEVTSEL0 to PERFEVTSEL1  (right?)..
So 0 works, 1 does not , for me anyway ..

> That explains why you get the 'NMI stuck' message when using PERFEVTSEL1.
> I suspect when using PERFEVTSEL0, then NMI watchdog is not stuck. So it
> is possible that in case NMI is stuck the code does not cleanly shutdown the
> NMI interrupt and you get some spurious NMI interrupt later in the boot at
> you are stuck because you are holding a lock. We need to look at the error
> path of check_nmi_watchdog(). I glance through it and could not find the place
> where the APIC vector is cleared.

As far as I can tell the check_nmi_watchdog() doesn't take locks, and it
can't safely share a lock with the NMI .. 

The patch below fixes the hang (not the stuck NMI) .. Not totally sure
why, but the cpus are stuck in a loop waiting for the endflag which
never comes .. This also plays with the nmi hz which might do
something.. /proc/interrupt doesn't show any nmi's either..

Daniel

Signed-off-by: Daniel Walker <dwalker@...sta.com>

Index: linux-2.6.22/arch/i386/kernel/nmi.c
===================================================================
--- linux-2.6.22.orig/arch/i386/kernel/nmi.c	2007-08-15 00:51:12.000000000 +0000
+++ linux-2.6.22/arch/i386/kernel/nmi.c	2007-08-28 20:15:04.000000000 +0000
@@ -82,7 +82,7 @@ static __init void nmi_cpu_busy(void *da
 static int __init check_nmi_watchdog(void)
 {
 	unsigned int *prev_nmi_count;
-	int cpu;
+	int cpu, ret = 0;
 
 	if ((nmi_watchdog == NMI_NONE) || (nmi_watchdog == NMI_DEFAULT))
 		return 0;
@@ -125,18 +125,18 @@ static int __init check_nmi_watchdog(voi
 	if (!atomic_read(&nmi_active)) {
 		kfree(prev_nmi_count);
 		atomic_set(&nmi_active, -1);
-		return -1;
-	}
+		printk("nmi malfunctioning.\n");
+		ret = -1;
+	} else 
+		printk("OK.\n");
 	endflag = 1;
-	printk("OK.\n");
-
 	/* now that we know it works we can reduce NMI frequency to
 	   something more reasonable; makes a difference in some configs */
 	if (nmi_watchdog == NMI_LOCAL_APIC)
 		nmi_hz = lapic_adjust_nmi_hz(1);
 
 	kfree(prev_nmi_count);
-	return 0;
+	return ret;
 }
 /* This needs to happen later in boot so counters are working */
 late_initcall(check_nmi_watchdog);


-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ