lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20061207121135.GA15529@elte.hu>
Date:	Thu, 7 Dec 2006 13:11:35 +0100
From:	Ingo Molnar <mingo@...e.hu>
To:	Len Brown <lenb@...nel.org>
Cc:	linux-kernel@...r.kernel.org, Andrew Morton <akpm@...l.org>,
	ak@...e.de, Linus Torvalds <torvalds@...l.org>
Subject: [patch] x86_64: do not enable the NMI watchdog by default


* Len Brown <len.brown@...el.com> wrote:

> Personally I have never been a big fan of having the NMI watchdog 
> running by default on all systems -- but Andi insists that it helps 
> him debug failures, so tick it does...

enabling it by default was IMO a really bad decision and it needs to be 
undone via the patch attached further below.

If Andi wants to debug stuff via the NMI wachdog, he should use the 
nmi_watchdog=2 boot option: next to the tty=ttyS0 serial console 
options, initcall_debug, apic=debug, earlyprintk and myriads of other 
kernel-hackers-only boot options that we all use to 'help debug 
failures' ...

also, lock debugging facilities catch lockup possibilities (and actual 
lockups) alot more efficiently, i cannot remember the last time the NMI 
watchdog caught anything on my boxes without some other facility not 
triggering first. (and i have it enabled everywhere)

I have run a quick analysis over all locking related crashes i triggered 
on one particular testbox over the past 2 weeks. Out of 26 separate lock 
related incidents spread out equally over that timeframe (out of 497 
bootups on this box), this was the distribution of debugging facilities 
that caught the bugs:

      1  BUG: spinlock lockup on
      1  [ INFO: inconsistent lock state ]
      2  BUG: scheduling while atomic
      2  [ BUG: bad unlock balance detected! ]
      2  BUG: sleeping function called from invalid context
      6  BUG: scheduling with irqs disabled
      5  [ INFO: hard-safe -> hard-unsafe lock order detected ]
      7  BUG: using smp_processor_id() in preemptible [] code

8 were caught by lockdep, 8 by atomicity checks in the scheduler, 7 by 
DEBUG_PREEMPT and 1 by DEBUG_SPINLOCK.

Note: zero were caught by the NMI watchdog, and i run the NMI watchdog 
enabled by default on all architectures, and i have serial logging of 
everything.

but even for the typical distro kernel and for the typical user, the NMI 
watchdog is normally useless, because NMI lockups rarely make it into 
the syslog and X just locks up without showing anything on the screen.

	Ingo

----------------------->
Subject: [patch] x86_64: do not enable the NMI watchdog by default
From: Ingo Molnar <mingo@...e.hu>

do not enable the NMI watchdog by default. Now that we have lockdep i 
cannot remember the last time it caught a real bug, but the NMI watchdog 
can /cause/ problems. Furthermore, to the typical user, an NMI watchdog 
assert results in a total lockup anyway (if under X). In that sense, all 
that the NMI watchdog does is that it makes the system /less/ stable and 
/less/ debuggable.

people can still enable it either after bootup via:

   echo 1 > /proc/sys/kernel/nmi

or via the nmi_watchdog=1 or nmi_watchdog=2 boot options.

build and boot tested on an Athlon64 box.

Signed-off-by: Ingo Molnar <mingo@...e.hu>
---
 arch/x86_64/kernel/apic.c    |    1 -
 arch/x86_64/kernel/io_apic.c |    5 +----
 arch/x86_64/kernel/nmi.c     |    2 +-
 arch/x86_64/kernel/smpboot.c |    1 -
 include/asm-x86_64/nmi.h     |    1 -
 5 files changed, 2 insertions(+), 8 deletions(-)

Index: linux/arch/x86_64/kernel/apic.c
===================================================================
--- linux.orig/arch/x86_64/kernel/apic.c
+++ linux/arch/x86_64/kernel/apic.c
@@ -443,7 +443,6 @@ void __cpuinit setup_local_APIC (void)
 			oldvalue, value);
 	}
 
-	nmi_watchdog_default();
 	setup_apic_nmi_watchdog(NULL);
 	apic_pm_activate();
 }
Index: linux/arch/x86_64/kernel/io_apic.c
===================================================================
--- linux.orig/arch/x86_64/kernel/io_apic.c
+++ linux/arch/x86_64/kernel/io_apic.c
@@ -1604,7 +1604,6 @@ static inline void check_timer(void)
 		 */
 		unmask_IO_APIC_irq(0);
 		if (!no_timer_check && timer_irq_works()) {
-			nmi_watchdog_default();
 			if (nmi_watchdog == NMI_IO_APIC) {
 				disable_8259A_irq(0);
 				setup_nmi();
@@ -1630,10 +1629,8 @@ static inline void check_timer(void)
 		setup_ExtINT_IRQ0_pin(apic2, pin2, vector);
 		if (timer_irq_works()) {
 			apic_printk(APIC_VERBOSE," works.\n");
-			nmi_watchdog_default();
-			if (nmi_watchdog == NMI_IO_APIC) {
+			if (nmi_watchdog == NMI_IO_APIC)
 				setup_nmi();
-			}
 			return;
 		}
 		/*
Index: linux/arch/x86_64/kernel/nmi.c
===================================================================
--- linux.orig/arch/x86_64/kernel/nmi.c
+++ linux/arch/x86_64/kernel/nmi.c
@@ -181,7 +181,7 @@ static __cpuinit inline int nmi_known_cp
 }
 
 /* Run after command line and cpu_init init, but before all other checks */
-void nmi_watchdog_default(void)
+static inline void nmi_watchdog_default(void)
 {
 	if (nmi_watchdog != NMI_DEFAULT)
 		return;
Index: linux/arch/x86_64/kernel/smpboot.c
===================================================================
--- linux.orig/arch/x86_64/kernel/smpboot.c
+++ linux/arch/x86_64/kernel/smpboot.c
@@ -866,7 +866,6 @@ static int __init smp_sanity_check(unsig
  */
 void __init smp_prepare_cpus(unsigned int max_cpus)
 {
-	nmi_watchdog_default();
 	current_cpu_data = boot_cpu_data;
 	current_thread_info()->cpu = 0;  /* needed? */
 	set_cpu_sibling_map(0);
Index: linux/include/asm-x86_64/nmi.h
===================================================================
--- linux.orig/include/asm-x86_64/nmi.h
+++ linux/include/asm-x86_64/nmi.h
@@ -59,7 +59,6 @@ extern void disable_timer_nmi_watchdog(v
 extern void enable_timer_nmi_watchdog(void);
 extern int nmi_watchdog_tick (struct pt_regs * regs, unsigned reason);
 
-extern void nmi_watchdog_default(void);
 extern int setup_nmi_watchdog(char *);
 
 extern atomic_t nmi_active;
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ