Date:	Wed,  4 Dec 2013 17:55:56 -0800
From:	Ben Zhang <benzh@...omium.org>
To:	linux-kernel@...r.kernel.org
Cc:	Don Zickus <dzickus@...hat.com>,
	Andrew Morton <akpm@...ux-foundation.org>,
	Ingo Molnar <mingo@...hat.com>,
	Frederic Weisbecker <fweisbec@...il.com>,
	Ben Zhang <benzh@...omium.org>
Subject: [PATCH v2] watchdog: Add a sysctl to disable soft lockup detector

Currently, the soft lockup detector and hard lockup detector
can be enabled or disabled together via the flag variable
watchdog_user_enabled. There isn't a way to disable only the
soft lockup detector while keeping the hard lockup detector
running.

The hard lockup detector sometimes does not work on an x86
machine with multiple cpus when softlockup_panic is set to 0.
For example:
1. Hard lockup occurs on cpu0 ("cli" followed by an infinite loop).
2. Soft lockup occurs on cpu1 shortly after because cpu1 tries to
send a function to cpu0 via smp_call_function_single().
3. watchdog_timer_fn() detects the soft lockup on cpu1 and
dumps the stack. dump_stack() eventually calls touch_nmi_watchdog()
which sets watchdog_nmi_touch=true for all cpus and sets
watchdog_touch_ts=0 for cpu1.
4. NMI fires on cpu0. watchdog_overflow_callback() sees
watchdog_nmi_touch=true, so it does nothing except set
watchdog_nmi_touch=false.
5. watchdog_timer_fn() is called again on cpu1. It sees
watchdog_touch_ts=0, so it reloads it with the current tick. Thus,
is_softlockup() returns false, and soft_watchdog_warn is set to false.
6. Before NMI can fire on cpu0 again with watchdog_nmi_touch=false,
watchdog_timer_fn() reports the soft lockup on cpu1 again
and we go back to #3.
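The six steps above can be reduced to a toy model of the two flags involved. This is a hypothetical userspace sketch, not kernel code: time advances in abstract ticks, cpu0 only runs its NMI callback, and every soft lockup report on cpu1 is assumed to end in touch_nmi_watchdog(), modelled as setting a single nmi_touch flag. The function name and the tick periods are illustrative assumptions.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Toy model of the livelock in steps 1-6 (hypothetical, not kernel code).
 *  - cpu0 is hard-locked: only its NMI callback runs, every nmi_period ticks.
 *  - cpu1 is soft-locked: while the soft lockup detector is enabled, it
 *    reports the lockup every report_period ticks, and each report ends up
 *    in touch_nmi_watchdog(), modelled here as nmi_touch = true.
 * Returns the tick at which cpu0's hard lockup is reported, or -1 if it is
 * never reported within max_ticks.
 */
static int first_hard_lockup_report(bool soft_detector_enabled, int nmi_period,
				    int report_period, int max_ticks)
{
	assert(nmi_period > 0 && report_period > 0);

	/* If the soft lockup detector runs, step 3 has just happened. */
	bool nmi_touch = soft_detector_enabled;

	for (int t = 1; t <= max_ticks; t++) {
		/* cpu1: soft lockup report -> dump_stack() -> touch_nmi_watchdog() */
		if (soft_detector_enabled && t % report_period == 0)
			nmi_touch = true;

		/* cpu0: NMI fires; a pending touch makes it skip the check */
		if (t % nmi_period == 0) {
			if (nmi_touch)
				nmi_touch = false;	/* step 4 */
			else
				return t;		/* hard lockup reported */
		}
	}
	return -1;
}
```

In this model, first_hard_lockup_report(true, 10, 10, 1000) returns -1 (the hard lockup is never reported), while first_hard_lockup_report(false, 10, 10, 1000) returns 10, the first NMI. The periods are arbitrary illustration values, not the kernel's real timer intervals.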

The machine stays locked up and the log shows repeated reports of
soft lockup on cpu1. Therefore, we need a way to disable the soft
lockup check so that the hard lockup detector can reboot the machine.


* Existing boot options for the watchdog:
nmi_watchdog=panic/nopanic/0
softlockup_panic=0/1
nowatchdog
nosoftlockup

* Variables modified by the boot options:
int watchdog_user_enabled;
unsigned int softlockup_panic;
unsigned int hardlockup_panic;

* Existing sysctls at /proc/sys/kernel/... for the watchdog:
nmi_watchdog=0/1
watchdog=0/1
softlockup_panic=0/1
watchdog_thresh=0~60

* Variables modified by the sysctls:
int watchdog_user_enabled;
unsigned int softlockup_panic;
int watchdog_thresh;
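Each of the existing sysctls listed above is a file under /proc/sys/kernel/ holding a single decimal integer, so their current values can be inspected from a program. A minimal sketch, assuming the files exist on the running kernel (entries may be missing on kernels built without the lockup detector); read_sysctl_int() and print_watchdog_sysctls() are illustrative names.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Read one integer from a sysctl file such as
 * /proc/sys/kernel/watchdog_thresh. Returns 0 on success, -1 on error.
 */
static int read_sysctl_int(const char *path, int *value)
{
	assert(path != NULL && value != NULL);

	FILE *f = fopen(path, "r");
	if (f == NULL)
		return -1;

	int ok = (fscanf(f, "%d", value) == 1);
	fclose(f);
	return ok ? 0 : -1;
}

/* Print the watchdog sysctls listed above; missing entries are reported. */
static void print_watchdog_sysctls(void)
{
	static const char *const names[] = {
		"nmi_watchdog", "watchdog", "softlockup_panic", "watchdog_thresh",
	};
	char path[64];

	for (size_t i = 0; i < sizeof(names) / sizeof(names[0]); i++) {
		int v;

		snprintf(path, sizeof(path), "/proc/sys/kernel/%s", names[i]);
		if (read_sysctl_int(path, &v) == 0)
			printf("kernel.%s = %d\n", names[i], v);
		else
			printf("kernel.%s: unavailable\n", names[i]);
	}
}
```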


This patch adds a new boot option softlockup_detector_enable
and a sysctl at /proc/sys/kernel/softlockup_detector_enable to
allow disabling only the soft lockup detector.
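The boot option is handled by softlockup_detector_enable_setup() in the kernel/watchdog.c hunk of this patch: the string is parsed with kstrtoul() in base 0, and anything unparsable falls back to 1. Below is a userspace approximation of that behaviour. The function name is hypothetical, and strtoul() stands in for kstrtoul(), with explicit checks where the two differ (leading whitespace, a minus sign, trailing characters).

```c
#include <assert.h>
#include <ctype.h>
#include <errno.h>
#include <stdlib.h>

/*
 * Userspace approximation of softlockup_detector_enable_setup():
 * parse with base auto-detection and fall back to 1 (enabled) on error.
 */
static unsigned int parse_softlockup_detector_enable(const char *str)
{
	char *end;
	unsigned long res;

	assert(str != NULL);

	/* kstrtoul() rejects leading whitespace and '-'; strtoul() does not. */
	if (!isdigit((unsigned char)str[0]) && str[0] != '+')
		return 1;

	errno = 0;
	res = strtoul(str, &end, 0);	/* base 0: "0x" hex, leading "0" octal */
	if (end == str || errno == ERANGE)
		return 1;

	/* kstrtoul() tolerates one trailing newline and nothing else. */
	if (*end == '\n')
		end++;
	if (*end != '\0')
		return 1;

	/* As in the patch, the value is stored unchanged (not clamped to 0/1). */
	return (unsigned int)res;
}
```

As in the patch, a boot value such as 2 is stored unchanged; since watchdog_timer_fn() only tests the variable for zero, it behaves like 1. Writes through the sysctl, by contrast, are limited to 0..1 by the .extra1/.extra2 bounds.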

softlockup_detector_enable=1:
This is the default. The soft lockup detector is enabled.
When a soft lockup is detected, a warning message with
debug info is printed. The kernel may be configured to
panic in this case via the sysctl kernel.softlockup_panic.

softlockup_detector_enable=0:
The soft lockup detector is disabled. No warning message is
printed on soft lockup, and the kernel does not panic on
soft lockup regardless of the value of kernel.softlockup_panic.
Note that kernel.softlockup_detector_enable does not affect
the hard lockup detector.
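The two settings combine as described above. The sketch below only restates those rules as a pure function; the enum and classify_soft_lockup() are hypothetical names, not kernel code.

```c
#include <assert.h>
#include <stdbool.h>

/* Possible outcomes of a soft lockup, per the description above. */
enum soft_lockup_action {
	SOFTLOCKUP_IGNORE,	/* nothing printed, no panic */
	SOFTLOCKUP_WARN,	/* warning message with debug info */
	SOFTLOCKUP_PANIC,	/* warning, then panic (softlockup_panic=1) */
};

/*
 * Hypothetical helper: how softlockup_detector_enable and
 * softlockup_panic combine when a soft lockup may have occurred.
 */
static enum soft_lockup_action
classify_soft_lockup(unsigned int detector_enable, unsigned int panic_on_lockup,
		     bool lockup_detected)
{
	/* Disabled detector: softlockup_panic is irrelevant. */
	if (!detector_enable || !lockup_detected)
		return SOFTLOCKUP_IGNORE;
	return panic_on_lockup ? SOFTLOCKUP_PANIC : SOFTLOCKUP_WARN;
}
```

The hard lockup detector is deliberately absent from this sketch because, as stated above, softlockup_detector_enable does not affect it.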

Signed-off-by: Ben Zhang <benzh@...omium.org>
---
 Documentation/kernel-parameters.txt | 11 +++++++++++
 Documentation/sysctl/kernel.txt     | 20 ++++++++++++++++++++
 include/linux/sched.h               |  3 ++-
 kernel/sysctl.c                     |  9 +++++++++
 kernel/watchdog.c                   | 15 +++++++++++++++
 5 files changed, 57 insertions(+), 1 deletion(-)

diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt
index 50680a5..5678ac3 100644
--- a/Documentation/kernel-parameters.txt
+++ b/Documentation/kernel-parameters.txt
@@ -2980,6 +2980,17 @@ bytes respectively. Such letter suffixes can also be entirely omitted.
 				1: Fast pin select (default)
 				2: ATC IRMode
 
+	softlockup_detector_enable=
+			[KNL] Should the soft-lockup detector be enabled. If
+			the soft-lockup detector is disabled, no warning
+			message is printed on soft lockup, and the kernel does
+			not panic on soft lockup regardless of the value of
+			softlockup_panic. softlockup_detector_enable does not
+			affect the hard lockup detector.
+			If this parameter is not present, the soft-lockup
+			detector is enabled by default.
+			Format: <integer>
+
 	softlockup_panic=
 			[KNL] Should the soft-lockup detector generate panics.
 			Format: <integer>
diff --git a/Documentation/sysctl/kernel.txt b/Documentation/sysctl/kernel.txt
index 26b7ee4..209212e 100644
--- a/Documentation/sysctl/kernel.txt
+++ b/Documentation/sysctl/kernel.txt
@@ -70,6 +70,7 @@ show up in /proc/sys/kernel:
 - shmall
 - shmmax                      [ sysv ipc ]
 - shmmni
+- softlockup_detector_enable
 - stop-a                      [ SPARC only ]
 - sysrq                       ==> Documentation/sysrq.txt
 - tainted
@@ -718,6 +719,25 @@ without users and with a dead originative process will be destroyed.
 
 ==============================================================
 
+softlockup_detector_enable:
+
+Should the soft-lockup detector be enabled.
+
+softlockup_detector_enable=1:
+This is the default. The soft lockup detector is enabled.
+When a soft lockup is detected, a warning message with
+debug info is printed. The kernel may be configured to
+panic in this case via the sysctl kernel.softlockup_panic.
+
+softlockup_detector_enable=0:
+The soft lockup detector is disabled. No warning message is
+printed on soft lockup, and the kernel does not panic on
+soft lockup regardless of the value of kernel.softlockup_panic.
+Note that kernel.softlockup_detector_enable does not affect
+the hard lockup detector.
+
+==============================================================
+
 tainted:
 
 Non-zero if the kernel has been tainted.  Numeric values, which
diff --git a/include/linux/sched.h b/include/linux/sched.h
index 768b037..6d3749d 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -269,7 +269,8 @@ extern void touch_all_softlockup_watchdogs(void);
 extern int proc_dowatchdog_thresh(struct ctl_table *table, int write,
 				  void __user *buffer,
 				  size_t *lenp, loff_t *ppos);
-extern unsigned int  softlockup_panic;
+extern unsigned int softlockup_panic;
+extern unsigned int softlockup_detector_enable;
 void lockup_detector_init(void);
 #else
 static inline void touch_softlockup_watchdog(void)
diff --git a/kernel/sysctl.c b/kernel/sysctl.c
index 34a6047..8ae1f36 100644
--- a/kernel/sysctl.c
+++ b/kernel/sysctl.c
@@ -840,6 +840,15 @@ static struct ctl_table kern_table[] = {
 		.extra2		= &one,
 	},
 	{
+		.procname	= "softlockup_detector_enable",
+		.data		= &softlockup_detector_enable,
+		.maxlen		= sizeof(int),
+		.mode		= 0644,
+		.proc_handler	= proc_dointvec_minmax,
+		.extra1		= &zero,
+		.extra2		= &one,
+	},
+	{
 		.procname       = "nmi_watchdog",
 		.data           = &watchdog_user_enabled,
 		.maxlen         = sizeof (int),
diff --git a/kernel/watchdog.c b/kernel/watchdog.c
index 4431610..b9594e6 100644
--- a/kernel/watchdog.c
+++ b/kernel/watchdog.c
@@ -80,6 +80,18 @@ static int __init softlockup_panic_setup(char *str)
 }
 __setup("softlockup_panic=", softlockup_panic_setup);
 
+unsigned int __read_mostly softlockup_detector_enable = 1;
+
+static int __init softlockup_detector_enable_setup(char *str)
+{
+	unsigned long res;
+	if (kstrtoul(str, 0, &res))
+		res = 1;
+	softlockup_detector_enable = res;
+	return 1;
+}
+__setup("softlockup_detector_enable=", softlockup_detector_enable_setup);
+
 static int __init nowatchdog_setup(char *str)
 {
 	watchdog_user_enabled = 0;
@@ -293,6 +305,9 @@ static enum hrtimer_restart watchdog_timer_fn(struct hrtimer *hrtimer)
 		return HRTIMER_RESTART;
 	}
 
+	if (!softlockup_detector_enable)
+		return HRTIMER_RESTART;
+
 	/* check for a softlockup
 	 * This is done by making sure a high priority task is
 	 * being scheduled.  The task touches the watchdog to
-- 
1.8.5.1
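With the patch applied, the detector can be toggled at run time through /proc/sys/kernel/softlockup_detector_enable, for example with "sysctl -w kernel.softlockup_detector_enable=0". A minimal C helper for doing the same from a program, assuming root privileges and a kernel carrying this patch; write_sysctl_bool() is a hypothetical name. Its 0..1 check mirrors the .extra1/.extra2 bounds of the new ctl_table entry, which proc_dointvec_minmax enforces on the kernel side.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>

/*
 * Write 0 or 1 to a sysctl file such as
 * /proc/sys/kernel/softlockup_detector_enable.
 * Returns 0 on success or a negative errno value.
 */
static int write_sysctl_bool(const char *path, int value)
{
	assert(path != NULL);

	/* Same range as .extra1 = &zero, .extra2 = &one in the patch. */
	if (value < 0 || value > 1)
		return -EINVAL;

	FILE *f = fopen(path, "w");
	if (f == NULL)
		return -errno;

	int written = fprintf(f, "%d\n", value);
	int closed = fclose(f);	/* flushes; procfs sees the write here */
	if (written < 0 || closed != 0)
		return -EIO;
	return 0;
}
```

Calling write_sysctl_bool("/proc/sys/kernel/softlockup_detector_enable", 0) disables only the soft lockup detector; the hard lockup detector keeps running.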
