Message-ID: <20110801192407.GE2581@redhat.com>
Date:	Mon, 1 Aug 2011 15:24:07 -0400
From:	Don Zickus <dzickus@...hat.com>
To:	ZAK Magnus <zakmagnus@...gle.com>
Cc:	linux-kernel@...r.kernel.org, Ingo Molnar <mingo@...e.hu>,
	Mandeep Singh Baines <msb@...omium.org>
Subject: Re: [PATCH v3 2/2] Make hard lockup detection use timestamps

On Mon, Aug 01, 2011 at 11:33:24AM -0700, ZAK Magnus wrote:
> Okay... So this is a problem we need to solve. Does there exist a good
> way to output a stack trace to, say, a file in /proc? I think that
> would be an appealing solution, if doable.

One idea I thought of to work around this is to save the timestamp and the
watchdog bool before the stack dump and restore them afterwards.  It's a
cheap hack, and I am not too sure about the locking, as it might race with
touch_nmi_watchdog().  But it gives you an idea of what I was thinking.

Since we are in NMI context, no one can normally touch these variables
except another CPU calling touch_nmi_watchdog() (or watchdog_enable(),
but that should never race in these scenarios).
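
To make the race concrete, here is a rough user-space sketch, not kernel
code.  The two globals stand in for the per-CPU variables, and
fake_dump_stack(), remote_touch() and report_stall() are hypothetical
names made up for illustration.  It assumes the dump path touches the
watchdog as a side effect, and that a touch from another CPU can land
between the save and the restore.

```c
#include <assert.h>
#include <stdbool.h>

/* User-space stand-ins for the per-CPU variables used in the patch.
 * In the kernel these are per-CPU; here they are plain globals. */
static unsigned long watchdog_touch_ts = 100;
static bool watchdog_nmi_touch = false;

/* Hypothetical stand-in for dump_stack(): assumed to touch the
 * watchdog as a side effect, which is what the patch tries to undo. */
static void fake_dump_stack(void)
{
	watchdog_touch_ts = 0;
	watchdog_nmi_touch = true;
}

/* Hypothetical stand-in for another CPU calling touch_nmi_watchdog(). */
static void remote_touch(void)
{
	watchdog_nmi_touch = true;
}

/* Save, dump, restore, in the same order as the patch.  When
 * remote_touch_during_dump is set, a remote touch lands inside the
 * window and is overwritten by the restore.  Returns the final value
 * of watchdog_nmi_touch. */
static bool report_stall(bool remote_touch_during_dump)
{
	unsigned long ts = watchdog_touch_ts;
	bool touched = watchdog_nmi_touch;

	fake_dump_stack();
	if (remote_touch_during_dump)
		remote_touch();

	watchdog_touch_ts = ts;
	watchdog_nmi_touch = touched;

	/* The restore always wins over anything written in the window. */
	assert(watchdog_touch_ts == ts);
	return watchdog_nmi_touch;
}
```

With no remote touch, the restore simply undoes the side effects of the
dump.  With a remote touch inside the window, that touch is lost, which
is the race I am worried about.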

Cheers,
Don

Compile tested only.


diff --git a/kernel/watchdog.c b/kernel/watchdog.c
index 17bcded..2dcedb3 100644
--- a/kernel/watchdog.c
+++ b/kernel/watchdog.c
@@ -214,6 +214,9 @@ void touch_softlockup_watchdog_sync(void)
 static void update_hardstall(unsigned long stall, int this_cpu)
 {
 	int update_stall = 0;
+	int ts;
+	bool touched;
+
 	if (stall > hardstall_thresh &&
 			stall > worst_hardstall + hardstall_diff_thresh) {
 		unsigned long flags;
@@ -225,10 +228,14 @@ static void update_hardstall(unsigned long stall, int this_cpu)
 	}
 
 	if (update_stall) {
+		ts = __this_cpu_read(watchdog_touch_ts);
+		touched = __this_cpu_read(watchdog_nmi_touch);
 		printk(KERN_WARNING "LOCKUP may be in progress!"
 			"Worst hard stall seen on CPU#%d: %lums\n",
 			this_cpu, stall);
 		dump_stack();
+		__this_cpu_write(watchdog_touch_ts, ts);
+		__this_cpu_write(watchdog_nmi_touch, touched);
 	}
 }
 
@@ -262,6 +269,9 @@ static int is_hardlockup(int this_cpu)
 static void update_softstall(unsigned long stall, int this_cpu)
 {
 	int update_stall = 0;
+	int ts;
+	bool touched;
+
 	if (stall > get_softstall_thresh() &&
 			stall > worst_softstall + softstall_diff_thresh) {
 		unsigned long flags;
@@ -273,10 +283,14 @@ static void update_softstall(unsigned long stall, int this_cpu)
 	}
 
 	if (update_stall) {
+		ts = __this_cpu_read(watchdog_touch_ts);
+		touched = __this_cpu_read(watchdog_nmi_touch);
 		printk(KERN_WARNING "LOCKUP may be in progress!"
 				"Worst soft stall seen on CPU#%d: %lums\n",
 				this_cpu, stall);
 		dump_stack();
+		__this_cpu_write(watchdog_touch_ts, ts);
+		__this_cpu_write(watchdog_nmi_touch, touched);
 	}
 }
 
