linux-kernel - Re: [PATCH] watchdog: Inject NMI when locked up and going to panic

Message-Id: <20121119161917.229c6d6f.akpm@linux-foundation.org>
Date:	Mon, 19 Nov 2012 16:19:17 -0800
From:	Andrew Morton <akpm@...ux-foundation.org>
To:	Sasha Levin <sasha.levin@...cle.com>
Cc:	mingo@...nel.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH] watchdog: Inject NMI when locked up and going to panic

On Sat, 17 Nov 2012 19:28:53 -0500
Sasha Levin <sasha.levin@...cle.com> wrote:

> Send an NMI to all CPUs when a lockup is detected and the lockup
> watchdog code is configured to panic. This gives us a fairly up-to-date
> snapshot of all CPUs in the system.
> 
> This lets us get stack trace of all CPUs which makes life easier
> trying to debug a deadlock, and the NMI doesn't change anything
> since the next step is a kernel panic.
> 

nit: I'll rename this to "watchdog: trigger all-cpu backtrace when
locked up and going to panic".  We don't know how the arch implements
trigger_all_cpu_backtrace() at this level!


> --- a/kernel/watchdog.c
> +++ b/kernel/watchdog.c
> @@ -239,10 +239,12 @@ static void watchdog_overflow_callback(struct perf_event *event,
>  		if (__this_cpu_read(hard_watchdog_warn) == true)
>  			return;
>  
> -		if (hardlockup_panic)
> +		if (hardlockup_panic) {
> +			trigger_all_cpu_backtrace();
>  			panic("Watchdog detected hard LOCKUP on cpu %d", this_cpu);
> -		else
> +		} else {
>  			WARN(1, "Watchdog detected hard LOCKUP on cpu %d", this_cpu);
> +		}
>  
>  		__this_cpu_write(hard_watchdog_warn, true);
>  		return;
> @@ -323,8 +325,10 @@ static enum hrtimer_restart watchdog_timer_fn(struct hrtimer *hrtimer)
>  		else
>  			dump_stack();
>  
> -		if (softlockup_panic)
> +		if (softlockup_panic) {
> +			trigger_all_cpu_backtrace();
>  			panic("softlockup: hung tasks");
> +		}
>  		__this_cpu_write(soft_watchdog_warn, true);
>  	} else
>  		__this_cpu_write(soft_watchdog_warn, false);

The change seems sensible, but I wonder about CONFIG_SMP=n machines. 
Will they end up getting the same backtrace displayed twice?

(I don't remember whether trigger_all_cpu_backtrace() is really
trigger_all_other_cpu_backtrace() and we didn't document it).
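The duplicate-output question can be made concrete with a small userspace model. This is only a sketch and not kernel code: model_trigger_backtrace() and model_lockup_panic() are hypothetical names, the include_self flag stands in for the undocumented "all CPUs" versus "all other CPUs" behaviour, and the model assumes the panic path prints the local stack exactly once on its own.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/*
 * Hypothetical model (not the kernel implementation): print one
 * "backtrace" line per CPU. If include_self is false, the calling CPU
 * is skipped, i.e. the "all other CPUs" reading of the API.
 * Returns how many times this_cpu's stack was printed (0 or 1).
 */
static int model_trigger_backtrace(int ncpus, int this_cpu, bool include_self)
{
	int self_dumps = 0;

	assert(this_cpu >= 0 && this_cpu < ncpus);

	for (int cpu = 0; cpu < ncpus; cpu++) {
		if (cpu == this_cpu && !include_self)
			continue;
		printf("backtrace for cpu %d\n", cpu);
		if (cpu == this_cpu)
			self_dumps++;
	}
	return self_dumps;
}

/*
 * Model of the patched panic path: request the all-CPU backtrace, then
 * "panic". Assumption: the panic path prints the local stack once.
 * Returns the total number of times this_cpu's stack appears in the log.
 */
static int model_lockup_panic(int ncpus, int this_cpu, bool include_self)
{
	int self_dumps = model_trigger_backtrace(ncpus, this_cpu, include_self);

	printf("panic on cpu %d (local stack dump)\n", this_cpu);
	return self_dumps + 1;
}
```

Under these assumptions, a single-CPU model with include_self set shows the local stack twice, while the "other CPUs only" variant shows it once. Which case applies on a real CONFIG_SMP=n kernel depends on how the arch implements trigger_all_cpu_backtrace().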
