Message-ID: <20110517071642.GF22305@elte.hu>
Date:	Tue, 17 May 2011 09:16:42 +0200
From:	Ingo Molnar <mingo@...e.hu>
To:	Mandeep Singh Baines <msb@...omium.org>,
	Andrew Morton <akpm@...ux-foundation.org>
Cc:	linux-kernel@...r.kernel.org,
	Marcin Slusarz <marcin.slusarz@...il.com>,
	Don Zickus <dzickus@...hat.com>,
	Peter Zijlstra <a.p.zijlstra@...llo.nl>,
	Frederic Weisbecker <fweisbec@...il.com>
Subject: Re: [PATCH 4/4] watchdog: configure nmi watchdog period based on
 watchdog_thresh


* Mandeep Singh Baines <msb@...omium.org> wrote:

> Before the conversion of the NMI watchdog to perf event, the watchdog
> timeout was 5 seconds. Now it is 60 seconds. For my particular application,
> netbooks, 5 seconds was a better timeout. With a short timeout, we
> catch faults earlier and are able to send back a panic. With a 60 second
> timeout, the user is unlikely to wait and will instead hit the power
> button, causing us to lose the panic info.

That's an interesting observation. Have you been able to measure/observe this 
effect somehow, or do you presume that users find 60 seconds too long?

This would be a concern for upstream as well, I guess.

> This change configures the NMI period based on the watchdog_thresh.

Hm, our tolerance for the two thresholds is not just human but technical: hard 
lockup warnings should indeed be triggered after just a few seconds, while soft 
lockups can have false positives under extreme conditions.

So we generally want a higher threshold for soft lockups than for hard lockups.

So how about we couple the thresholds with a factor: we make the soft threshold 
twice the hard threshold? Then we could change the upstream default as well, I 
think: let's change the NMI timeout to 10 seconds (and thus have the soft 
threshold at 20 seconds). Is 20 seconds short enough for most users to not hit 
reset?
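
A minimal sketch of the coupling proposed above, assuming a single tunable for 
the hard (NMI) threshold from which the soft threshold is derived. The names 
SOFT_LOCKUP_FACTOR and soft_lockup_thresh() are hypothetical, chosen for 
illustration only, and are not taken from the patch or the kernel source:

```c
/*
 * Illustration only: derive the soft-lockup threshold from the
 * hard-lockup (NMI) threshold using a fixed factor of two.
 * SOFT_LOCKUP_FACTOR and soft_lockup_thresh() are hypothetical names.
 */
#define SOFT_LOCKUP_FACTOR 2

/* Both the argument and the return value are in seconds. */
unsigned int soft_lockup_thresh(unsigned int hard_thresh_secs)
{
	return SOFT_LOCKUP_FACTOR * hard_thresh_secs;
}
```

With a hard threshold of 10 seconds this gives a soft threshold of 20 seconds, 
matching the defaults suggested above.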

We might want to change another aspect of the NMI watchdog: right now it tries 
to abort the offending task - which is really nasty if there was a spuriously 
long irqs-off section somewhere in the kernel. How about we just print a 
warning instead?

Thanks,

	Ingo