Date:	Tue, 21 Jul 2015 23:33:23 -0700
From:	Jörn Engel <joern@...estorage.com>
To:	Mike Galbraith <umgwanakikbuti@...il.com>
Cc:	Spencer Baugh <sbaugh@...ern.com>, Don Zickus <dzickus@...hat.com>,
	Andrew Morton <akpm@...ux-foundation.org>,
	Ulrich Obergfell <uobergfe@...hat.com>,
	Ingo Molnar <mingo@...nel.org>,
	Andrew Jones <drjones@...hat.com>,
	chai wen <chaiw.fnst@...fujitsu.com>,
	Chris Metcalf <cmetcalf@...hip.com>,
	Stephane Eranian <eranian@...gle.com>,
	open list <linux-kernel@...r.kernel.org>,
	Spencer Baugh <Spencer.baugh@...estorage.com>,
	Joern Engel <joern@...fs.org>
Subject: Re: [PATCH] soft lockup: kill realtime threads before panic

On Wed, Jul 22, 2015 at 07:41:48AM +0200, Mike Galbraith wrote:
> On Tue, 2015-07-21 at 22:18 -0700, Jörn Engel wrote:
> > 
> > Not sure if this patch is something for mainline, but those two
> > alternatives have problems of their own.  Not panicking on lockups can
> > leave a system disabled until some human comes around.  In many cases
> > that human will do no better than power-cycle.  A panic reduces the
> > downtime.
> 
> If a realtime task goes bonkers, the realtime game is over, you're down.

Agreed.  But a reboot will often solve the issue.  So an automatic
panic will repair the system within minutes, while no panic will leave
the system broken for days, depending on human response time.  Automatic
panic is a great way to minimize downtime (or vulnerable time, if you
have HA).

One could argue that killing the realtime thread is even better than
panic, as things can restart with a blank slate even faster.  But the
real benefit is that we get better debug data for the failing component.
If we had a kernel bug, the backtrace would usually be sufficient to
point fingers.  With a bonkers realtime thread, not so much.

Anyway, this patch has been useful to us once.  If someone deems it
merge-worthy, great.  If not, I won't lose any sleep either.

Jörn

--
The key to performance is elegance, not battalions of special cases.
-- Jon Bentley and Doug McIlroy