Message-Id: <20150722155436.04d66934cd423107b810f2b1@linux-foundation.org>
Date:	Wed, 22 Jul 2015 15:54:36 -0700
From:	Andrew Morton <akpm@...ux-foundation.org>
To:	Spencer Baugh <sbaugh@...ern.com>
Cc:	Don Zickus <dzickus@...hat.com>,
	Ulrich Obergfell <uobergfe@...hat.com>,
	Ingo Molnar <mingo@...nel.org>,
	Andrew Jones <drjones@...hat.com>,
	chai wen <chaiw.fnst@...fujitsu.com>,
	Chris Metcalf <cmetcalf@...hip.com>,
	Stephane Eranian <eranian@...gle.com>,
	linux-kernel@...r.kernel.org (open list),
	Joern Engel <joern@...estorage.com>,
	Spencer Baugh <Spencer.baugh@...estorage.com>,
	Joern Engel <joern@...fs.org>
Subject: Re: [PATCH] soft lockup: kill realtime threads before panic

On Tue, 21 Jul 2015 15:07:57 -0700 Spencer Baugh <sbaugh@...ern.com> wrote:

> From: Joern Engel <joern@...fs.org>
> 
> We have observed cases where the soft lockup detector triggered, but no
> kernel bug existed.  Instead we had a buggy realtime thread that
> monopolized a cpu.  So let's kill the responsible party and not panic
> the entire system.
> 
> ...
>
> --- a/kernel/watchdog.c
> +++ b/kernel/watchdog.c
> @@ -428,7 +428,10 @@ static enum hrtimer_restart watchdog_timer_fn(struct hrtimer *hrtimer)
>  		}
>  
>  		add_taint(TAINT_SOFTLOCKUP, LOCKDEP_STILL_OK);
> -		if (softlockup_panic)
> +		if (rt_prio(current->prio)) {
> +			pr_emerg("killing realtime thread\n");
> +			send_sig(SIGILL, current, 0);

Why choose SIGILL?

> +		} else if (softlockup_panic)
>  			panic("softlockup: hung tasks");
>  		__this_cpu_write(soft_watchdog_warn, true);

But what about a non-buggy realtime thread which happens to
occasionally spend 15 seconds doing stuff?

Old behaviour: kernel blurts a softlockup message, everything keeps running.

New behaviour: thread gets killed, plane crashes.


Possibly a better approach would be to only kill the thread if
softlockup_panic was set, because the system is going down anyway.
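A minimal sketch of that alternative, written as a user-space model rather than as a patch to kernel/watchdog.c. The enum and the function softlockup_action() are hypothetical names used only for illustration; only softlockup_panic and the realtime-priority check come from the patch above.

```c
#include <stdbool.h>

/* Hypothetical outcomes of a detected soft lockup (illustration only). */
enum lockup_action {
	LOCKUP_WARN_ONLY,      /* print the softlockup message, keep running */
	LOCKUP_KILL_RT_THREAD, /* signal the offending realtime thread */
	LOCKUP_PANIC           /* panic the whole system */
};

/*
 * Model of the suggested policy: a realtime thread is only killed when
 * softlockup_panic is set, i.e. when the system would go down anyway.
 * task_is_rt stands in for rt_prio(current->prio) in the patch.
 */
static enum lockup_action softlockup_action(bool softlockup_panic,
					    bool task_is_rt)
{
	if (!softlockup_panic)
		return LOCKUP_WARN_ONLY;
	if (task_is_rt)
		return LOCKUP_KILL_RT_THREAD;
	return LOCKUP_PANIC;
}
```

With softlockup_panic clear, a long-running but non-buggy realtime thread gets the old behaviour: a warning and nothing else.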

Also, perhaps some users would prefer that the kernel simply suppress
the softlockup warning in this situation, rather than killing stuff!




Really, what you're trying to implement here is a watchdog for runaway
realtime threads.  And that sounds a worthy project but it's a rather
separate thing from the softlockup detector.  A realtime thread
watchdog feature might have things such as

- timeout duration separately configurable from softlockup

- enabled independently from softlockup: people might want one and
  not the other.

- configurable signal, perhaps?

Now, the *implementation* of the realtime thread watchdog may well
share code with the softlockup detector.  But from a
conceptual/configuration/documentation point of view, it's a separate
thing, no?
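To make the separation concrete, here is a rough user-space sketch of what an independent configuration could look like. Everything in it is an assumption for illustration: struct rt_watchdog_cfg, its fields and rt_watchdog_check() do not exist in the kernel, and no particular default signal is implied.

```c
#include <assert.h>
#include <signal.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical knobs, kept separate from the softlockup detector's. */
struct rt_watchdog_cfg {
	bool enabled;              /* on/off independently of softlockup */
	unsigned int timeout_secs; /* own timeout, not the softlockup threshold */
	int signal;                /* configurable signal to deliver */
};

/*
 * Returns the signal to send to a runaway realtime thread, or 0 when
 * nothing should be done. hog_secs is how long the task has monopolized
 * its cpu, as measured by whatever mechanism the implementation shares
 * with the softlockup detector.
 */
static int rt_watchdog_check(const struct rt_watchdog_cfg *cfg,
			     bool task_is_rt, unsigned int hog_secs)
{
	assert(cfg != NULL);

	if (!cfg->enabled || !task_is_rt)
		return 0;
	if (hog_secs < cfg->timeout_secs)
		return 0;
	return cfg->signal;
}
```

A user who wants softlockup warnings but no thread killing would leave enabled false here and keep the softlockup settings untouched.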

