linux-kernel - Re: [feature] automatically detect hung TASK


Message-ID: <20071203122833.GA20232@elte.hu>
Date:	Mon, 3 Dec 2007 13:28:33 +0100
From:	Ingo Molnar <mingo@...e.hu>
To:	Andi Kleen <andi@...stfloor.org>
Cc:	Radoslaw Szkodzinski <lkml@...ralstorm.puszkin.org>,
	Arjan van de Ven <arjan@...radead.org>,
	linux-kernel@...r.kernel.org,
	Andrew Morton <akpm@...ux-foundation.org>,
	Thomas Gleixner <tglx@...utronix.de>
Subject: Re: [feature] automatically detect hung TASK_UNINTERRUPTIBLE tasks

* Andi Kleen <andi@...stfloor.org> wrote:

> On Mon, Dec 03, 2007 at 12:59:00PM +0100, Ingo Molnar wrote:
> > no. (that's why i added the '(or a kill -9)' qualification above - if 
> > NFS is mounted noninterruptible then standard signals (such as Ctrl-C) 
> > should not have an interrupting effect.)
> 
> NFS is already interruptible with umount -f (I use that all the 
> time...), but softlockup won't know that and throw the warning 
> anyways.

umount -f is a spectacularly unintelligent solution (it requires the 
user to know precisely which path to umount, etc.), TASK_KILLABLE is a 
lot more useful.

> > your syslet snide comment aside (which is quite incomprehensible - a
> 
> For the record I have no principle problem with syslets, just I do 
> consider them roughly equivalent in end result to an explicit retry 
> based AIO implementation.

which suggests you have not really understood syslets. Syslets have no 
"retry" component, they just process straight through the workflow. 
Retry based AIO has a retry component, which - as its name suggests 
already - retries operations instead of processing through the workload 
intelligently. Depending on how "deep" the context of an operation the 
retries might or might not make a noticeable difference in performance, 
but it sure is an inferior approach.
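
To make the "depth" point concrete, here is a toy user-space model. It is a sketch only, not the syslet or AIO code: the function names, the step counting, and the assumption that a retry restarts the operation from its first step are all hypothetical simplifications. It counts how many step attempts each approach makes when some steps would block.

```python
def run_straight_through(steps, blocking):
    """Toy model: every step runs exactly once; a blocking step simply
    waits in place and then continues from where it was."""
    attempts = 0
    for _ in range(steps):
        attempts += 1
    return attempts


def run_with_retry(steps, blocking):
    """Toy model: when a step would block, abandon the pass and restart
    from step 0 once that step is ready (assumed simplification)."""
    attempts = 0
    ready = set()  # blocking steps whose I/O has since completed
    while True:
        restarted = False
        for i in range(steps):
            attempts += 1
            if i in blocking and i not in ready:
                ready.add(i)      # pretend the I/O completes before the retry
                restarted = True
                break
        if not restarted:
            return attempts


# Blocking late in a 5-step operation costs far more under retry
# than blocking at the very first step.
deep = run_with_retry(5, {2, 4})
shallow = run_with_retry(5, {0})
straight = run_straight_through(5, {2, 4})
```

In this model the straight-through count never depends on where the blocking points sit, while the retry count grows with how far into the operation each blocking point is.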

> > retry based asynchronous IO model is clearly inferior even if it were 
> > implemented everywhere), i do think that most if not all of these 
> > supposedly "difficult to fix" codepaths are just on the backburner 
> > out of lack of a clear blame vector.
> 
> Hmm. -ENOPARSE. Can you please clarify?

which bit was unclear to you? The retry bit i've explained above, lemme 
know if there's any other unclarity.

> > "audit thousands of callsites in 8 million lines of code first" is a 
> > nice euphemism for hiding from the blame forever. We had 10 years 
> > for it
> 
> Ok your approach is then to "let's warn about it and hope it will go 
> away"

s/hope//, but yes. Surprisingly, this works quite well :-) [as long as 
the warnings are not excessively bogus, of course]

and note that this is just a happy side-effect - the primary motivation 
is to get warnings about tasks that are uninterruptible forever. (which 
is a quite common kernel bug pattern.)
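
The detection idea can be sketched in user space roughly as follows. This is an illustrative model only, not the kernel patch: the Task fields, the check_hung_tasks name, and the 120-second timeout are assumptions made for the example. A task is reported when it has stayed in the uninterruptible state, without being scheduled, for longer than the timeout.

```python
from dataclasses import dataclass

TASK_UNINTERRUPTIBLE = "D"  # state letter as shown by ps(1)


@dataclass
class Task:
    pid: int
    comm: str
    state: str
    switch_count: int             # context switches seen so far
    last_switch_count: int = -1   # value recorded at the previous check
    last_change: float = 0.0      # time the task was last seen making progress


def check_hung_tasks(tasks, now, timeout=120.0):
    """Return tasks stuck in the uninterruptible state for >= timeout seconds.

    The timeout default is an assumed value for this sketch."""
    hung = []
    for t in tasks:
        progressed = t.switch_count != t.last_switch_count
        if t.state != TASK_UNINTERRUPTIBLE or progressed:
            # Task is runnable, sleeping interruptibly, or was scheduled
            # since the last check: reset its bookkeeping.
            t.last_switch_count = t.switch_count
            t.last_change = now
            continue
        if now - t.last_change >= timeout:
            hung.append(t)
    return hung
```

Calling check_hung_tasks() periodically with a monotonic timestamp gives the warning behaviour described above; a real implementation would also rate-limit the reports.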

> Anyways I think I could live with it a one liner warning (if it's 
> seriously rate limited etc.) and a sysctl to enable the backtraces; 
> off by default. Or if you prefer that record the backtrace always in a 
> buffer and make it available somewhere in /proc or /sys or /debug. 
> Would that work for you?

you are over-designing it way too much - a backtrace is obviously very 
helpful and it must be printed by default. There's enough 
configurability in it already so that you can turn it off if you want. 
(And you said SLES has softlockup turned off already so it shouldn't 
affect you anyway.)

	Ingo