lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20071202204725.GA25891@one.firstfloor.org>
Date:	Sun, 2 Dec 2007 21:47:25 +0100
From:	Andi Kleen <andi@...stfloor.org>
To:	Ingo Molnar <mingo@...e.hu>
Cc:	Andi Kleen <andi@...stfloor.org>,
	Arjan van de Ven <arjan@...radead.org>,
	linux-kernel@...r.kernel.org,
	Andrew Morton <akpm@...ux-foundation.org>,
	Thomas Gleixner <tglx@...utronix.de>
Subject: Re: [feature] automatically detect hung TASK_UNINTERRUPTIBLE tasks

> Out of direct experience, 95% of the "too long delay" cases are plain 
> old bugs. The rest we can (and must!) convert to TASK_KILLABLE or could 

I already pointed out a few cases (nfs, cifs, smbfs, ncpfs, afs).   
It would be pretty bad to merge this patch without converting them to 
TASK_KILLABLE first

There's also the additional issue that even block devices are often
network or SAN backed these days. Having 120 second delays in there
is quite possible.

So most likely adding this patch and still keeping a robust kernel
would require converting most of these delays to TASK_KILLABLE first.
That would not be a bad thing -- i would often like to kill a
process stuck on a bad block device -- but is likely a lot of work.

> There are no softlockup false positive bugs open at the moment. If you 
> know about any, then please do not hesitate and report them, i'll be 
> eager to fix them. The softlockup detector is turned on by default in 
> Fedora (alongside lockdep in rawhide), and it helped us find countless 

That just means nobody runs stress tests on those. e.g. lockdep 
tends to explode even on simple stress tests on larger systems because it
tracks all locks in all dynamic objects in memory and towards 6k-10k entries
the graph walks tend to take multiple seconds on some NUMA systems.

-Andi

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ