linux-kernel - Re: [feature] automatically detect hung TASK

Message-ID: <p73y7ccwzlb.fsf@bingen.suse.de>
Date:	Sun, 02 Dec 2007 22:34:08 +0100
From:	Andi Kleen <andi@...stfloor.org>
To:	Ingo Molnar <mingo@...e.hu>
Cc:	Andi Kleen <andi@...stfloor.org>,
	Arjan van de Ven <arjan@...radead.org>,
	linux-kernel@...r.kernel.org,
	Andrew Morton <akpm@...ux-foundation.org>,
	Thomas Gleixner <tglx@...utronix.de>
Subject: Re: [feature] automatically detect hung TASK_UNINTERRUPTIBLE tasks

Ingo Molnar <mingo@...e.hu> writes:
>
> do you realize that more than 120 seconds TASK_UNINTERRUPTIBLE _is_ 
> something that most humans consider as "buggy" in the overwhelming 
> majority of cases, regardless of the reason? Yes, there are and will be 
> some exceptions, but not nearly as countless as you try to paint it. A 
> quick test in the next -mm will give us a good idea about the ratio of 
> false positives.

That would assume error paths get exercised regularly in -mm.
Doubtful. Most likely we'll only hear about it after it's
out in the wild in some bigger release.

The problem I have with your patch is that it will mess up Linux
error handling (in particular for block and network file systems)
even more than it already is. Unfortunately, such "unusual" things
happen frequently in error handling cases.

I used to fight with this with the NMI watchdog on x86-64 -- it
tended to trigger regularly on SCSI error handlers, for example,
because they disabled interrupts for too long while handling the
error. They eventually all got fixed, but with that change they
will likely all start throwing nasty messages again.

And usually it is not simply broken code either, but code that is
really doing something difficult.

-Andi