Date:	Fri, 12 Jun 2009 17:35:01 +0200
From:	Ingo Molnar <mingo@...e.hu>
To:	Linus Torvalds <torvalds@...ux-foundation.org>
Cc:	Wu Fengguang <fengguang.wu@...el.com>,
	Thomas Gleixner <tglx@...utronix.de>,
	"H. Peter Anvin" <hpa@...or.com>,
	Peter Zijlstra <a.p.zijlstra@...llo.nl>,
	Andrew Morton <akpm@...ux-foundation.org>,
	LKML <linux-kernel@...r.kernel.org>,
	Nick Piggin <npiggin@...e.de>,
	Hugh Dickins <hugh.dickins@...cali.co.uk>,
	Andi Kleen <andi@...stfloor.org>,
	"riel@...hat.com" <riel@...hat.com>,
	"chris.mason@...cle.com" <chris.mason@...cle.com>,
	"linux-mm@...ck.org" <linux-mm@...ck.org>
Subject: Re: [PATCH 1/5] HWPOISON: define VM_FAULT_HWPOISON to 0 when
	feature is disabled


* Linus Torvalds <torvalds@...ux-foundation.org> wrote:

> On Fri, 12 Jun 2009, Ingo Molnar wrote:
> > 
> > This seems like trying to handle a failure mode that cannot be 
> > and shouldn't be 'handled' really. If there's an 'already 
> > corrupted' page then the box should go down hard and fast, and 
> > we should not risk _even more user data corruption_ by trying to 
> > 'continue' in the hope of having hit some 'harmless' user 
> > process that can be killed ...
> 
> No, the box should _not_ go down hard-and-fast. That's the last 
> thing we should *ever* do.
> 
> We need to log it. Often at a user level (ie we want to make sure 
> it actually hits syslog, possibly goes out the network, maybe pops 
> up a window, whatever).
> 
> Shutting down the machine is the last thing we ever want to do.
> 
> The whole "let's panic" mentality is a disease.

No doubt about that - and i'm removing BUG_ON()s and panic()s 
wherever i can and haven't added a single new one myself in the past 
5 years or so. It's a disease.

If a fault hits a harmless piece of the system, then the log message 
will make it out and people know what happened. hwpoison does not 
affect that at all. If the fault hits the critical path towards 
getting the log message out - then we won't get a log message, 
hwpoison or not.

My point is that hwpoison allows the _ignoring_ of hardware problems 
and thus pushes more buggy hardware up the pipeline.

Clusters will be running with this under the (false IMO) assumption 
that the kernel will tell the admin when something bad happened and 
the machine can limp along otherwise.

So i think hwpoison simply does not affect our ability to get log 
messages out - but it sure allows crappier hardware to be used.
Am i wrong about that for some reason?

	Ingo
