linux-kernel - Re: [PATCH 1/5] HWPOISON: define VM_FAULT_HWPOISON to 0 when feature is disabled

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <4A327CB1.6060009@redhat.com>
Date:	Fri, 12 Jun 2009 12:05:05 -0400
From:	Rik van Riel <riel@...hat.com>
To:	Ingo Molnar <mingo@...e.hu>
CC:	Linus Torvalds <torvalds@...ux-foundation.org>,
	Wu Fengguang <fengguang.wu@...el.com>,
	Thomas Gleixner <tglx@...utronix.de>,
	"H. Peter Anvin" <hpa@...or.com>,
	Peter Zijlstra <a.p.zijlstra@...llo.nl>,
	Andrew Morton <akpm@...ux-foundation.org>,
	LKML <linux-kernel@...r.kernel.org>,
	Nick Piggin <npiggin@...e.de>,
	Hugh Dickins <hugh.dickins@...cali.co.uk>,
	Andi Kleen <andi@...stfloor.org>,
	"chris.mason@...cle.com" <chris.mason@...cle.com>,
	"linux-mm@...ck.org" <linux-mm@...ck.org>
Subject: Re: [PATCH 1/5] HWPOISON: define VM_FAULT_HWPOISON to 0 when	feature
 is disabled

Ingo Molnar wrote:

> So i think hwpoison simply does not affect our ability to get log 
> messages out - but it sure allows crappier hardware to be used.
> Am i wrong about that for some reason?

You are :)

A 2-bit memory error can be a temporary failure, eg.
due to a cosmic ray.  If bit errors could be prevented
in hardware, there would be no reason to have ECC at all.

The only reason to stop using that page is because we
do not know for sure whether the error was temporary
or permanent (or dependent on a particular bit pattern).

Userspace needs to be notified that some data disappeared,
if it did - for clean pagecache and swap cache pages, the
kernel can simply take the page away and wait for a page
fault...

The sysadmin needs to know that something happened too,
because the hardware *might* have a problem.

However, a 2-bit error does not imply that the hardware
actually needs to be replaced.

-- 
All rights reversed.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/