Message-ID: <20090615161904.GH31969@one.firstfloor.org>
Date:	Mon, 15 Jun 2009 18:19:04 +0200
From:	Andi Kleen <andi@...stfloor.org>
To:	Alan Cox <alan@...rguk.ukuu.org.uk>
Cc:	Andi Kleen <andi@...stfloor.org>,
	Hugh Dickins <hugh.dickins@...cali.co.uk>,
	Wu Fengguang <fengguang.wu@...el.com>,
	Balbir Singh <balbir@...ux.vnet.ibm.com>,
	Andrew Morton <akpm@...ux-foundation.org>,
	LKML <linux-kernel@...r.kernel.org>, Ingo Molnar <mingo@...e.hu>,
	Mel Gorman <mel@....ul.ie>,
	Thomas Gleixner <tglx@...utronix.de>,
	"H. Peter Anvin" <hpa@...or.com>,
	Peter Zijlstra <a.p.zijlstra@...llo.nl>,
	Nick Piggin <npiggin@...e.de>,
	"riel@...hat.com" <riel@...hat.com>,
	"chris.mason@...cle.com" <chris.mason@...cle.com>,
	"linux-mm@...ck.org" <linux-mm@...ck.org>
Subject: Re: [PATCH 00/22] HWPOISON: Intro (v5)

On Mon, Jun 15, 2009 at 04:28:04PM +0100, Alan "zSeries" Cox wrote:

> 	curse a lot
> 	suspend to disk
> 	remove dirt from fans, clean/replace RAM
> 	resume from disk
> 
> The very act of making the ECC error not take out the box creates the

OK, so at least you now agree that handling these errors without
panicking is the right thing to do. That's at least some progress.

> environment whereby the underlying hardware error (if there was one) can
> be cured.

These ECC errors are still somewhat rare (or rather, if they become
common, you should definitely service the system). That is why
losing a single page of memory to one normally isn't a big issue.

Sure, you could spend effort making unpoisoning work,
but it would seem very dubious to me. After all, it's just another
4K of memory for each error.

The only reasonably good use case I have heard for unpoisoning is
when you have a lot of huge pages (one bad small page makes the whole
huge page unusable), but that is still relatively exotic.
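To put the "just another 4K" versus huge-page argument in numbers, here is a minimal Python sketch. It assumes typical x86-64 sizes (4 KiB base pages, 2 MiB huge pages) and that each error retires exactly one distinct page of the given size; the names SMALL_PAGE, HUGE_PAGE, bytes_lost and lost_fraction are hypothetical illustrations, not kernel interfaces.

```python
# Assumed page sizes: typical x86-64 values, not stated in the original mail.
SMALL_PAGE = 4 * 1024          # 4 KiB base page
HUGE_PAGE = 2 * 1024 * 1024    # 2 MiB huge page


def bytes_lost(errors: int, page_size: int) -> int:
    """Memory taken out of service if each error poisons one distinct page."""
    return errors * page_size


def lost_fraction(errors: int, page_size: int, total_bytes: int) -> float:
    """Share of total RAM retired by `errors` poisoned pages of `page_size`."""
    return bytes_lost(errors, page_size) / total_bytes


if __name__ == "__main__":
    total = 4 * 1024**3  # hypothetical 4 GiB machine
    print("small pages per huge page:", HUGE_PAGE // SMALL_PAGE)
    for errs in (1, 10):
        for name, size in (("4 KiB", SMALL_PAGE), ("2 MiB", HUGE_PAGE)):
            kib = bytes_lost(errs, size) // 1024
            pct = lost_fraction(errs, size, total)
            print(f"{errs:>3} errors, {name} pages: {kib} KiB lost ({pct:.6%} of RAM)")
```

With 4 KiB pages each error costs a negligible slice of memory, while a single bad small page inside a 2 MiB huge page takes 512 times as much memory out of huge-page service, which is why huge pages are the case where unpoisoning might matter.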

-Andi

[1] mostly you need a new special form of RCU I think

-- 
ak@...ux.intel.com -- Speaking for myself only.