linux-kernel - Re: [PATCH] [13/16] HWPOISON: The high level memory error handler in the VM v5


Message-ID: <20090610121645.GC5657@localhost>
Date:	Wed, 10 Jun 2009 20:16:45 +0800
From:	Wu Fengguang <fengguang.wu@...el.com>
To:	Nick Piggin <npiggin@...e.de>
Cc:	Hugh Dickins <hugh.dickins@...cali.co.uk>,
	Andi Kleen <andi@...stfloor.org>,
	"riel@...hat.com" <riel@...hat.com>,
	"chris.mason@...cle.com" <chris.mason@...cle.com>,
	"akpm@...ux-foundation.org" <akpm@...ux-foundation.org>,
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
	"linux-mm@...ck.org" <linux-mm@...ck.org>
Subject: Re: [PATCH] [13/16] HWPOISON: The high level memory error handler
	in the VM v5

On Wed, Jun 10, 2009 at 07:03:05PM +0800, Nick Piggin wrote:
> On Wed, Jun 10, 2009 at 05:20:11PM +0800, Wu Fengguang wrote:
> > On Wed, Jun 10, 2009 at 04:59:39PM +0800, Nick Piggin wrote:
> > > On Wed, Jun 10, 2009 at 04:38:03PM +0800, Wu Fengguang wrote:
> > > > On Wed, Jun 10, 2009 at 12:05:53AM +0800, Hugh Dickins wrote:
> > > > > I think a much more sensible approach would be to follow the page
> > > > > migration technique of replacing the page's ptes by a special swap-like
> > > > > entry, then do the killing from do_swap_page() if a process actually
> > > > > tries to access the page.
> > > > 
> > > > We call that "late kill"; it is enabled when
> > > > sysctl_memory_failure_early_kill=0. The default value is 1.
> > > 
> > > What's the use of this? What are the tradeoffs, in what situations
> > > should an admin set this sysctl one way or the other?
> > 
> > Good questions.
> > 
> > My understanding is this: suppose an application is generating data A, B, C
> > in sequence, and the kernel finds A to be corrupted. Does it make
> > sense for the application to continue generating B and C? Or are there
> > data dependencies between them? With late kill, it becomes more likely
> > that the disk contains new versions of B/C and an old version of A, which
> > is more likely to create data inconsistency.
> > 
> > So early kill is safer.
> 
> Hmm, I think that's pretty speculative, and doesn't seem possible for
> an admin (or even kernel programmer) to choose the "right" value.
> 

Agreed. It would not be easy for me to choose either, if I were the admin ;)

> The application equally may not need to touch the data again, so
> killing it might cause some inconsistency in whatever it is currently
> doing.

Yes, early kill can also be evil. What I can do for now is document the
early kill parameter more carefully.
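
To make the trade-off concrete, here is a toy model of the A/B/C example
from earlier in the thread. It is only an illustrative sketch, not kernel
code: the names simulate(), Outcome and is_mixed() are hypothetical, and it
assumes that the corrupted page holding the new A never reaches the disk
and that a late-killed process touches A (if at all) only after writing B
and C.

```python
from dataclasses import dataclass


@dataclass
class Outcome:
    disk: dict    # item name -> "old" or "new" version on disk
    killed: bool  # whether the process ended up being killed


def simulate(early_kill: bool, touches_a_again: bool) -> Outcome:
    # Assumption: the new A lived only in the poisoned page, so the disk
    # still holds the old versions of A, B and C when the error is found.
    disk = {"A": "old", "B": "old", "C": "old"}

    if early_kill:
        # Early kill: the process dies as soon as the error is handled,
        # so B and C are never regenerated and in-flight work is lost.
        return Outcome(disk, killed=True)

    # Late kill: the process keeps running and writes out new B and C.
    for item in ("B", "C"):
        disk[item] = "new"

    # It is killed only if it actually accesses the poisoned page again.
    return Outcome(disk, killed=touches_a_again)


def is_mixed(disk: dict) -> bool:
    # True when the disk holds a mix of old and new versions.
    return len(set(disk.values())) > 1


if __name__ == "__main__":
    for early in (True, False):
        for touch in (False, True):
            out = simulate(early, touch)
            print(early, touch, out.killed, is_mixed(out.disk))
```

In this model early kill always leaves the disk uniformly old but always
kills the process, while late kill may leave old A next to new B/C and
kills the process only when it touches A again. Neither outcome is
obviously "right", which is the difficulty discussed above.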

Thanks,
Fengguang
