Message-ID: <20090602121940.GD1392@wotan.suse.de>
Date: Tue, 2 Jun 2009 14:19:40 +0200
From: Nick Piggin <npiggin@...e.de>
To: Wu Fengguang <fengguang.wu@...el.com>
Cc: Andi Kleen <andi@...stfloor.org>,
"hugh@...itas.com" <hugh@...itas.com>,
"riel@...hat.com" <riel@...hat.com>,
"akpm@...ux-foundation.org" <akpm@...ux-foundation.org>,
"chris.mason@...cle.com" <chris.mason@...cle.com>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
"linux-mm@...ck.org" <linux-mm@...ck.org>
Subject: Re: [PATCH] [13/16] HWPOISON: The high level memory error handler in the VM v3
On Tue, Jun 02, 2009 at 07:14:07PM +0800, Wu Fengguang wrote:
> On Mon, Jun 01, 2009 at 10:40:51PM +0800, Nick Piggin wrote:
> > But you just said that you try to intercept the IO. So the underlying
> > data is not necessarily corrupt. And even if it was, what if it
> > was reinitialized to something else in the meantime (such as filesystem
> > metadata blocks)? You'd just be introducing worse possibilities for
> > corruption.
>
> The IO interception will be based on PFN instead of file offset, so it
> won't affect innocent pages such as your example of reinitialized data.
OK, if you could intercept the IO so it never happens at all, yes
of course that could work.
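
For illustration, a rough sketch of what such a PFN-based interception
point could look like, assuming the PageHWPoison() flag introduced by
this series; the helper name and its call site are hypothetical, not
part of the patch:

#include <linux/mm.h>
#include <linux/page-flags.h>

/*
 * Sketch only: a writeback path could consult this before submitting
 * a page for IO.  The check keys off the physical page frame (PFN),
 * not the file offset, so an innocent page that later holds the same
 * file data is never affected.
 */
static bool hwpoison_may_submit_io(struct page *page)
{
	return !PageHWPoison(page);	/* skip IO for poisoned frames */
}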
> poisoned dirty page == corrupt data => process shall be killed
> poisoned clean page == recoverable data => process shall survive
>
> In the case of a dirty hwpoison page, if we reload the old on-disk data
> and let the application proceed with it, it may lead to *silent* data
> corruption/inconsistency, because the application will first see v2
> and then v1, which is illogical and hence may mess up its internal data
> structures.
Right, but how do you prevent that? There is no way to reconstruct the
most up-to-date data, because it was destroyed.
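
A minimal sketch of the policy being described, not the actual handler:
a clean poisoned page still has a good copy on disk and can simply be
dropped and refaulted later, while a dirty poisoned page has lost its
latest data, so the error has to be reported rather than silently
serving the stale on-disk version.  report_data_loss() is a
hypothetical stand-in for the "kill the affected processes" action:

#include <linux/mm.h>
#include <linux/page-flags.h>

static void report_data_loss(struct page *page);	/* hypothetical */

static void handle_poisoned_pagecache_page(struct page *page)
{
	if (PageDirty(page)) {
		ClearPageDirty(page);	/* never write poisoned data */
		report_data_loss(page);	/* hypothetical: SIGBUS the users */
	}
	/*
	 * Either way the page itself is discarded; a clean page is
	 * transparently re-read from disk on the next access.
	 */
}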
> > You will need to demonstrate a *big* advantage before doing crazy things
> > with writeback ;)
>
> OK. We can do two things about poisoned writeback pages:
>
> 1) to stop IO for them, thus avoiding corrupted data hitting the disk
> and/or triggering further machine checks
1b) At which point you invoke the end-io handlers, and the page is
no longer under writeback.
> 2) to isolate them from the page cache, thus preventing possible
> references in the writeback time window
And this then becomes possible because, thanks to 1b, you aren't
violating mm assumptions. It proceeds just like the existing
pagecache MCE error handler case.
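
Putting 1b) and 2) together, a hedged sketch (using 2009-era helper
names, not the actual patch): complete the cancelled IO so PageWriteback
clears, then take the page out of the pagecache so no new references
can be gained through the mapping:

#include <linux/mm.h>
#include <linux/pagemap.h>
#include <linux/page-flags.h>

static void isolate_poisoned_writeback_page(struct page *page)
{
	/* 1b) pretend the IO completed, so the page is no longer
	 * under writeback and the usual mm assumptions hold */
	if (PageWriteback(page))
		end_page_writeback(page);

	/* 2) drop it from the pagecache; remove_from_page_cache()
	 * needs the page lock and leaves the pagecache reference
	 * for the caller to release */
	lock_page(page);
	remove_from_page_cache(page);
	unlock_page(page);
	page_cache_release(page);
}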
> > > Now it's obvious that reusing more code than truncate_complete_page()
> > > is not easy (or natural).
> >
> > Just lock the page and wait for writeback, then do the truncate
> > work in another function. In your case if you've already unmapped
> > the page then it won't try to unmap again so no problem.
> >
> > Truncating from pagecache does not change ->index so you can
> > move the loop logic out.
>
> Right. So effectively the reusable function is exactly
> truncate_complete_page(). As I said this reuse is not a big gain.
Anyway, we don't have to argue about it. I already sent a patch
because it was so hard to do, so let's move past this ;)
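
To make the reuse concrete, a sketch along the lines of the quoted
suggestion, assuming truncate_complete_page() (currently static in
mm/truncate.c) were made available to the memory-failure code:

#include <linux/mm.h>
#include <linux/pagemap.h>

static void hwpoison_truncate_page(struct address_space *mapping,
				   struct page *page)
{
	lock_page(page);
	wait_on_page_writeback(page);	/* IO was intercepted/completed */
	truncate_complete_page(mapping, page);
	unlock_page(page);
}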
> > > Yes it's kind of insane. I'm interested in reasoning it out though.
Well, with the IO interception (I had missed this point), it seems
maybe no longer so insane. We could see how it looks.
> > I guess it is a good idea to start simple.
>
> Agreed.
>
> > Considering that there are so many other types of pages that are
> > impossible to deal with or have holes, I very strongly doubt
> > it will be worth so much complexity for closing the gap from 90%
> > to 90.1%. But we'll see.
>
> Yes, the plan is to first focus on the more important cases.
Great.
Thanks,
Nick
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/