linux-kernel - Re: [PATCH] [13/16] HWPOISON: The high level memory error handler in the VM v3

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20090601120537.GF5018@wotan.suse.de>
Date:	Mon, 1 Jun 2009 14:05:38 +0200
From:	Nick Piggin <npiggin@...e.de>
To:	Andi Kleen <andi@...stfloor.org>
Cc:	hugh@...itas.com, riel@...hat.com, akpm@...ux-foundation.org,
	chris.mason@...cle.com, linux-kernel@...r.kernel.org,
	linux-mm@...ck.org, fengguang.wu@...el.com
Subject: Re: [PATCH] [13/16] HWPOISON: The high level memory error handler in the VM v3

On Thu, May 28, 2009 at 03:45:20PM +0200, Andi Kleen wrote:
> On Thu, May 28, 2009 at 02:08:54PM +0200, Nick Piggin wrote:
> > Then the data can not have been consumed, by DMA or otherwise? What
> 
> When the data was consumed we get a different machine check
> (or a different error if it was consumed by a IO device)
> 
> This code right now just handles the case of "CPU detected a page is broken
> is wrong, but hasn't consumed it yet"

OK. Out of curiosity, how often do you expect to see uncorrectable ECC
errors?

 
> > about transient kernel references to the (pagecache/anonymous) page
> > (such as, find_get_page for read(2), or get_user_pages callers).	
> 
> There are always races, after all the CPU could be just about 
> right now to consume. If we lose the race there will be just
> another machine check that stops the consumption of bad data.
> The hardware takes care of that.
> 
> The code here doesn't try to be a 100% coverage of all
> cases (that's obviously impossible), just to handle
> common page types. I also originally had ideas for more handlers,
> but found out how hard it is to test, so I burried a lot of fancy
> ideas :-)
> 
> If there are left over references we complain at least.

Well I don't know about that, because you'd probably have races
between lookups and taking reference. But nevermind.

 
> > > > > +	/*
> > > > > +	 * remove_from_page_cache assumes (mapping && !mapped)
> > > > > +	 */
> > > > > +	if (page_mapping(p) && !page_mapped(p)) {
> > > > > +		remove_from_page_cache(p);
> > > > > +		page_cache_release(p);
> > > > > +	}
> > > > 
> > > > remove_mapping would probably be a better idea. Otherwise you can
> > > > probably introduce pagecache removal vs page fault races whi
> > > > will make the kernel bug.
> > > 
> > > Can you be more specific about the problems?
> > 
> > Hmm, actually now that we hold the page lock over __do_fault (at least
> > for pagecache pages), this may not be able to trigger the race I was
> > thinking of (page becoming mapped). But I think still it is better
> > to use remove_mapping which is the standard way to remove such a page.
> 
> I had this originally, but Fengguang redid it because there was
> trouble with the reference count. remove_mapping always expects it to
> be 2, which we cannot guarantee.

OK, but it should still definitely use truncate code.

> > > > > +	if (mapping) {
> > > > > +		/*
> > > > > +		 * Truncate does the same, but we're not quite the same
> > > > > +		 * as truncate. Needs more checking, but keep it for now.
> > > > > +		 */
> > > > 
> > > > What's different about truncate? It would be good to reuse as much as possible.
> > > 
> > > Truncating removes the block on disk (we don't). Truncating shrinks
> > > the end of the file (we don't). It's more "temporal hole punch"
> > > Probably from the VM point of view it's very similar, but it's
> > > not the same.
> > 
> > Right, I just mean the pagecache side of the truncate. So you should
> > use truncate_inode_pages_range here.
> 
> Why?  I remember I was trying to use that function very early on but
> there was some problem.  For once it does its own locking which
> would conflict with ours.

Just extract the part where it has the page locked into a common
function.


> Also we already do a lot of the stuff it does (like unmapping).

That's OK, it will not try to unmap again if it sees !page_mapped.
 

> Is there anything concretely wrong with the current code?


/*
 * Truncate does the same, but we're not quite the same
 * as truncate. Needs more checking, but keep it for now.
 */

I guess that it duplicates this tricky truncate code and also
says it is different (but AFAIKS it is doing exactly the same
thing).



> > Well, the dirty data has never been corrupted before (ie. the data
> > in pagecache has been OK). It was just unable to make it back to
> > backing store. So a program could retry the write/fsync/etc or
> > try to write the data somewhere else.
> 
> In theory it could, but in practice it is very unlikely it would.

very unlikely to try rewriting the page again or taking evasive
action to save the data somewhere else? I think that's a bold
assumption.

At the very least, having a prompt "IO error, check your hotplug
device / network connection / etc and try again" I don't think
sounds unreasonable at all.


> > It kind of wants a new error code, but I can't imagine the difficulty
> > in doing that...
> 
> I don't think it's a good idea to change programs for this normally,
> most wouldn't anyways. Even kernel programmers have trouble with
> memory suddenly going bad, for user space programmers it would
> be pure voodoo.
 
No, I mean as something to distinguish it between the  case of IO
device going bad (in which case the userspace program has a number
of things it might try to do). 


> > Just seems overengineered. We could rewrite any if/switch statement like
> > that (and actually the compiler probably will if it is beneficial).
> 
> The reason I like it is that it separates the functions cleanly,
> without that there would be a dispatcher from hell. Yes it's a bit
> ugly that there is a lot of manual bit checking around now too,
> but as you go into all the corner cases originally clean code
> always tends to get more ugly (and this is a really ugly problem)

Well... it is just writing another dispatcher from hell in a
different way really, isn't it? How is it so much better than
a simple switch or if/elseif/else statement?

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/