linux-kernel - Re: [PATCH] [13/16] HWPOISON: The high level memory error handler in the VM v3

Message-ID: <20090602125538.GH1065@one.firstfloor.org>
Date:	Tue, 2 Jun 2009 14:55:38 +0200
From:	Andi Kleen <andi@...stfloor.org>
To:	Nick Piggin <npiggin@...e.de>
Cc:	Andi Kleen <andi@...stfloor.org>, hugh@...itas.com,
	riel@...hat.com, akpm@...ux-foundation.org, chris.mason@...cle.com,
	linux-kernel@...r.kernel.org, linux-mm@...ck.org,
	fengguang.wu@...el.com
Subject: Re: [PATCH] [13/16] HWPOISON: The high level memory error handler in the VM v3

On Tue, Jun 02, 2009 at 02:37:20PM +0200, Nick Piggin wrote:
> Because I don't see any difference (see my previous patch). I
> still don't know what it is supposed to be doing differently.
> So if you reinvent your own that looks close enough to truncate
> to warrant a comment to say /* this is close to truncate but
> not quite */, then yes I insist that you say exactly why it is
> not quite like truncate ;)

I will just delete that comment because it apparently causes so 
much confusion.

> 
>  
> > > I'm suggesting that EIO is traditionally for when the data is still
> > > dirty in pagecache and was not able to get back to backing
> > > store. Do you deny that?
> > 
> > Yes. That is exactly the case when memory-failure triggers EIO
> > 
> > Memory error on a dirty file mapped page.
> 
> But it is no longer dirty, and the problem was not that the data
> was unable to be written back.

Sorry, I don't understand. What do you mean by "no longer dirty"?

Of course it's still dirty; it just has to be discarded because it's
corrupted.

> > > And I think the application might try to handle the case of a
> > > page becoming corrupted differently. Do you deny that?
> > 
> > You mean a clean file-mapped page? In this case there is no EIO,
> > memory-failure just drops the page and it is reloaded.
> > 
> > If the page is dirty we trigger EIO which as you said above is the 
> > right reaction.
> 
> No, I mean the difference between the case of a dirty page unable to
> be written to backing store, and the case of a dirty page becoming
> corrupted.

Nick, I really have a hard time following you here.

What exactly do you want?

A new errno? Or something else? If yes, what precisely?

I currently don't see any sane way to report this to the application
through write().  That is because adding a new errno for something
is incredibly hard and often impossible, and that's certainly
the case here.

The application can detect it if it maps the 
shared page and waits for a SIGBUS, but not through write().

But I doubt there will really be any apps that do anything differently
here anyway. A clever app could retry a few times if it still
has a copy of the data, but that might even make sense on normal
IO errors (e.g. on a SAN).
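
The retry idea could be sketched as below, assuming the application still holds the data in its own buffer. The helper name write_with_retry and the retry count are hypothetical; the sketch treats every EIO the same way, regardless of whether it came from a failed writeback or a corrupted page, which is the point made above.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: write buf at offset off and flush it to stable
 * storage. On EIO the whole write is repeated from the caller's copy,
 * up to max_tries attempts. Returns 0 on success, -1 with errno set. */
static int write_with_retry(int fd, const void *buf, size_t len,
                            off_t off, int max_tries)
{
    for (int attempt = 0; attempt < max_tries; attempt++) {
        size_t done = 0;
        int failed = 0;

        while (done < len) {
            ssize_t n = pwrite(fd, (const char *)buf + done,
                               len - done, off + (off_t)done);
            if (n < 0) {
                if (errno == EINTR)
                    continue;       /* interrupted, just try again */
                failed = 1;
                break;
            }
            if (n == 0) {           /* no progress: treat as an IO error */
                errno = EIO;
                failed = 1;
                break;
            }
            done += (size_t)n;
        }

        /* The error for a dirty page may only show up at fsync() time. */
        if (!failed && fsync(fd) == 0)
            return 0;

        if (errno != EIO)
            return -1;              /* not an IO error: retrying won't help */
        /* EIO: loop and rewrite everything from our own copy. */
    }
    errno = EIO;
    return -1;
}
```

Rewriting the full range on each attempt, rather than only calling fsync() again, is deliberate: after an error the cached page may have been dropped, so only the application's own copy can be trusted.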

> 
> 
> > > OK, given the range of errors that APIs are defined to return,
> > > then maybe EIO is the best option. I don't suppose it is possible
> > > to expand them to return something else?
> > 
> > Expand the syscalls to return other errnos on specific
> > kinds of IO error?
> >  
> > Of course that's possible, but it has the problem that you
> > would need to fix all the applications that expect EIO for
> > IO error. The latter I consider infeasible.
> 
> They would presumably exit or do some default thing, which I
> think would be fine.

No, it's not fine if they would handle EIO. E.g. consider
a sophisticated database, which likely has sophisticated
IO error mechanisms too (e.g. only abort the current commit).
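
To illustrate the concern, here is a sketch of how such a database might dispatch on errno. Everything in it (the enum names, classify_io_error, try_commit) is hypothetical and not taken from any real database; it only shows that code written around EIO would send an unfamiliar errno down its most drastic path.

```c
#include <errno.h>
#include <unistd.h>

enum io_action {
    IO_ABORT_COMMIT,  /* roll back the current transaction, keep running */
    IO_SHUT_DOWN      /* unknown failure: stop rather than risk the data */
};

enum commit_result { COMMIT_OK, COMMIT_ABORTED, COMMIT_FATAL };

/* Hypothetical policy: EIO is a known, recoverable IO error. Any errno
 * the application was not written for falls through to the default. */
static enum io_action classify_io_error(int err)
{
    switch (err) {
    case EIO:
        return IO_ABORT_COMMIT;
    default:
        return IO_SHUT_DOWN;
    }
}

/* Flush the journal file behind fd and map the outcome to a commit result. */
static enum commit_result try_commit(int fd)
{
    if (fsync(fd) == 0)
        return COMMIT_OK;
    return classify_io_error(errno) == IO_ABORT_COMMIT
               ? COMMIT_ABORTED
               : COMMIT_FATAL;
}
```

With this structure, a memory error reported as EIO aborts one commit, while the same event reported under a new errno would take the whole database down until the application is updated.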

-Andi

-- 
ak@...ux.intel.com -- Speaking for myself only.