Date:	Wed, 12 Aug 2009 17:39:35 +0800
From:	Wu Fengguang <fengguang.wu@...el.com>
To:	Nick Piggin <npiggin@...e.de>
Cc:	Andi Kleen <andi@...stfloor.org>,
	Hidehiro Kawai <hidehiro.kawai.ez@...achi.com>,
	"tytso@....edu" <tytso@....edu>,
	"hch@...radead.org" <hch@...radead.org>,
	"mfasheh@...e.com" <mfasheh@...e.com>,
	"aia21@...tab.net" <aia21@...tab.net>,
	"hugh.dickins@...cali.co.uk" <hugh.dickins@...cali.co.uk>,
	"swhiteho@...hat.com" <swhiteho@...hat.com>,
	"akpm@...ux-foundation.org" <akpm@...ux-foundation.org>,
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
	"linux-mm@...ck.org" <linux-mm@...ck.org>,
	Satoshi OSHIMA <satoshi.oshima.fk@...achi.com>,
	Taketoshi Sakuraba <taketoshi.sakuraba.hc@...achi.com>
Subject: Re: [PATCH] [16/19] HWPOISON: Enable .remove_error_page for
	migration aware file systems

On Wed, Aug 12, 2009 at 05:05:18PM +0800, Nick Piggin wrote:
> On Wed, Aug 12, 2009 at 10:57:27AM +0200, Andi Kleen wrote:
> > On Wed, Aug 12, 2009 at 10:46:13AM +0200, Nick Piggin wrote:
> > > On Wed, Aug 12, 2009 at 10:23:31AM +0200, Andi Kleen wrote:
> > > > > page corruption, IMO, because by definition they should be able to
> > > > > tolerate panic. But if they do not know about this change to -EIO
> > > > > semantics, then it is quite possible to cause problems.
> > > > 
> > > > There's no change really. You already have this problem with
> > > > any metadata error, which can cause similar trouble.
> > > > If the application handles those correctly it will also 
> > > > handle hwpoison correctly.
> > > 
> > > What do you mean metadata error?
> > 
> > e.g. when there's a write error on the indirect block or any
> > other fs metadata. This can also cause you to lose data. The error 
> > reporting also works through the address space like with hwpoison,
> > so it only gets reported once.
> 
> Well, this is also a filesystem issue, but anyway the data typically
> does not get thrown out. So a subsequent fsync should be able to
> retry.

Right. With a normal EIO, the data in the page cache is still good and
accessible.

> But if the filesystem can't handle such errors and loses the original
> data when there is an IO error in newly dirty metadata, then it's
> a problem in the filesystem really isn't it?

Right, and the fs should report EIO on future sync attempts as long as
the problem persists.
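
As a rough illustration of the difference between report-once and sticky
EIO, here is a toy Python model. It is not kernel code: ToyMapping, its
methods and the sync_results helper are hypothetical names made up for
this sketch, and the "sticky" policy is only assumed to mean "keep
failing while the fault is present".

```python
class ToyMapping:
    """Toy stand-in for a file's address space (illustrative only)."""

    def __init__(self, sticky):
        self.sticky = sticky          # hypothetical policy switch
        self.error_pending = False    # models a recorded write-back error
        self.fault_present = False    # models whether the problem persists

    def writeback_failed(self):
        # A write-back error is recorded against the mapping.
        self.error_pending = True
        self.fault_present = True

    def fault_fixed(self):
        # The underlying problem goes away (e.g. the device recovers).
        self.fault_present = False

    def fsync(self):
        if self.sticky:
            # Sticky: keep reporting EIO while the problem persists.
            return "EIO" if self.fault_present else "OK"
        # Report-once: the error is tested and cleared by the first caller.
        if self.error_pending:
            self.error_pending = False
            return "EIO"
        return "OK"


def sync_results(sticky, attempts=3):
    """Return what repeated fsync calls see after one failed write-back."""
    mapping = ToyMapping(sticky)
    mapping.writeback_failed()
    return [mapping.fsync() for _ in range(attempts)]
```

In the report-once model only the first fsync sees the error, so a later
sync attempt looks successful even though nothing was fixed. In the
sticky model every attempt fails until fault_fixed() is called.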

> > I'm not really against fixing that (make the error more sticky
> > as Fengguang puts it), but I don't think it needs to be mixed
> > with hwpoison.
> 
> I don't know if making it sticky really "fixes" it. The problem is
> different semantics of what EIO means. My example illustrates this.

Case 1: (re)sync on EIO: sticky EIO will help.

Case 2: read the data back from the page cache and rewrite it somewhere
else. Sticky EIO is not enough, because here the application assumes the
dirty page is still accessible. In this case, the patch at
http://lkml.org/lkml/2009/6/11/294 will help. It effectively freezes
the radix tree, so that no new page can be loaded to replace the
corrupted data and pass for a 'good' one.
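
Case 2 can be sketched with another toy Python model. Again this is not
kernel code, and it does not reproduce the linked patch: ToyFile, its
fields and the freeze_on_poison switch are hypothetical names, and the
"freeze" is only assumed to mean "refuse to load a replacement page
after the poisoned one was dropped".

```python
import errno


class ToyFile:
    """Toy model of one cached page backed by on-disk data."""

    def __init__(self, disk_data, freeze_on_poison):
        self.disk = disk_data                 # stale copy on disk
        self.cache = None                     # page cache contents, if any
        self.freeze_on_poison = freeze_on_poison
        self.frozen = False

    def write(self, data):
        # The new data lives only in the (dirty) cached page for now.
        self.cache = data

    def poison(self):
        # The dirty page is lost to a hardware memory error.
        self.cache = None
        if self.freeze_on_poison:
            self.frozen = True

    def read(self):
        if self.cache is not None:
            return self.cache
        if self.frozen:
            # No replacement page is loaded, so the reader sees the error.
            raise OSError(errno.EIO, "page lost to hardware poison")
        # Without the freeze, the old on-disk data is silently reloaded.
        self.cache = self.disk
        return self.cache


def read_after_poison(freeze_on_poison):
    """Write new data, lose the dirty page, then read it back."""
    f = ToyFile(b"old", freeze_on_poison)
    f.write(b"new")
    f.poison()
    try:
        return f.read()
    except OSError as exc:
        return exc.errno
```

Without the freeze, the read after the poison event returns the old
on-disk contents as if nothing had happened. With it, the read fails
with EIO and the application can tell that its dirty data is gone.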

Thanks,
Fengguang
