linux-kernel - Re: [PATCH] [16/19] HWPOISON: Enable .remove_error

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <20090810070745.GA26533@localhost>
Date:	Mon, 10 Aug 2009 15:07:45 +0800
From:	Wu Fengguang <fengguang.wu@...el.com>
To:	Hidehiro Kawai <hidehiro.kawai.ez@...achi.com>
Cc:	Andi Kleen <andi@...stfloor.org>, "tytso@....edu" <tytso@....edu>,
	"hch@...radead.org" <hch@...radead.org>,
	"mfasheh@...e.com" <mfasheh@...e.com>,
	"aia21@...tab.net" <aia21@...tab.net>,
	"hugh.dickins@...cali.co.uk" <hugh.dickins@...cali.co.uk>,
	"swhiteho@...hat.com" <swhiteho@...hat.com>,
	"akpm@...ux-foundation.org" <akpm@...ux-foundation.org>,
	"npiggin@...e.de" <npiggin@...e.de>,
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
	"linux-mm@...ck.org" <linux-mm@...ck.org>
Subject: Re: [PATCH] [16/19] HWPOISON: Enable .remove_error_page for
	migration    aware file systems

Hi Hidehiro,

On Mon, Aug 10, 2009 at 02:36:01PM +0800, Hidehiro Kawai wrote:
> Hi,
> 
> Andi Kleen wrote:
> 
> > Index: linux/fs/ext3/inode.c
> > ===================================================================
> > --- linux.orig/fs/ext3/inode.c
> > +++ linux/fs/ext3/inode.c
> > @@ -1819,6 +1819,7 @@ static const struct address_space_operat
> >  	.direct_IO		= ext3_direct_IO,
> >  	.migratepage		= buffer_migrate_page,
> >  	.is_partially_uptodate  = block_is_partially_uptodate,
> > +	.error_remove_page	= generic_error_remove_page,
> >  };
> 
> (I'm sorry if I'm missing the point.)
> 
> If my understanding is correct, the following scenario can happen:
> 
> 1. An uncorrected error on a dirty page cache page is detected by
>    memory scrubbing
> 2. Kernel unmaps and truncates the page to recover from the error
> 3. An application reads data from the file location corresponding
>    to the truncated page
>    ==> Old or garbage data will be read into a new page cache page
> 4. The application modifies the data and write back it to the disk
> 5. The file will corrurpt!
> 
> (Yes, the application is wrong to not do the right thing, i.e. fsync,
>  but it's not user's fault!)

Right. Note that the data has already been corrupted and the above
scenario can be called as re-corruption. We set AS_EIO to trigger some
IO reporting mechanism so that it won't corrupt *silently*.

> A similar data corruption can be caused by a write I/O error,
> because dirty flag is cleared even if the page couldn't be written
> to the disk.

Yes.

> However, we have a way to avoid this kind of data corruption at
> least for ext3.  If we mount an ext3 filesystem with data=ordered
> and data_err=abort, all I/O errors on file data block belonging to
> the committing transaction are checked.  When I/O error is found,
> abort journaling and remount the filesystem with read-only to
> prevent further updates.  This kind of feature is very important
> for mission critical systems.

Agreed. We also set PG_error, which should be enough to trigger such
remount?

> If we merge this patch, we would face the data corruption problem
> again.
> 
> I think there are three options,
> 
> (1) drop this patch
> (2) merge this patch with new panic_on_dirty_page_cache_corruption
>     sysctl
> (3) implement a more sophisticated error_remove_page function

In fact we proposed a patch for preventing the re-corruption case, see

        http://lkml.org/lkml/2009/6/11/294

However it is hard to answer the (policy) question "How sticky should
the EIO bit remain?".

> >  static const struct address_space_operations ext3_writeback_aops = {
> > @@ -1834,6 +1835,7 @@ static const struct address_space_operat
> >  	.direct_IO		= ext3_direct_IO,
> >  	.migratepage		= buffer_migrate_page,
> >  	.is_partially_uptodate  = block_is_partially_uptodate,
> > +	.error_remove_page	= generic_error_remove_page,
> >  };
> 
> The writeback case would be OK. It's not much different from the I/O
> error case.
> 
> >  static const struct address_space_operations ext3_journalled_aops = {
> > @@ -1848,6 +1850,7 @@ static const struct address_space_operat
> >  	.invalidatepage		= ext3_invalidatepage,
> >  	.releasepage		= ext3_releasepage,
> >  	.is_partially_uptodate  = block_is_partially_uptodate,
> > +	.error_remove_page	= generic_error_remove_page,
> >  };
> >  
> >  void ext3_set_aops(struct inode *inode)
> 
> I'm not sure about the journalled case.  I'm going to take a look at
> it later.

Thanks,
Fengguang
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/