linux-kernel - Re: [PATCH] [13/16] POISON: The high level memory error handler in the VM II

Message-ID: <20090429090501.GB15488@localhost>
Date:	Wed, 29 Apr 2009 17:05:01 +0800
From:	Wu Fengguang <fengguang.wu@...el.com>
To:	Andi Kleen <andi@...stfloor.org>
Cc:	Chris Mason <chris.mason@...cle.com>,
	"hugh@...itas.com" <hugh@...itas.com>,
	"npiggin@...e.de" <npiggin@...e.de>,
	"riel@...hat.com" <riel@...hat.com>,
	"lee.schermerhorn@...com" <lee.schermerhorn@...com>,
	"akpm@...ux-foundation.org" <akpm@...ux-foundation.org>,
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
	"linux-mm@...ck.org" <linux-mm@...ck.org>,
	"x86@...nel.org" <x86@...nel.org>
Subject: Re: [PATCH] [13/16] POISON: The high level memory error handler in
	the VM II

On Wed, Apr 29, 2009 at 04:36:55PM +0800, Andi Kleen wrote:
> > > I'll have to read harder next week, the FS invalidatepage may expect
> > > truncate to be the only caller.
> > 
> > If direct de-dirty is hard for some pages, how about just ignore them?
> 
> You mean just ignoring it for the pages where it is hard?

Yes.

> Yes that is what it is essentially doing right now. But at least
> some dirty pages need to be handled because most user space
> pages tend to be dirty.

Sure.  There are three types of dirty pages:

A. dirty now, and can be de-dirtied by the current code
B. dirty now, but cannot be de-dirtied
C. dirty and under writeback, so cannot be de-dirtied

What I mean is that B and C can both be handled in one single place: the
block layer.

If B pages are hard to de-dirty now, ignore them for now; they will
eventually go to IO and become C.
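
The A/B/C split above can be sketched as a small userspace model in C.
This is only an illustration: struct model_page, classify(),
start_writeback(), defer_to_block_layer() and the can_dedirty field are
hypothetical names, not kernel interfaces, and treating every page under
writeback as type C is an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace model; this is not the kernel's struct page. */
enum dirty_class { NOT_DIRTY, DIRTY_A, DIRTY_B, DIRTY_C };

struct model_page {
	bool dirty;		/* models PG_dirty */
	bool writeback;		/* models PG_writeback */
	bool can_dedirty;	/* assumption: fs allows dropping the dirty state directly */
};

/* Map a page to one of the three dirty types from the list above. */
static enum dirty_class classify(const struct model_page *p)
{
	assert(p != NULL);
	if (p->writeback)
		return DIRTY_C;		/* already on its way to IO */
	if (!p->dirty)
		return NOT_DIRTY;
	return p->can_dedirty ? DIRTY_A : DIRTY_B;
}

/* A type B page that is left alone eventually goes to IO and becomes C. */
static void start_writeback(struct model_page *p)
{
	assert(p != NULL);
	if (p->dirty)
		p->writeback = true;
}

/* B and C are the types left for a single check in the block layer. */
static bool defer_to_block_layer(enum dirty_class c)
{
	return c == DIRTY_B || c == DIRTY_C;
}
```

In this model a type B page passed through start_writeback() classifies
as C afterwards, which is the "ignore B for now" path.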

> > There are the PG_writeback pages anyway. We can inject code to
> > intercept them at the last stage of IO request dispatching.
> 
> That would require adding error out code through all the file systems,
> right?

Not necessarily. The file systems deal with buffer heads, extent maps
and bios; they normally won't touch the poisoned page content at all.

So it's mostly safe to add one single door-keeper at the low-level
request dispatch queue.
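
A minimal sketch of such a door-keeper, again as a userspace model in C
rather than real block layer code. struct io_page, struct io_request and
dispatch_gate() are hypothetical names, and failing the whole request
with -EIO is an assumption about the desired policy.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a page carrying only the poison flag. */
struct io_page {
	bool poisoned;
};

/* Hypothetical model of an IO request: an array of page pointers. */
struct io_request {
	struct io_page **pages;
	size_t nr_pages;
};

/*
 * Single check point before a request is dispatched to the device.
 * Returns 0 if the request may proceed, -EIO if any page is poisoned.
 * Only the flag is tested; the page contents are never read here.
 */
static int dispatch_gate(const struct io_request *req)
{
	assert(req != NULL);
	for (size_t i = 0; i < req->nr_pages; i++) {
		if (req->pages[i]->poisoned)
			return -EIO;
	}
	return 0;
}
```

The per-request cost in this model is one flag test per page, which is
why the overhead question mostly comes down to how hot the dispatch path
is.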

> > 
> > Some perceivable problems and solutions are
> > 1) the intercepting overheads could be costly => inject code at runtime.
> > 2) there are cases that the dirty page could be copied for IO:
> 
> At some point we should probably add poison checks before these operations

Maybe some ext4 developers can give us more hints on these two cases.
We can also add some instrumentation to see how often the (2.1.x) cases
happen.

But I guess a simple PagePoison() test is cheap anyway.
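
To illustrate how cheap the check is, here is a userspace sketch in C of
a copy-out path that tests the poison flag before touching the data.
struct src_page, MODEL_PAGE_SIZE and copy_out_checked() are hypothetical
names, and returning -EIO on a poisoned source is an assumption.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MODEL_PAGE_SIZE 4096	/* assumed page size for this model */

/* Hypothetical model of a source page with a poison flag and its data. */
struct src_page {
	bool poisoned;
	unsigned char data[MODEL_PAGE_SIZE];
};

/*
 * Copy a page into dst for IO, refusing to read a poisoned source.
 * Returns 0 on success, -EIO if the source page is poisoned; in the
 * latter case dst is left untouched.
 */
static int copy_out_checked(unsigned char *dst, const struct src_page *src)
{
	assert(dst != NULL && src != NULL);
	if (src->poisoned)
		return -EIO;	/* one flag test, done before any data access */
	memcpy(dst, src->data, MODEL_PAGE_SIZE);
	return 0;
}
```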

> yes. At least for read it should be the same code path as EIO --
> you have to check PG_error anyways  (or at least you ought to)
> The main difference is that for write you have to check it too.

Check what on write? Do you mean the copy-out case?

Another copy path is bounced reads/writes... I guess that won't be
common on 64-bit systems though.

> >    2.1) jbd2 has two copy-out cases => should be rare. just ignore them?
> >      2.1.1) do_get_write_access(): buffer sits in two active commits
> >      2.1.2) jbd2_journal_write_metadata_buffer(): buffer happens to start
> >             with JBD2_MAGIC_NUMBER
> >    2.2) btrfs have to read page for compress/encryption
> >      Chris: is btrfs_zlib_compress_pages() a good place for detecting
> >      poison pages? Or is it necessary at all for btrfs?(ie. it's
> >      already relatively easy to de-dirty btrfs pages.)
> 
> I think btrfs' IO error handling is not very great right now. But once
> it matures i hope poison pages can be handled in the same way as
> regular IO errors.

OK.

> >    2.3) maybe more cases...
> 
> Undoubtedly. Goal is just to handle the common cases that cover a lot 
> of memory. This will never be 100%.

Right. We'll discover/cover more cases as time goes by.

Thanks,
Fengguang