Message-ID: <20090429081616.GA8339@localhost>
Date:	Wed, 29 Apr 2009 16:16:16 +0800
From:	Wu Fengguang <fengguang.wu@...el.com>
To:	Chris Mason <chris.mason@...cle.com>
Cc:	Andi Kleen <andi@...stfloor.org>, hugh@...itas.com,
	npiggin@...e.de, riel@...hat.com, lee.schermerhorn@...com,
	akpm@...ux-foundation.org, linux-kernel@...r.kernel.org,
	linux-mm@...ck.org, x86@...nel.org
Subject: Re: [PATCH] [13/16] POISON: The high level memory error handler in
	the VM II

On Thu, Apr 09, 2009 at 10:37:39AM -0400, Chris Mason wrote:
> On Thu, 2009-04-09 at 16:02 +0200, Andi Kleen wrote:
> > On Thu, Apr 09, 2009 at 09:30:29AM -0400, Chris Mason wrote:
> > > > Is that a correct assumption?
> > > 
> > > Yes, the page won't become writeback when you're holding the page lock.
> > > But, the FS usually thinks of try_to_releasepage as a polite request.
> > > It might fail internally for a bunch of reasons.
> > > 
> > > To make things even more fun, the page won't become writeback magically,
> > > but ext3 and reiser maintain lists of buffer heads for data=ordered, and
> > > they do the data=ordered IO on the buffer heads directly.  writepage is
> > > never called and the page lock is never taken, but the buffer heads go
> > > to disk.  I don't think any of the other filesystems do it this way.
> > 
> > Ok, so do you think my code handles this correctly?
> 
> Even though try_to_releasepage only checks page_writeback(), the lower
> filesystems all bail on dirty pages or dirty buffers (see the checks
> done by try_to_free_buffers).
> 
> It looks like the only way we have to clean a page and all the buffers
> in it is the invalidatepage call.  But that doesn't return success or
> failure, so maybe invalidatepage followed by releasepage?
> 
> I'll have to read harder next week; the FS invalidatepage may expect
> truncate to be the only caller.
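The release path discussed above can be sketched as a small user-space model. This is not kernel code: `model_page`, `model_buffer`, and the `model_*` functions are hypothetical stand-ins for a page-cache page, its buffer heads, `try_to_releasepage()`, `try_to_free_buffers()`, and the filesystem's `invalidatepage`. The model assumes invalidation only clears buffer dirty bits and leaves locked buffers alone. It shows why a release attempt fails on a page with a dirty buffer, and how following the result-less invalidate with a release gives the caller a success/failure answer.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical user-space model of a page and its buffer heads.
 * None of these types or functions are kernel APIs. */
#define MAX_BUFFERS 4

struct model_buffer {
    bool dirty;
    bool locked;
};

struct model_page {
    bool writeback;
    bool has_private;                 /* stands in for PagePrivate */
    int nr_buffers;
    struct model_buffer bufs[MAX_BUFFERS];
};

/* Like the idea behind try_to_free_buffers(): refuse if any buffer is
 * dirty or locked, otherwise detach all buffers from the page. */
static bool model_free_buffers(struct model_page *p)
{
    assert(p->nr_buffers <= MAX_BUFFERS);
    for (int i = 0; i < p->nr_buffers; i++)
        if (p->bufs[i].dirty || p->bufs[i].locked)
            return false;
    p->nr_buffers = 0;
    p->has_private = false;
    return true;
}

/* Like try_to_releasepage(): it checks only writeback itself and leaves
 * the dirty/locked-buffer checks to the filesystem layer. */
static bool model_releasepage(struct model_page *p)
{
    if (p->writeback)
        return false;
    if (!p->has_private)
        return true;
    return model_free_buffers(p);
}

/* Like invalidatepage(): drops the dirty state of every buffer and,
 * as noted in the thread, reports no success or failure. */
static void model_invalidatepage(struct model_page *p)
{
    for (int i = 0; i < p->nr_buffers; i++)
        p->bufs[i].dirty = false;
}

/* The suggested sequence: invalidate, then release to learn the outcome. */
static bool model_drop_page(struct model_page *p)
{
    model_invalidatepage(p);
    return model_releasepage(p);
}
```

For example, a page with one dirty buffer fails `model_releasepage()` but succeeds through `model_drop_page()`; a page with a locked buffer still fails, and in this model the release step is what reports the failure that invalidation alone cannot.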

If directly de-dirtying some pages is hard, how about just ignoring them?
We have to handle the PG_writeback pages anyway, and we can inject code
to intercept them at the last stage of IO request dispatching.

Some foreseeable problems and possible solutions:
1) the interception overhead could be costly => inject the code at runtime.
2) there are cases where the dirty page is copied for IO:
   2.1) jbd2 has two copy-out cases => they should be rare; just ignore them?
     2.1.1) do_get_write_access(): buffer sits in two active commits
     2.1.2) jbd2_journal_write_metadata_buffer(): buffer happens to start
            with JBD2_MAGIC_NUMBER
   2.2) btrfs has to read pages for compression/encryption
     Chris: is btrfs_zlib_compress_pages() a good place for detecting
     poison pages? Or is it unnecessary for btrfs (i.e., it's already
     relatively easy to de-dirty btrfs pages)?
   2.3) maybe more cases...
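The interception idea, and the copy-out problem in point 2, can be illustrated with a tiny user-space model. Everything here is hypothetical: `io_page`, `io_request`, the `poisoned` field (standing in for a hardware-poison page flag), and the `io_*` functions are not kernel interfaces, and on real hardware the copy itself would touch the bad memory. The sketch assumes the hook runs once per request just before dispatch and can see only the pages attached to that request.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical model types; not kernel structures. */
#define MAX_REQ_PAGES 8

struct io_page {
    bool poisoned;        /* stands in for a hardware-poison page flag */
    bool writeback;
    char data[16];
};

struct io_request {
    struct io_page *pages[MAX_REQ_PAGES];
    size_t nr_pages;
};

enum dispatch_result { DISPATCH_OK, DISPATCH_ABORTED };

/* Hypothetical last-stage hook: refuse to send a request that carries
 * a poisoned page. It can only inspect the pages attached to the request. */
static enum dispatch_result io_dispatch(const struct io_request *rq)
{
    assert(rq->nr_pages <= MAX_REQ_PAGES);
    for (size_t i = 0; i < rq->nr_pages; i++)
        if (rq->pages[i]->poisoned)
            return DISPATCH_ABORTED;
    return DISPATCH_OK;
}

/* Copy-out path (like the jbd2/btrfs cases): the IO is issued from a
 * fresh page that does not inherit the poison mark, so the dispatch hook
 * never sees the original page. */
static void io_copy_out(struct io_page *dst, const struct io_page *src)
{
    memcpy(dst->data, src->data, sizeof(dst->data));
    dst->poisoned = false;
    dst->writeback = true;
}

/* Hypothetical extra check at copy time, the kind of place asked about
 * for btrfs compression: refuse to copy from a poisoned source. */
static bool io_copy_out_checked(struct io_page *dst, const struct io_page *src)
{
    if (src->poisoned)
        return false;
    io_copy_out(dst, src);
    return true;
}
```

A request carrying the poisoned page is aborted, but once the data has been copied into a fresh page the hook passes the request, because the copy carries no poison mark. In this model the only place to catch it is the copy step itself.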

> > 
> > > If we really want the page gone, we'll have to tell the FS
> > > drop-this-or-else... sorry, it's some ugly stuff.
> > 
> > I would like to give a very strong hint at least. If it fails
> > we can still ignore it, but it will likely have negative consequences later.
> > 
> 
> Nod.
> 
> > > 
> > > The good news is, it is pretty rare.  I wouldn't hold up the whole patch
> > 
> > You mean pages with the Private bit set are rare? Are you suggesting we
> > just ignore those? How common is it to have Private pages which are not
> > locked by someone else?
> > 
> 
> PagePrivate is very common.  try_to_releasepage failing on a clean page
> without the writeback bit set and without dirty/locked buffers will be
> pretty rare.

Yup. btrfs seems to tag most (if not all) dirty pages with PG_private,
while ext4 does not.

Thanks,
Fengguang
