lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:	Wed, 20 Dec 2006 17:20:09 -0800
From:	Andrew Morton <akpm@...l.org>
To:	Linus Torvalds <torvalds@...l.org>
Cc:	Martin Michlmayr <tbm@...ius.com>,
	Peter Zijlstra <a.p.zijlstra@...llo.nl>,
	Hugh Dickins <hugh@...itas.com>,
	Nick Piggin <nickpiggin@...oo.com.au>,
	Arjan van de Ven <arjan@...radead.org>,
	Andrei Popa <andrei.popa@...eo.ro>,
	Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
	Florian Weimer <fw@...eb.enyo.de>,
	Marc Haber <mh+linux-kernel@...schlus.de>,
	Martin Schwidefsky <schwidefsky@...ibm.com>,
	Heiko Carstens <heiko.carstens@...ibm.com>,
	Arnd Bergmann <arnd.bergmann@...ibm.com>,
	gordonfarquharson@...il.com,
	"Chen, Kenneth W" <kenneth.w.chen@...el.com>,
	Christoph Lameter <clameter@...r.sgi.com>
Subject: Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content
 corruption on ext3)

On Wed, 20 Dec 2006 16:43:31 -0800 (PST)
Linus Torvalds <torvalds@...l.org> wrote:

> 
> 
> On Wed, 20 Dec 2006, Linus Torvalds wrote:
> > > 
> > > There's also redirty_page_for_writepage().
> > 
> > _dirtying_ a page makes sense in any situation. You can always dirty them. 
> > I'm just saying that you can't just mark them *clean*.
> > 
> > If your point was that the filesystem had better be able to take care of 
> > "redirty_page_for_writepage()", then yes, of course. But since it's the 
> > filesystem itself that does it, it had _better_ be able to take care of 
> > the situation it puts itself into.
> 
> Btw, as an example of something where this may NOT be ok, look at 
> migrate_page_copy().
> 
> I'm not at all convinced that "migrate_page_copy()" can work at all. It 
> does:
> 
> 	...
>         if (PageDirty(page)) {
>                 clear_page_dirty_for_io(page);
>                 set_page_dirty(newpage);

Note that this is referring to different pages.

>         }
> 	...
> 
> which is an example of what NOT to do, because it claims to clear the page 
> for IO, but doesn't actually _do_ any IO.
> 
> And this is wrong, for many reasons. 
>
> For example, it's very possible that the old page is not actually 
> up-to-date, and is only partially dirty using some FS-specific dirty data 
> queues (like NFS does with its dirty data, or buffer-heads can do for 
> local filesystems).

afaict the code copes with those things.

> When you do
> 
> 	if (clear_dirty(page))
> 		set_page_dirty(page);
> 
> in generic VM code, that is a BUG. It's an insane operation. It cannot 
> work. It's exactly what I'm trying to avoid.

These are different pages.

We could view the copy_highpage() in migrate_page_copy() as an "io"
operation, only the backing store is a new pagecache page.

It'd be more logical if that copy_highpage() was occurring after the
clear_page_dirty_for_io().

I'm not sure why migrate_page_copy() is playing with
PageWriteback(newpage).  Surely newpage is locked, in which case nobody
will be starting writeback on it.

> So page migration is probably broken, but it's no less broken than it 
> always has been. And I don't think many people use it anyway. It might 
> work "by accident" in a lot of situations, but to actually be solid, it 
> really would need to do something fundamentally different, like:
> 
>  - have a per-mapping "migrate()" function that actually knows HOW to 
>    migrate the dirty state from one page to another.

That is how it's presently implemented.  You're looking at helper functions
which fileystems may point their address_space_operations.migratepage at.

>  - or, preferably, by just not migrating dirty pages, and just actually 
>    doing the writeback on them first.
> 
> Again, this is an example of just _incorrect_ code, that thinks that it 
> can "just clear the dirty bit". You can't do that. It's wrong. And it is 
> not wrong just because I say so, but because the operations itself simply 
> is FUNDAMENTALLY not a sensible one.

The dirty state is being transferred to the new page.  The tricky part is
handling the cases where these pages are mapped into pagetables.  That's
what the special migration ptes are there for.  I'll let Christoph explain
that lot ;)

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ