linux-kernel - Re: VM: Fix nasty and subtle race in shared mmap'ed page writeback

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <Pine.LNX.4.64.0701051122040.3661@woody.osdl.org>
Date:	Fri, 5 Jan 2007 11:33:57 -0800 (PST)
From:	Linus Torvalds <torvalds@...l.org>
To:	Christoph Lameter <clameter@....com>
cc:	Nick Piggin <nickpiggin@...oo.com.au>,
	Andrea Gelmini <gelma@...ma.net>,
	Linux Kernel Mailing List <linux-kernel@...r.kernel.org>
Subject: Re: VM: Fix nasty and subtle race in shared mmap'ed page writeback

On Fri, 5 Jan 2007, Christoph Lameter wrote:
> 
> Maybe we should require taking the page lock before the dirty bits are 
> modified?

I think it's been suggested several times.

However, a lot of the code isn't really amenable to it as it stands now. 
We very much tend to call it in critical sections, and you have to move 
them all out of the locks they are now.

The dirty mmap patches actually did some of that, exactly because they 
wanted to replace "set_page_dirty()" with "set_page_dirty_balance()", 
which can obviously block. However, the _current_ kernels don't seem to 
have the bug, and quite frankly, I think it's at least partly because 
we've cleaned up and simplified the dirty bit handling even if Andrew 
apparently disagrees.

So one of the biggest need for dirty bit cleanup is with the _old_ 
kernels, the ones before the dirty mmap patches. Nobody is going to do 
that, I think - it would just be crazy.

> That way we can avoid all the artistic code in there right now. 

"Artistic". Good word. That said, most of the code in there needs its own 
locks for other reasons (ie the reason __set_page_dirty_nobuffers() ends 
up taking the tree-lock is because of the radix tree bits, which can NOT 
be protected by per-page locks _anyway_).

So I don't think you'd actually really get rid of any "artistic" code.

The real "artistry" is the fact that we actually do things in 
"set_page_dirty()" even if the page was marked dirty before, exactly 
because we need to re-dirty any meta-data (ie buffers). And THAT is what 
causes a lot of the strangeness. And again, that really doesn't have much 
to do with the page lock - the meta-data tends to generally have its own 
locks again, and it really is a _separate_ issue from the "dirty" bit 
itself.

So I agree: set_page_dirty() is ugly, but I don't think the real problem 
has anything to do with the page lock.

The real confusion comes from how the "dirty" bit means "_some_ part of 
this page may or may not be dirty", and that "set_page_dirty()" really not 
only needs to set that bit, but also make sure that we know that now ALL 
of this page is dirty. That particular confusion takes some time to 
assimilate, and the bug in 2.6.19 was due to missing the subtlety.

And that particular confusion we've had forever, and I think the bug in 
older kernels is probably due to some other place also having missed 
something subtle wrt that thing. For example, anything that 
tests/sets/clear just the dirty bit on its own is pretty much buggy by 
design, even if it _happens_ to work.

This is partly why I got rid of the old "[test_]clear_page_dirty()", 
because it just was confused about the whole thing, and seemed to think 
that it was just a simple bit. 

		Linus
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/