linux-kernel - set_page_dirty races (was: Re: [patch 2/4] vfs: add set_page_dirty

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-Id: <1234867200.4744.65.camel@laptop>
Date:	Tue, 17 Feb 2009 11:40:00 +0100
From:	Peter Zijlstra <peterz@...radead.org>
To:	Nick Piggin <npiggin@...e.de>
Cc:	Edward Shishkin <edward.shishkin@...il.com>,
	Andrew Morton <akpm@...ux-foundation.org>,
	Ryan Hope <rmh3093@...il.com>,
	Randy Dunlap <randy.dunlap@...cle.com>,
	linux-kernel@...r.kernel.org,
	ReiserFS Mailing List <reiserfs-devel@...r.kernel.org>
Subject: set_page_dirty races (was: Re: [patch 2/4] vfs: add
 set_page_dirty_notag)

On Tue, 2009-02-17 at 11:24 +0100, Nick Piggin wrote:
> On Tue, Feb 17, 2009 at 11:05:16AM +0100, Peter Zijlstra wrote:
> > On Tue, 2009-02-17 at 10:38 +0100, Nick Piggin wrote:
> > 
> > > It is a great shame that filesystems are not properly notified
> > > that a page may become dirty before the actual set_page_dirty
> > > event (which is not allowed to fail and is called after the
> > > page is already dirty).
> > 
> > Not quite true, for example the set_page_dirty() done by the write fault
> > code is done _before_ the page becomes dirty.
> > 
> > This before/after thing was the reason for that horrid file corruption
> > bug that dragged on for a few weeks back in .19 (IIRC).
> 
> Yeah, there are actually races though. The page can become cleaned
> before set_page_dirty is reached, and there are also nasty races with
> truncate.

Hmm, so you're saying that never got properly fixed?
 
> > > This is a big problem I have with fsblock simply in trying to
> > > make the memory allocation robust. page_mkwrite unfortunately
> > > is racy and I've fixed problems there... the big problem though
> > > is get_user_pages. Fixing that properly seems to require fixing
> > > callers so it is not really realistic in the short term.
> > 
> > Right, I'm just not sure what we can do, even with a
> > prepage_page_dirty() function, what are you going to do, fail the fault?
> 
> Oh, for regular page fault functions using page_mkwrite, they
> definitely want to fail the fault with a SIGBUS, and actually XFS
> already does that (for fsblock robust memory allocations you
> would also want to fail OOM on metadata allocation failure). What
> is the other option? Silently fail the write?

OK, agreed.

> For XFS purpose (ie. -ENOSPC handling), the current code is reasonable
> although there could be some truncate races with block allocation. But
> mostly probably works. For something like fsblock it can be much more
> common to have the metadata refcount reach 0 and freed before spd is
> called. In that case the code actually goes into a bug situation so it
> is a bit more critical.
> 
> But no that's the "easy" part. The hard part is get_user_pages
> because the caller can hold onto the page indefinitely simply with a
> refcount, and go along happily dirtying it at any stage (actually
> writing to the page memory) before actually calling set_page_dirty.

Should a gup user not specify .write=1 if it wants to dirty the page, at
which point the follow_page() will do the dirty-fault thingy.

Ah, but then we can clean it because we're not holding the page-lock. I
see.

> The "cleanest" way to fix this from VM point of view is probably to
> force gup callers to hold the page locked for the duration to
> prevent truncation or writeout after the filesystem notification.
> Don't know if that would be very popular, however.

Right, so you'd want to keep the page locked over gup(.write=1)
sections.

So should we extend the gup() with put_user_page()?

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/