linux-kernel - Re: 2.6.19 file content corruption on ext3

Message-ID: <45861E68.3060403@yahoo.com.au>
Date:	Mon, 18 Dec 2006 15:51:52 +1100
From:	Nick Piggin <nickpiggin@...oo.com.au>
To:	Linus Torvalds <torvalds@...l.org>
CC:	Andrew Morton <akpm@...l.org>, andrei.popa@...eo.ro,
	Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
	Peter Zijlstra <a.p.zijlstra@...llo.nl>,
	Hugh Dickins <hugh@...itas.com>,
	Florian Weimer <fw@...eb.enyo.de>,
	Marc Haber <mh+linux-kernel@...schlus.de>,
	Martin Michlmayr <tbm@...ius.com>
Subject: Re: 2.6.19 file content corruption on ext3

Linus Torvalds wrote:
> [ Replying to myself - a sure sign that I don't get out enough ]
> 
> On Sun, 17 Dec 2006, Linus Torvalds wrote:
> 
>>So I don't actually see any serialization at all that would keep a random 
>>page from being paged back in.
> 
> 
> We do actually serialize, but we do it _after_ the page has already been 
> mapped back. Ie we do it for the dirty page case at the end of 
> do_wp_page() and do_no_page() when we do the "set_page_dirty_balance()", 
> but that's potentially too late - we've already mapped the page read-write 
> into the address space.

I can't see how that's exactly a problem -- so long as the page does not
get reclaimed (it won't, because we have a ref on it) then all that matters
is that the page eventually gets marked dirty.

> That said, this means that only threaded apps should ever trigger any 
> problems, which would seem to make it unlikely that this is the issue.
> 
> But Andrew: I don't think it's necessarily true that 
> "try_to_free_buffers()" callers have all unmapped the page.
> 
> That seems to be true for vmscan.c (ie the shrink_page_list -> 
> try_to_release_page -> try_to_free_buffers callchain), but what about 
> the other callchains (through filesystems, or through "pagevec_strip()" or 
> similar?) That pagevec_strip() is called from shrink_active_list(), I 
> don't see that unmapping the pages..

Right. But would it really matter whether they are currently mapped or
not, given that we agree it may become mapped at any point?

I think the problem Andrew identified is real.

The issue is the disconnect between the pte dirtiness and a filesystem
marking its buffers clean. But I disagree with his fix, because we don't
actually want to just throw out that pte dirtiness information: we're
just trying to get the PG_dirty bit in sync with what the buffers are
telling us, not actually clean or dirty anything, as such.

Can we clear the page dirty bit, then run set_page_dirty afterwards, if
any dirty ptes are found?
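A heavily simplified userspace model may make the proposal easier to follow. Everything in it is hypothetical: `struct model_page`, its fields and the `resync_*` / `model_*` helpers are invented for illustration and are not kernel structures or APIs. `pg_dirty` stands in for PG_dirty and `pte_dirty[]` for the hardware dirty bits of the ptes mapping the page. It contrasts a resync that simply drops the pte dirty bits (the behaviour objected to above) with one that clears PG_dirty and then re-dirties the page if any pte is still dirty.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, heavily simplified model; not the kernel's struct page. */
#define MODEL_MAX_PTES 4

struct model_page {
	bool pg_dirty;                  /* stands in for the PG_dirty flag */
	size_t nr_ptes;                 /* number of ptes mapping the page */
	bool pte_dirty[MODEL_MAX_PTES]; /* hardware dirty bit of each pte */
};

/* Stand-in for set_page_dirty(): in this model it only sets the flag. */
static void model_set_page_dirty(struct model_page *p)
{
	p->pg_dirty = true;
}

/* Resync that throws the pte dirtiness away along with PG_dirty. */
static void resync_discarding_ptes(struct model_page *p)
{
	p->pg_dirty = false;
	for (size_t i = 0; i < p->nr_ptes; i++)
		p->pte_dirty[i] = false; /* record of a write via this pte is gone */
}

/*
 * Resync as proposed: clear PG_dirty first, then call set_page_dirty
 * again if any pte still records a write through a mapping.
 */
static void resync_preserving_ptes(struct model_page *p)
{
	p->pg_dirty = false;
	for (size_t i = 0; i < p->nr_ptes; i++) {
		if (p->pte_dirty[i]) {
			model_set_page_dirty(p);
			break;
		}
	}
}

/* True if anything in the model still says the page needs writing. */
static bool model_has_unwritten_data(const struct model_page *p)
{
	if (p->pg_dirty)
		return true;
	for (size_t i = 0; i < p->nr_ptes; i++)
		if (p->pte_dirty[i])
			return true;
	return false;
}
```

In this model, for a page mapped by two ptes with one of them dirty, `resync_discarding_ptes` leaves nothing recording that the page needs writing, while `resync_preserving_ptes` ends with `pg_dirty` set again. With no dirty pte, the preserving variant simply leaves the page clean.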

The other thing we might be able to do is to skip doing the
clear_page_dirty if the page is uptodate. This feels more hackish but it
might be faster?
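The cheaper alternative can be sketched as a tiny userspace model, again purely hypothetical: `struct model_page_flags` and `resync_skip_if_uptodate` are invented names, with `uptodate` standing in for PG_uptodate and `pg_dirty` for PG_dirty. The point it shows is that no pte scan is needed: an uptodate page simply keeps PG_dirty.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model; field names are invented, not kernel page flags. */
struct model_page_flags {
	bool uptodate; /* stands in for PG_uptodate */
	bool pg_dirty; /* stands in for PG_dirty */
};

/*
 * Resync after the buffers were found clean, with the shortcut from the
 * mail: skip clearing PG_dirty when the page is uptodate, so no pte scan
 * is done and the dirty flag already on the page is never dropped.
 */
static void resync_skip_if_uptodate(struct model_page_flags *p)
{
	if (p->uptodate)
		return;
	p->pg_dirty = false;
}
```

The apparent trade-off in this sketch is that an uptodate page can stay marked dirty after its buffers are clean, so presumably it may be written out again; that is the "more hackish but faster" judgement left open above.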

-- 
SUSE Labs, Novell Inc.