[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <Pine.LNX.4.64.0612181648490.3479@woody.osdl.org>
Date: Mon, 18 Dec 2006 16:57:30 -0800 (PST)
From: Linus Torvalds <torvalds@...l.org>
To: Andrei Popa <andrei.popa@...eo.ro>
cc: Peter Zijlstra <a.p.zijlstra@...llo.nl>,
Andrew Morton <akpm@...l.org>,
Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
Hugh Dickins <hugh@...itas.com>,
Florian Weimer <fw@...eb.enyo.de>,
Marc Haber <mh+linux-kernel@...schlus.de>,
Martin Michlmayr <tbm@...ius.com>
Subject: Re: 2.6.19 file content corruption on ext3
On Tue, 19 Dec 2006, Andrei Popa wrote:
> > >
> > > nope, no file corruption at all.
> >
> > Ok. That's interesting, but I think you actually #ifdef'ed out too
> > much:
> >
> > It was really just the _inner_ "if (mapping_cap_account_dirty(.."
> > statement that I meant you should remove.
> >
> > Can you try that too?
>
> I have file corruption: "Hash check on download completion found bad
> chunks, consider using "safe_sync"."
Ok, that's interesting.
So it doesn't seem to be the call to page_mkclean() itself that causes
corruption. It looks like Peter's hunch that maybe there's some bug in
PG_dirty handling _itself_ might be an idea..
And the reason it only started happening now is that it may just have been
_hidden_ by the fact that while we kept the dirty bits in the page tables,
we'd end up writing the dirty page _despite_ having lost the PG_dirty bit.
So if it's some bad interaction between writable mappings and some other
part of the system, we just didn't see it earlier, exactly because we had
_lots_ of dirty bits, and it was enough that _one_ of them was right.
If you didn't see corruption when you #ifdef'ed out too much of the
"test_clean_page_dirty() function (the _whole_ TestClearPageDirty()
if-statement), but you get it when you just comment out the stuff that
does the page_mkclean(), that's interesting.
I'm left lookin gat the "radix_tree_tag_clear()" in
test_clear_page_dirty().
What happens if you only ifdef out that single thing?
The actual page-cleaning functions make sure to only clear the TAG_DIRTY
bit _after_ the page has been marked for writeback. Is there some ordering
constraint there, perhaps?
I'm really reaching here. I'm trying to see the pattern, and I'm not
seeing it. I'm asking you to test things just to get more of a feel for
what triggers the failure, than because I actually have any kind of idea
of what the heck is going on.
Andrew, Nick, Hugh - any ideas?
Linus
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Powered by blists - more mailing lists