Message-Id: <20061218175449.3c752879.akpm@osdl.org>
Date: Mon, 18 Dec 2006 17:54:49 -0800
From: Andrew Morton <akpm@...l.org>
To: andrei.popa@...eo.ro
Cc: Linus Torvalds <torvalds@...l.org>,
Peter Zijlstra <a.p.zijlstra@...llo.nl>,
Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
Hugh Dickins <hugh@...itas.com>,
Florian Weimer <fw@...eb.enyo.de>,
Marc Haber <mh+linux-kernel@...schlus.de>,
Martin Michlmayr <tbm@...ius.com>
Subject: Re: 2.6.19 file content corruption on ext3
On Tue, 19 Dec 2006 03:44:51 +0200
Andrei Popa <andrei.popa@...eo.ro> wrote:
> On Mon, 2006-12-18 at 17:21 -0800, Andrew Morton wrote:
> > On Mon, 18 Dec 2006 16:57:30 -0800 (PST)
> > Linus Torvalds <torvalds@...l.org> wrote:
> >
> > > What happens if you only ifdef out that single thing?
> > >
> > > The actual page-cleaning functions make sure to only clear the TAG_DIRTY
> > > bit _after_ the page has been marked for writeback. Is there some ordering
> > > constraint there, perhaps?
> > >
> > > I'm really reaching here. I'm trying to see the pattern, and I'm not
> > > seeing it. I'm asking you to test things just to get more of a feel for
> > > what triggers the failure, than because I actually have any kind of idea
> > > of what the heck is going on.
> > >
> > > Andrew, Nick, Hugh - any ideas?
> >
> > If all of test_clear_page_dirty() has been commented out then the page will
> > never become clean hence will never fall out of pagecache, so unless Andrei
> > is doing a reboot before checking for corruption, perhaps the underlying
> > data on-disk is incorrect, but we can't see it.
>
> if I do a sync and echo 1 > /proc/sys/vm/drop_caches
OK, that works.
> is the reboot
> still necessary?
It might be necessary to reboot in this case - if we're leaving the
pagecache dirty, writing to drop_caches won't remove it. And you probably
won't be able to get a clean reboot either.
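
The pagecache-versus-disk check discussed above can be sketched roughly as follows. This is only an illustrative sketch, not part of the original test procedure: the helper names (sha1_of, drop_page_cache, cached_vs_disk) are hypothetical, writing to /proc/sys/vm/drop_caches needs root, and, as noted above, pages that are still dirty are not evicted by it, so a matching hash after the drop does not prove the on-disk data is good.

```python
import hashlib
import os


def sha1_of(path, chunk_size=1 << 20):
    """Return the SHA-1 hex digest of a file, reading it in chunks."""
    h = hashlib.sha1()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def drop_page_cache():
    """Flush dirty data, then ask the kernel to drop clean pagecache.

    Requires root. Dirty pages are not dropped by this.
    """
    os.sync()
    with open("/proc/sys/vm/drop_caches", "w") as f:
        f.write("1\n")


def cached_vs_disk(path, drop=drop_page_cache):
    """Hash the file before and after dropping the cache.

    Returns (digest_before, digest_after, digests_match).
    """
    before = sha1_of(path)  # probably served from pagecache
    drop()                  # evict clean pages so the next read hits the disk
    after = sha1_of(path)   # re-read, from disk if the pages were evicted
    return before, after, before == after
```

If the two digests differ, the cached copy and the on-disk copy disagree. If they match but the file is still known to be bad, the corruption is either already on disk or the pages were never evicted, which is the case where a reboot or an unmount/remount is the more reliable test.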
> >
> > Andrei, how _are_ you running this test? What's the exact sequence of steps?
> >
> > In particular, are you doing anything which would cause the corrupted file
> > to be evicted from memory, thus forcing a read from disk? Such as
> > unmounting and then remounting the filesystem?
>
> I boot linux, I start rtorrent and start the download, while it's
> downloading I start evolution and I check my mail (my mbox is very large,
> several hundred megabytes), I close evolution (I use evolution just to
> have another application which uses the filesystem and the memory), I
> start evolution again. I start firefox. The download is complete.
> Rtorrent says if the hash is good or not. I do a "unrar t qwe.rar" to
> test that all 84 downloaded rar files are ok and see the result.
>
> >
> > The point of my question is to check that the data is really incorrect
> > on-disk, or whether it is incorrect in pagecache.
> >
> > Also, it'd be useful if you could determine whether the bug appears with
> > the ext2 filesystem: do s/ext3/ext2/ in /etc/fstab, or boot with
> > rootfstype=ext2 if it's the root filesystem.
>
> I will test.
ok, thanks.