Message-ID: <Pine.LNX.4.64.0612281156150.4473@woody.osdl.org>
Date: Thu, 28 Dec 2006 12:14:31 -0800 (PST)
From: Linus Torvalds <torvalds@...l.org>
To: Andrew Morton <akpm@...l.org>
cc: Guillaume Chazarain <guichaz@...oo.fr>,
David Miller <davem@...emloft.net>, ranma@...edrich.de,
gordonfarquharson@...il.com, tbm@...ius.com,
Peter Zijlstra <a.p.zijlstra@...llo.nl>, andrei.popa@...eo.ro,
hugh@...itas.com, nickpiggin@...oo.com.au, arjan@...radead.org,
Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
Chen Kenneth W <kenneth.w.chen@...el.com>
Subject: Re: [PATCH] mm: fix page_mkclean_one
On Thu, 28 Dec 2006, Andrew Morton wrote:
>
> It would be interesting to convert your app to do fsync() before
> FADV_DONTNEED. That would take WB_SYNC_NONE out of the picture as well
> (apart from pdflush activity).
Even with the fsync() I get corruption - but the whole point is that it's
very much pdflush that should be writing these pages out.
Andrew - give my test-program a try. It can run in about 1 minute if you
have a 256MB machine (I didn't, but "mem=256M" is my friend), and it seems
to very consistently cause corruption.
What I do is:
# Make sure we write back aggressively
echo 5 > /proc/sys/vm/dirty_ratio
as root, and then just run the thing. Tons of corruption. But the
corruption goes away if I just leave the default dirty ratio alone (but
then I can increase the file size to trigger it, of course - but that
also makes the test run a lot slower).
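(For reference, the attached test.c is the real thing - but the general
shape of this kind of test is roughly the following sketch. The filename,
sizes and pattern byte here are made up, not what test.c actually uses:

	#define _XOPEN_SOURCE 600	/* for posix_fadvise() */
	#include <fcntl.h>
	#include <stdio.h>
	#include <sys/mman.h>
	#include <unistd.h>

	#define FILESIZE (60UL << 20)	/* enough to trip a low dirty_ratio */
	#define PAGESIZE 4096UL

	int main(void)
	{
		unsigned char *map;
		unsigned long i;
		int fd;

		fd = open("testfile", O_RDWR | O_CREAT | O_TRUNC, 0666);
		if (fd < 0 || ftruncate(fd, FILESIZE) < 0) {
			perror("setup");
			return 1;
		}

		map = mmap(NULL, FILESIZE, PROT_READ | PROT_WRITE,
			   MAP_SHARED, fd, 0);
		if (map == MAP_FAILED) {
			perror("mmap");
			return 1;
		}

		/* dirty every page through the shared mapping */
		for (i = 0; i < FILESIZE; i += PAGESIZE)
			map[i] = 0xaa;

		/* write it all out, then drop the cached copies so the
		   next access has to read back from disk */
		fsync(fd);
		posix_fadvise(fd, 0, FILESIZE, POSIX_FADV_DONTNEED);

		/* any page that reads back wrong lost its dirty data */
		for (i = 0; i < FILESIZE; i += PAGESIZE)
			if (map[i] != 0xaa)
				printf("corrupted page %lu\n", i / PAGESIZE);
		return 0;
	}

The point being: pages dirtied through the mapping had better make it to
disk before the cached copies get dropped.)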
Now, with a pre-2.6.19 kernel, I bet you won't get the corruption as
easily (at least with the "fsync()"), but that's less to do with anything
new, and probably just because then you simply won't have any pdflushing
going on - since the kernel won't even notice that you have tons of dirty
pages ;)
It might also depend on the speed of your disk drive - the machine I test
this on has a slow 4200 rpm laptop drive in it, and that probably makes
things go south more easily. That's _especially_ true if this is related
to any "bdi_write_congested()" logic.
Now, it could also be related to various code snippets like
	...
		if (wbc->sync_mode != WB_SYNC_NONE)
			wait_on_page_writeback(page);

		if (PageWriteback(page) ||
		    !clear_page_dirty_for_io(page)) {
			unlock_page(page);
			continue;
		}
	...
where the WB_SYNC_NONE case will hit the "PageWriteback()" and just not do
the writeback at all (but it also won't clear the dirty bit, so it's
certainly not an *OBVIOUS* bug).
We also have code like this ("pageout()"):
	if (clear_page_dirty_for_io(page)) {
		int res;
		struct writeback_control wbc = {
			.sync_mode = WB_SYNC_NONE,
			...
		};
		...
		res = mapping->a_ops->writepage(page, &wbc);
and in this case, if the "WB_SYNC_NONE" means that the "writepage()" call
won't do anything at all because of congestion, then that would be a _bad_
thing, and would certainly explain how something didn't get written out.
But that particular path should only trigger for the "shrink_page_list()"
case, and it's not the case I seem to be testing with my "low dirty_ratio"
testing.
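(For completeness: if a ->writepage() does decide to skip the I/O in the
congested WB_SYNC_NONE case, it has to put the dirty bit back that
clear_page_dirty_for_io() just took away. Roughly like this hypothetical
sketch - not any real filesystem's code:

	#include <linux/mm.h>
	#include <linux/pagemap.h>
	#include <linux/writeback.h>
	#include <linux/backing-dev.h>

	static int sketch_writepage(struct page *page,
				    struct writeback_control *wbc)
	{
		struct backing_dev_info *bdi =
			page->mapping->backing_dev_info;

		if (wbc->sync_mode == WB_SYNC_NONE &&
		    bdi_write_congested(bdi)) {
			/* skipping the write without redirtying the
			 * page would silently drop the data */
			redirty_page_for_writepage(wbc, page);
			unlock_page(page);
			return 0;
		}
		/* ... otherwise actually submit the I/O ... */
		return 0;
	}

Anything that returns without either doing the write or redirtying the
page is exactly the kind of silent data loss we're seeing.)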
Linus
[Attachment: "test.c" (TEXT/PLAIN, 2872 bytes)]