[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <alpine.LFD.2.00.1002120722270.7792@localhost.localdomain>
Date: Fri, 12 Feb 2010 07:45:04 -0800 (PST)
From: Linus Torvalds <torvalds@...ux-foundation.org>
To: Jens Axboe <jens.axboe@...cle.com>
cc: Linux Kernel <linux-kernel@...r.kernel.org>, jack@...e.cz,
jengelh@...ozas.de, stable@...nel.org, gregkh@...e.de
Subject: Re: [PATCH] writeback: Fix broken sync writeback
On Fri, 12 Feb 2010, Jens Axboe wrote:
>
> This fixes it by using the passed in page writeback count, instead of
> doing MAX_WRITEBACK_PAGES batches, which gets us much better performance
> (Jan reports it's up from ~400KB/sec to 10MB/sec) and makes sync(1)
> finish properly even when new pages are being dirted.
This seems broken.
The whole point of MAX_WRITEBACK_PAGES was to make sure that we don't
generate a single IO bigger than a couple of MB. And then we have that
loop around things to do multiple chunks. Your change to use nr_pages
seems to make the whole looping totally pointless, and breaks that "don't
do huge hunks" logic.
So I don't think that your patch is correct.
That said, I _do_ believe you when you say it makes a difference, which
makes me think there is a bug there. I just don't think you fixed the
right bug, and your change just happens to do what you wanted by pure
random luck.
The _real_ bug seems to bethe one you mentioned, but then ignored:
> Instead of flushing everything older than the sync run, it will do
> chunks of normal MAX_WRITEBACK_PAGES writeback and restart over and
> over.
and it would seem that the _logical_ way to fix it would be something like
the appended...
Hmm? Even when you do a 'fdatasync()', it's not going to guarantee that
any _future_ data is written back, so the 'oldest_jif' thing would seem to
be sane regardless of sync mode.
NOTE NOTE NOTE! I do have to admit that this patch scares me, because
there could be some bug in the 'older_than_this' logic that means that
somebody sets it even if the inode is already dirty. So this patch makes
conceptual sense to me, and I think it's the right thing to do, but I
also suspect that we do not actually have a lot of test coverage of the
whole 'older_than_this' logic, because it historically has been just a
small optimization for kupdated.
So this patch scares me, as it could break 'fdatasync' entirely. So
somebody should really double-check the whole 'dirtied_when' logic, just
to be safe. If anybody ever sets 'dirtied_when' to the current time even
if the inode is already dirty (and has an earlier dirtied_when'), then
that would open up 'fdatasync()' and friends up to not writing things
back properly at all (because a newer write set 'dirtied_when' so that
old writes get ignored and thought to be 'totally new')
Comments?
Linus
---
fs/fs-writeback.c | 15 ++++++++++-----
1 files changed, 10 insertions(+), 5 deletions(-)
diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
index 1a7c42c..a0a8424 100644
--- a/fs/fs-writeback.c
+++ b/fs/fs-writeback.c
@@ -738,11 +738,16 @@ static long wb_writeback(struct bdi_writeback *wb,
long wrote = 0;
struct inode *inode;
- if (wbc.for_kupdate) {
- wbc.older_than_this = &oldest_jif;
- oldest_jif = jiffies -
- msecs_to_jiffies(dirty_expire_interval * 10);
- }
+ /*
+ * We never write back data that was dirtied _after_ we
+ * started writeback. But kupdate doesn't even want to
+ * write back recently dirtied stuff, only older data.
+ */
+ oldest_jif = jiffies-1;
+ wbc.older_than_this = &oldest_jif;
+ if (wbc.for_kupdate)
+ oldest_jif -= msecs_to_jiffies(dirty_expire_interval * 10);
+
if (!wbc.range_cyclic) {
wbc.range_start = 0;
wbc.range_end = LLONG_MAX;
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Powered by blists - more mailing lists