[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20110419111601.GA18961@localhost>
Date: Tue, 19 Apr 2011 19:16:01 +0800
From: Wu Fengguang <fengguang.wu@...el.com>
To: Jan Kara <jack@...e.cz>
Cc: Andrew Morton <akpm@...ux-foundation.org>,
Mel Gorman <mel@...ux.vnet.ibm.com>,
Dave Chinner <david@...morbit.com>,
Trond Myklebust <Trond.Myklebust@...app.com>,
Itaru Kitayama <kitayama@...bb4u.ne.jp>,
Minchan Kim <minchan.kim@...il.com>,
LKML <linux-kernel@...r.kernel.org>,
"linux-fsdevel@...r.kernel.org" <linux-fsdevel@...r.kernel.org>,
Linux Memory Management List <linux-mm@...ck.org>
Subject: Re: [PATCH 5/6] writeback: try more writeback as long as something
was written
On Tue, Apr 19, 2011 at 06:20:16PM +0800, Jan Kara wrote:
> On Tue 19-04-11 11:00:08, Wu Fengguang wrote:
> > writeback_inodes_wb()/__writeback_inodes_sb() are not aggressive in that
> > they only populate possibly a subset of elegible inodes into b_io at
> > entrance time. When the queued set of inodes are all synced, they just
> > return, possibly with all queued inode pages written but still
> > wbc.nr_to_write > 0.
> >
> > For kupdate and background writeback, there may be more eligible inodes
> > sitting in b_dirty when the current set of b_io inodes are completed. So
> > it is necessary to try another round of writeback as long as we made some
> > progress in this round. When there are no more eligible inodes, no more
> > inodes will be enqueued in queue_io(), hence nothing could/will be
> > synced and we may safely bail.
> Let me understand your concern here: You are afraid that if we do
> for_background or for_kupdate writeback and we write less than
> MAX_WRITEBACK_PAGES, we stop doing writeback although there could be more
> inodes to write at the time we are stopping writeback - the two realistic
Yes.
> cases I can think of are:
> a) when inodes just freshly expired during writeback
> b) when bdi has less than MAX_WRITEBACK_PAGES of dirty data but we are over
> background threshold due to data on some other bdi. And then while we are
> doing writeback someone does dirtying at our bdi.
> Or do you see some other case as well?
>
> The a) case does not seem like a big issue to me after your changes to
Yeah (a) is not an issue with kupdate writeback.
> move_expired_inodes(). The b) case maybe but do you think it will make any
> difference?
(b) seems also weird. What in my mind is this for_background case.
Imagine 100 inodes
i0, i1, i2, ..., i90, i91, i99
At queue_io() time, i90-i99 happen to be expired and moved to s_io for
IO. When finished successfully, if their total size is less than
MAX_WRITEBACK_PAGES, nr_to_write will be > 0. Then wb_writeback() will
quit the background work (w/o this patch) while it's still over
background threshold.
This will be a fairly normal/frequent case I guess.
Thanks,
Fengguang
> Honza
> >
> > Jan raised the concern
> >
> > I'm just afraid that in some pathological cases this could
> > result in bad writeback pattern - like if there is some process
> > which manages to dirty just a few pages while we are doing
> > writeout, this looping could result in writing just a few pages
> > in each round which is bad for fragmentation etc.
> >
> > However it requires really strong timing to make that to (continuously)
> > happen. In practice it's very hard to produce such a pattern even if
> > it's possible in theory. I actually tried to write 1 page per 1ms with
> > this command
> >
> > write-and-fsync -n10000 -S 1000 -c 4096 /fs/test
> >
> > and do sync(1) at the same time. The sync completes quickly on ext4,
> > xfs, btrfs. The readers could try other write-and-sleep patterns and
> > check if it can block sync for longer time.
> >
> > CC: Jan Kara <jack@...e.cz>
> > Signed-off-by: Wu Fengguang <fengguang.wu@...el.com>
> > ---
> > fs/fs-writeback.c | 16 ++++++++--------
> > 1 file changed, 8 insertions(+), 8 deletions(-)
> >
> > --- linux-next.orig/fs/fs-writeback.c 2011-04-19 10:18:30.000000000 +0800
> > +++ linux-next/fs/fs-writeback.c 2011-04-19 10:18:31.000000000 +0800
> > @@ -750,23 +750,23 @@ static long wb_writeback(struct bdi_writ
> > wrote += write_chunk - wbc.nr_to_write;
> >
> > /*
> > - * If we consumed everything, see if we have more
> > + * Did we write something? Try for more
> > + *
> > + * Dirty inodes are moved to b_io for writeback in batches.
> > + * The completion of the current batch does not necessarily
> > + * mean the overall work is done. So we keep looping as long
> > + * as made some progress on cleaning pages or inodes.
> > */
> > - if (wbc.nr_to_write <= 0)
> > + if (wbc.nr_to_write < write_chunk)
> > continue;
> > if (wbc.inodes_cleaned)
> > continue;
> > /*
> > - * Didn't write everything and we don't have more IO, bail
> > + * No more inodes for IO, bail
> > */
> > if (!wbc.more_io)
> > break;
> > /*
> > - * Did we write something? Try for more
> > - */
> > - if (wbc.nr_to_write < write_chunk)
> > - continue;
> > - /*
> > * Nothing written. Wait for some inode to
> > * become available for writeback. Otherwise
> > * we'll just busyloop.
> >
> >
> --
> Jan Kara <jack@...e.cz>
> SUSE Labs, CR
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Powered by blists - more mailing lists