[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20090916130637.GN23126@kernel.dk>
Date: Wed, 16 Sep 2009 15:06:37 +0200
From: Jens Axboe <jens.axboe@...cle.com>
To: Jan Kara <jack@...e.cz>
Cc: linux-kernel@...r.kernel.org, linux-fsdevel@...r.kernel.org,
chris.mason@...cle.com, hch@...radead.org, tytso@....edu,
akpm@...ux-foundation.org, trond.myklebust@....uio.no
Subject: Re: [PATCH 04/11] writeback: make wb_writeback() take an argument
structure
On Wed, Sep 16 2009, Jan Kara wrote:
> On Tue 15-09-09 20:16:50, Jens Axboe wrote:
> > We need to be able to pass in range_cyclic as well, so instead
> > of growing yet another argument, split the arguments into a
> > struct wb_writeback_args structure that we can use internally.
> > Also makes it easier to just copy all members to an on-stack
> > struct, since we can't access work after clearing the pending
> > bit.
> >
> > Signed-off-by: Jens Axboe <jens.axboe@...cle.com>
> > ---
> > fs/fs-writeback.c | 97 +++++++++++++++++++++++++++++++----------------------
> > 1 files changed, 57 insertions(+), 40 deletions(-)
> >
> > diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
> > index 783ed44..1dd11d1 100644
> > --- a/fs/fs-writeback.c
> > +++ b/fs/fs-writeback.c
> > @@ -35,6 +35,17 @@
> > int nr_pdflush_threads;
> >
> > /*
> > + * Passed into wb_writeback(), essentially a subset of writeback_control
> > + */
> > +struct wb_writeback_args {
> > + long nr_pages;
> > + struct super_block *sb;
> > + enum writeback_sync_modes sync_mode;
> > + int for_kupdate;
> > + int range_cyclic;
> > +};
> You can save a few bytes by using bit-fields for for_kupdate and
> range_cyclic fields the same way as is done in wbc.
I was unsure whether that would be reliable for copying. Seems it
should.
> > @@ -69,9 +78,10 @@ static inline void bdi_work_init(struct bdi_work *work,
> > struct writeback_control *wbc)
> > {
> > INIT_RCU_HEAD(&work->rcu_head);
> > - work->sb = wbc->sb;
> > - work->nr_pages = wbc->nr_to_write;
> > - work->sync_mode = wbc->sync_mode;
> > + work->args.sb = wbc->sb;
> > + work->args.nr_pages = wbc->nr_to_write;
> > + work->args.sync_mode = wbc->sync_mode;
> > + work->args.range_cyclic = wbc->range_cyclic;
> > work->state = WS_USED;
> > }
> We don't pass for_kupdate here. But probably that's fine since noone else
> should set it. But maybe we should at least initialize it?
Good idea, will set it.
> > @@ -673,29 +682,35 @@ static long wb_writeback(struct bdi_writeback *wb, long nr_pages,
> > oldest_jif = jiffies -
> > msecs_to_jiffies(dirty_expire_interval * 10);
> > }
> > + if (!wbc.range_cyclic) {
> > + wbc.range_start = 0;
> > + wbc.range_end = LLONG_MAX;
> > + }
> >
> > for (;;) {
> > - /*
> > - * Don't flush anything for non-integrity writeback where
> > - * no nr_pages was given
> > - */
> > - if (!for_kupdate && nr_pages <= 0 && sync_mode == WB_SYNC_NONE)
> > - break;
> > + if (!args->for_kupdate && args->nr_pages <= 0) {
> > + /*
> > + * Don't flush anything for non-integrity writeback
> > + * where no nr_pages was given
> > + */
> > + if (args->sync_mode == WB_SYNC_NONE)
> > + break;
> >
> > - /*
> > - * If no specific pages were given and this is just a
> > - * periodic background writeout and we are below the
> > - * background dirty threshold, don't do anything
> > - */
> > - if (for_kupdate && nr_pages <= 0 && !over_bground_thresh())
> > - break;
> > + /*
> > + * If no specific pages were given and this is just a
> > + * periodic background writeout and we are below the
> > + * background dirty threshold, don't do anything
> > + */
> > + if (!over_bground_thresh())
> > + break;
> > + }
> This chunk actually changes the behavior since previously the second
> condition had for_kupdate and not !for_kupdate. But revisiting this code,
> the pdflush-style writeback still doesn't work how it used to. Merging
> kupdate-style and pdflush-style writeback isn't that easy.
> For the first one we want to flush only the expired inodes, but we should
> guarantee we really flush them.
> For the second one we want to flush as much data as possible and don't
> really care about which inodes we flush. We only have to work until we hit
> the background threshold.
> So probably check_old_data_flush() has to issue a kupdate-style writeback
> like it does now and then it should loop submitting normal WB_SYNC_NONE
> writeback until we get below background threshold. In theory, we could use
> the look in wb_writeback() like we try now but that would mean we have to
> pass another flag, whether this is a pdflush style writeback or not.
Good spotting, I'll fix that up and have the old flush continue until
!over_bground_thresh(). It would be nice to get rid of the weird special
casing in wb_writeback() and just retain it as a worker, pushing the
logic into the internal callers.
> > wbc.more_io = 0;
> > wbc.encountered_congestion = 0;
> > wbc.nr_to_write = MAX_WRITEBACK_PAGES;
> > wbc.pages_skipped = 0;
> > writeback_inodes_wb(wb, &wbc);
> > - nr_pages -= MAX_WRITEBACK_PAGES - wbc.nr_to_write;
> > + args->nr_pages -= MAX_WRITEBACK_PAGES - wbc.nr_to_write;
> > wrote += MAX_WRITEBACK_PAGES - wbc.nr_to_write;
> >
> > /*
> > @@ -749,8 +764,14 @@ static long wb_check_old_data_flush(struct bdi_writeback *wb)
> > global_page_state(NR_UNSTABLE_NFS) +
> > (inodes_stat.nr_inodes - inodes_stat.nr_unused);
> >
> > - if (nr_pages)
> > - return wb_writeback(wb, nr_pages, NULL, WB_SYNC_NONE, 1);
> > + if (nr_pages) {
> > + struct wb_writeback_args args = {
> > + .nr_pages = nr_pages,
> > + .sync_mode = WB_SYNC_NONE,
> > + };
> We should set for_kupdate here.
Indeed, thanks.
--
Jens Axboe
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Powered by blists - more mailing lists