Message-ID: <20090528104022.GZ11363@kernel.dk>
Date: Thu, 28 May 2009 12:40:23 +0200
From: Jens Axboe <jens.axboe@...cle.com>
To: Jan Kara <jack@...e.cz>
Cc: linux-kernel@...r.kernel.org, linux-fsdevel@...r.kernel.org,
chris.mason@...cle.com, david@...morbit.com, hch@...radead.org,
akpm@...ux-foundation.org, yanmin_zhang@...ux.intel.com,
richard@....demon.co.uk, damien.wyart@...e.fr
Subject: Re: [PATCH 07/11] writeback: support > 1 flusher thread per bdi
On Thu, May 28 2009, Jan Kara wrote:
> On Wed 27-05-09 11:41:48, Jens Axboe wrote:
> > Build on the bdi_writeback support by allowing registration of
> > more than 1 flusher thread. File systems can call bdi_add_flusher_task(bdi)
> > to add more flusher threads to the device. If they do so, they must also
> > provide a super_operations function to return the suitable bdi_writeback
> > struct from any given inode.
> Isn't this patch in fact doing two different things? Looking at the
> changes, it implements more flusher threads per BDI, as you write above,
> but it also does all that sync-work handling, which isn't quite trivial
> and seems to be a separate thing...
Yes, but the separated work handling is a prerequisite for multi-thread
support. I guess the patchset would be cleaner if I added it with the
previous patch, and that would save me from having to add the
writeback_acquire()/release() changes. But I still have to do those
anyway, so perhaps not... Unless you feel strongly about it, I'd prefer
to leave it as-is.
> > +static struct bdi_work *bdi_alloc_work(struct super_block *sb, long nr_pages,
> > + enum writeback_sync_modes sync_mode)
> > +{
> > + struct bdi_work *work;
> > +
> > + work = kmalloc(sizeof(*work), GFP_ATOMIC);
> Why is this GFP_ATOMIC? Wouldn't GFP_NOFS be enough?
> But for now, GFP_ATOMIC may be fine since it exercises the
> "allocation failed" path more :).
GFP_NOFS/NOIO should be OK. I prefer GFP_ATOMIC since we never enter any
reclaim, so better safe than sorry. We still have to handle failure
either way; only the rate of failure may differ.
> > + if (work)
> > + bdi_work_init(work, sb, nr_pages, sync_mode);
> > +
> > + return work;
> > +}
> > +
> > +void bdi_start_writeback(struct backing_dev_info *bdi, struct super_block *sb,
> > + long nr_pages, enum writeback_sync_modes sync_mode)
> > +{
> > + const bool must_wait = sync_mode == WB_SYNC_ALL;
> > + struct bdi_work work_stack, *work = NULL;
> > +
> > + if (!must_wait)
> > + work = bdi_alloc_work(sb, nr_pages, sync_mode);
> > +
> > + if (!work) {
> > + work = &work_stack;
> > + bdi_work_init_on_stack(work, sb, nr_pages, sync_mode);
> > }
> >
> > - wb_start_writeback(&bdi->wb, sb, nr_pages, sync_mode);
> > - return 0;
> > + bdi_queue_work(bdi, work);
> > + bdi_start_work(bdi, work);
> > +
> > + /*
> > + * If the sync mode is WB_SYNC_ALL, block waiting for the work to
> > + * complete. If not, we only need to wait for the work to be started,
> > + * if we allocated it on-stack. We use the same mechanism, if the
> > + * wait bit is set in the bdi_work struct, then threads will not
> > + * clear pending until after they are done.
> > + *
> > + * Note that work == &work_stack if must_wait is true, so we don't
> > + * need to do call_rcu() here ever, since the completion path will
> > + * have done that for us.
> > + */
> > + if (must_wait || work == &work_stack) {
> > + bdi_wait_on_work_clear(work);
> > + if (work != &work_stack)
> > + call_rcu(&work->rcu_head, bdi_work_free);
> > + }
> > }
> Looking into this, it seems a bit complex with all that on-stack and
> sync/nosync handling. Wouldn't a simple refcounting scheme be clearer?
> Alloc / init_work_on_stack gets a reference to the bdi_work, queue_work
> gets another reference that is later passed to the flusher thread. We
> drop a reference when we leave bdi_start_writeback() and when the
> flusher thread is done with the work. When the refcount hits zero, the
> work struct is freed (when the work is on the stack, we just never drop
> the last reference)...
It wouldn't change the complexity of the stack vs non-stack case at all,
since you have to do the same checks for when it's safe to proceed. And
having the single bit there with the hashed bit waitqueues makes that
easier.
--
Jens Axboe
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/