[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20110411145022.710c30e9@notabene.brown>
Date: Mon, 11 Apr 2011 14:50:22 +1000
From: NeilBrown <neilb@...e.de>
To: Mike Snitzer <snitzer@...hat.com>, Jens Axboe <jaxboe@...ionio.com>
Cc: "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
"hch@...radead.org" <hch@...radead.org>, dm-devel@...hat.com,
linux-raid@...r.kernel.org
Subject: Re: [PATCH 05/10] block: remove per-queue plugging
On Tue, 5 Apr 2011 13:05:41 +1000 NeilBrown <neilb@...e.de> wrote:
> On Wed, 9 Mar 2011 19:58:10 -0500 Mike Snitzer <snitzer@...hat.com> wrote:
>
> > Also, in your MD changes, you removed all calls to md_unplug() but
> > didn't remove md_unplug(). Seems it should be removed along with the
> > 'plug' member of 'struct mddev_t'? Neil?
>
> I've been distracted by other things and only just managed to have a look at
> this.
>
> The new plugging code seems to completely ignore the needs of stacked devices
> - or at least my needs in md.
>
> For RAID1 with a write-intent-bitmap, I queue all write requests and then on
> an unplug I update the write-intent-bitmap to mark all the relevant blocks
> and then release the writes.
>
> With the new code there is no way for an unplug event to wake up the raid1d
> thread to start the writeout - I haven't tested it but I suspect it will just
> hang.
>
> Similarly for RAID5 I gather write bios (long before they become 'struct
> request' which is what the plugging code understands) and on an unplug event
> I release the writes - hopefully with enough bios per stripe so that we don't
> need to pre-read.
>
> Possibly the simplest fix would be to have a second list_head in 'struct
> blk_plug' which contained callbacks (a function pointer a list_head in a
> struct which is passed as an arg to the function!).
> blk_finish_plug could then walk the list and call the call-backs.
> It would be quite easy to hook into that.
I've implemented this and it seems to work.
Jens: could you please review and hopefully ack the patch below, and let
me know if you will submit it or should I?
My testing of this combined with some other patches which cause various md
personalities to use it shows up a bug somewhere.
The symptoms are crashes in various places in blk-core and sometimes
elevator.c
list_sort occurs fairly often included in the stack but not always.
This patch
diff --git a/block/blk-core.c b/block/blk-core.c
index 273d60b..903ce8d 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -2674,19 +2674,23 @@ static void flush_plug_list(struct blk_plug *plug)
struct request_queue *q;
unsigned long flags;
struct request *rq;
+ struct list_head head;
BUG_ON(plug->magic != PLUG_MAGIC);
if (list_empty(&plug->list))
return;
+ list_add(&head, &plug->list);
+ list_del_init(&plug->list);
if (plug->should_sort)
- list_sort(NULL, &plug->list, plug_rq_cmp);
+ list_sort(NULL, &head, plug_rq_cmp);
+ plug->should_sort = 0;
q = NULL;
local_irq_save(flags);
- while (!list_empty(&plug->list)) {
- rq = list_entry_rq(plug->list.next);
+ while (!list_empty(&head)) {
+ rq = list_entry_rq(head.next);
list_del_init(&rq->queuelist);
BUG_ON(!(rq->cmd_flags & REQ_ON_PLUG));
BUG_ON(!rq->q);
makes the symptom go away. It simply moves the plug list onto a separate
list head before sorting and processing it.
My test was simply writing to a RAID1 with dd:
while true; do dd if=/dev/zero of=/dev/md0 size=4k; done
Obviously all writes go to two devices so the plug list will always need
sorting.
The only explanation I can come up with is that very occasionally schedule on
2 separate cpus calls blk_flush_plug for the same task. I don't understand
the scheduler nearly well enough to know if or how that can happen.
However with this patch in place I can write to a RAID1 constantly for half
an hour, and without it, the write rarely lasts for 3 minutes.
If you want to reproduce my experiment, you can pull from
git://neil.brown.name/md plug-test
to get my patches for plugging in md (which are quite ready for submission
but seem to work), create a RAID1 using e.g.
mdadm -C /dev/md0 --level=1 --raid-disks=2 /dev/device1 /dev/device2
while true; do dd if=/dev/zero of=/dev/md0 bs=4K ; done
Thanks,
NeilBrown
>From 687b189c02276887dd7d5b87a817da9f67ed3c2c Mon Sep 17 00:00:00 2001
From: NeilBrown <neilb@...e.de>
Date: Thu, 7 Apr 2011 13:16:59 +1000
Subject: [PATCH] Enhance new plugging support to support general callbacks.
md/raid requires an unplug callback, but as it does not uses
requests the current code cannot provide one.
So allow arbitrary callbacks to be attached to the blk_plug.
Cc: Jens Axboe <jaxboe@...ionio.com>
Signed-off-by: NeilBrown <neilb@...e.de>
---
block/blk-core.c | 13 +++++++++++++
include/linux/blkdev.h | 7 ++++++-
2 files changed, 19 insertions(+), 1 deletions(-)
diff --git a/block/blk-core.c b/block/blk-core.c
index 725091d..273d60b 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -2644,6 +2644,7 @@ void blk_start_plug(struct blk_plug *plug)
plug->magic = PLUG_MAGIC;
INIT_LIST_HEAD(&plug->list);
+ INIT_LIST_HEAD(&plug->cb_list);
plug->should_sort = 0;
/*
@@ -2717,9 +2718,21 @@ static void flush_plug_list(struct blk_plug *plug)
local_irq_restore(flags);
}
+static void flush_plug_callbacks(struct blk_plug *plug)
+{
+ while (!list_empty(&plug->cb_list)) {
+ struct blk_plug_cb *cb = list_first_entry(&plug->cb_list,
+ struct blk_plug_cb,
+ list);
+ list_del(&cb->list);
+ cb->callback(cb);
+ }
+}
+
static void __blk_finish_plug(struct task_struct *tsk, struct blk_plug *plug)
{
flush_plug_list(plug);
+ flush_plug_callbacks(plug);
if (plug == tsk->plug)
tsk->plug = NULL;
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 32176cc..3e5e604 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -857,8 +857,13 @@ extern void blk_put_queue(struct request_queue *);
struct blk_plug {
unsigned long magic;
struct list_head list;
+ struct list_head cb_list;
unsigned int should_sort;
};
+struct blk_plug_cb {
+ struct list_head list;
+ void (*callback)(struct blk_plug_cb *);
+};
extern void blk_start_plug(struct blk_plug *);
extern void blk_finish_plug(struct blk_plug *);
@@ -876,7 +881,7 @@ static inline bool blk_needs_flush_plug(struct task_struct *tsk)
{
struct blk_plug *plug = tsk->plug;
- return plug && !list_empty(&plug->list);
+ return plug && (!list_empty(&plug->list) || !list_empty(&plug->cb_list));
}
/*
--
1.7.3.4
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Powered by blists - more mailing lists