Message-ID: <CACVXFVMWkdRxWrh14DyGu9uV354z1pdB4TRzzTtZBtu+fD+JVg@mail.gmail.com>
Date: Sun, 19 Jul 2015 20:12:59 +0800
From: Ming Lei <tom.leiming@...il.com>
To: Akinobu Mita <akinobu.mita@...il.com>
Cc: Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
Jens Axboe <axboe@...nel.dk>, Tejun Heo <tj@...nel.org>
Subject: Re: [PATCH v3 6/7] blk-mq: fix freeze queue race
On Sun, Jul 19, 2015 at 12:28 AM, Akinobu Mita <akinobu.mita@...il.com> wrote:
> There are several race conditions while freezing a queue.
>
> When unfreezing a queue, there is a small window between decrementing
> q->mq_freeze_depth to zero and the percpu_ref_reinit() call on
> q->mq_usage_counter. If another task calls blk_mq_freeze_queue_start()
> in that window, q->mq_freeze_depth is increased from zero to one and
> percpu_ref_kill() is called on q->mq_usage_counter, which is already
> killed. The percpu refcount must be re-initialized before it is killed again.
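
For readers following along, the window described above can be replayed
deterministically in userspace. This is only a rough sketch, not kernel
code: struct model_queue, model_kill() and replay_unfreeze_window() are
hypothetical names, and the ref is reduced to a single "dead" flag.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model: only the fields that matter for this race (all hypothetical). */
struct model_queue {
	int freeze_depth;	/* models q->mq_freeze_depth */
	bool ref_dead;		/* true between kill and reinit of the ref */
	int double_kills;	/* kills issued while the ref was already dead */
};

static void model_kill(struct model_queue *q)
{
	if (q->ref_dead)
		q->double_kills++;	/* the case the description warns about */
	q->ref_dead = true;
}

/* Replays the unlocked interleaving; returns the number of double kills. */
int replay_unfreeze_window(struct model_queue *q)
{
	/* Start from a queue that was frozen once: depth 1, ref killed. */
	q->freeze_depth = 1;
	q->ref_dead = true;
	q->double_kills = 0;

	/* Task A, unfreeze path: depth goes 1 -> 0, reinit not yet done. */
	int a_depth = --q->freeze_depth;
	assert(a_depth == 0);

	/* Task B, freeze path: depth goes 0 -> 1, so it kills the ref. */
	if (++q->freeze_depth == 1)
		model_kill(q);

	/* Task A resumes: it saw depth 0, so it reinitializes the ref. */
	if (a_depth == 0)
		q->ref_dead = false;

	return q->double_kills;
}
```

Calling replay_unfreeze_window() on a struct model_queue reports one kill
of an already-killed ref, and leaves the model with freeze_depth == 1 but
a live ref, i.e. a queue that believes it is frozen while still accepting
references.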
>
> Also, there is a race condition while switching to percpu mode.
> percpu_ref_switch_to_percpu() and percpu_ref_kill() must not be
> executed at the same time as the following scenario is possible:
>
> 1. q->mq_usage_counter is initialized in atomic mode.
> (atomic counter: 1)
>
> 2. After the disk registration, a process like systemd-udev starts
> accessing the disk, and successfully increases the refcount
> by percpu_ref_tryget_live() in blk_mq_queue_enter().
> (atomic counter: 2)
>
> 3. In the final stage of initialization, q->mq_usage_counter is being
> switched to percpu mode by percpu_ref_switch_to_percpu() in
> blk_mq_finish_init(). But if CONFIG_PREEMPT_VOLUNTARY is enabled,
> the process is rescheduled in the middle of switching when calling
> wait_event() in __percpu_ref_switch_to_percpu().
> (atomic counter: 2)
>
> 4. CPU hotplug handling for blk-mq calls percpu_ref_kill() to freeze
> the request queue. q->mq_usage_counter is decreased and marked as
> DEAD. It then waits until all requests have finished.
> (atomic counter: 1)
>
> 5. The process rescheduled in step 3 is resumed and finishes
> all remaining work in __percpu_ref_switch_to_percpu().
> A bias value is added to the atomic counter of q->mq_usage_counter.
> (atomic counter: PERCPU_COUNT_BIAS + 1)
>
> 6. A request issued in step 2 finishes and q->mq_usage_counter
> is decreased by blk_mq_queue_exit(). q->mq_usage_counter is DEAD,
> so the atomic counter is decreased and no release handler is called.
> (atomic counter: PERCPU_COUNT_BIAS)
>
> 7. CPU hotplug handling in step 4 waits forever because
> q->mq_usage_counter never reaches zero.
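
The counter arithmetic in steps 1 to 7 can be checked with a small
userspace model. This is a sketch under stated assumptions, not the
percpu_ref implementation: struct model_ref and both replay functions are
hypothetical, MODEL_BIAS is an arbitrary stand-in for PERCPU_COUNT_BIAS,
and the serialized variant assumes that killing a ref in percpu mode
switches it back to atomic mode and removes the bias before dropping a
reference.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for PERCPU_COUNT_BIAS; the kernel's value differs. */
#define MODEL_BIAS (1L << 20)

struct model_ref {
	long count;	/* models the atomic counter */
	bool dead;	/* models the DEAD mark set by percpu_ref_kill() */
};

/* Steps 1 to 6 in the racy order; returns the final counter value. */
long replay_racy_order(void)
{
	struct model_ref ref = { .count = 1, .dead = false };	/* step 1 */

	ref.count += 1;			/* step 2: tryget_live() succeeds, 2 */
	/* step 3: the switch to percpu mode starts, then sleeps */
	ref.dead = true;		/* step 4: kill marks the ref DEAD */
	ref.count -= 1;			/* step 4: and drops a reference, 1 */
	ref.count += MODEL_BIAS;	/* step 5: switch resumes, BIAS + 1 */
	ref.count -= 1;			/* step 6: request completes, BIAS */

	assert(ref.dead);
	return ref.count;		/* not zero, so step 7 never finishes */
}

/* Same operations, with the switch finished before the kill begins. */
long replay_serialized_order(void)
{
	struct model_ref ref = { .count = 1, .dead = false };

	ref.count += 1;			/* request holds a reference, 2 */
	ref.count += MODEL_BIAS;	/* switch completes, BIAS + 2 */
	ref.dead = true;		/* kill marks the ref DEAD */
	ref.count -= MODEL_BIAS;	/* assumed: kill removes the bias again */
	ref.count -= 1;			/* kill drops a reference, 1 */
	ref.count -= 1;			/* request completes, 0 */

	assert(ref.dead);
	return ref.count;
}
```

In this model the racy order ends at MODEL_BIAS, matching the
"(atomic counter: PERCPU_COUNT_BIAS)" annotation in step 6, while the
serialized order reaches zero and would let the waiter in step 4 proceed.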
>
> Also, percpu_ref_reinit() and percpu_ref_kill() must not be executed
> at the same time, because both functions can call
> __percpu_ref_switch_to_percpu(), which adds the bias value and
> initializes the percpu counter.
>
> Fix those races by serializing with a per-queue mutex.
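
The effect of the serialization can be sketched in userspace with a
pthread mutex standing in for the per-queue mutex. Everything here is a
hypothetical model (struct model_queue, model_freeze_start(),
model_unfreeze(), run_serialized()); it only checks that, with the lock
held across the depth update and the kill or reinit, a kill never hits a
dead ref and a reinit never hits a live one.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

struct model_queue {
	pthread_mutex_t freeze_lock;	/* plays the role of q->mq_freeze_lock */
	int freeze_depth;
	bool ref_dead;
	int violations;			/* kill-while-dead or reinit-while-live */
};

static void model_freeze_start(struct model_queue *q)
{
	pthread_mutex_lock(&q->freeze_lock);
	if (++q->freeze_depth == 1) {
		if (q->ref_dead)
			q->violations++;
		q->ref_dead = true;	/* models percpu_ref_kill() */
	}
	pthread_mutex_unlock(&q->freeze_lock);
}

static void model_unfreeze(struct model_queue *q)
{
	pthread_mutex_lock(&q->freeze_lock);
	if (--q->freeze_depth == 0) {
		if (!q->ref_dead)
			q->violations++;
		q->ref_dead = false;	/* models percpu_ref_reinit() */
	}
	pthread_mutex_unlock(&q->freeze_lock);
}

static void *worker(void *arg)
{
	struct model_queue *q = arg;

	for (int i = 0; i < 100000; i++) {
		model_freeze_start(q);
		model_unfreeze(q);
	}
	return NULL;
}

/* Runs nthreads (at most 16) workers and returns the violation count. */
int run_serialized(int nthreads)
{
	struct model_queue q = { .freeze_depth = 0, .ref_dead = false };
	pthread_t tids[16];

	assert(nthreads > 0 && nthreads <= 16);
	pthread_mutex_init(&q.freeze_lock, NULL);
	for (int i = 0; i < nthreads; i++) {
		int rc = pthread_create(&tids[i], NULL, worker, &q);
		assert(rc == 0);
	}
	for (int i = 0; i < nthreads; i++)
		pthread_join(tids[i], NULL);
	pthread_mutex_destroy(&q.freeze_lock);

	assert(q.freeze_depth == 0);
	return q.violations;
}
```

Because the depth change and the state change happen under one lock, the
invariant "depth > 0 exactly when the ref is dead" holds at every lock
boundary, so run_serialized() returns 0 for any thread count in range.
Build with -pthread where the toolchain requires it.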
>
> Signed-off-by: Akinobu Mita <akinobu.mita@...il.com>
> Acked-by: Tejun Heo <tj@...nel.org>
> Cc: Jens Axboe <axboe@...nel.dk>
> Cc: Ming Lei <tom.leiming@...il.com>
> Cc: Tejun Heo <tj@...nel.org>
Reviewed-by: Ming Lei <tom.leiming@...il.com>
> ---
>  block/blk-core.c       | 1 +
>  block/blk-mq-sysfs.c   | 2 ++
>  block/blk-mq.c         | 8 ++++++++
>  include/linux/blkdev.h | 6 ++++++
>  4 files changed, 17 insertions(+)
>
> diff --git a/block/blk-core.c b/block/blk-core.c
> index 627ed0c..544b237 100644
> --- a/block/blk-core.c
> +++ b/block/blk-core.c
> @@ -687,6 +687,7 @@ struct request_queue *blk_alloc_queue_node(gfp_t gfp_mask, int node_id)
>  	__set_bit(QUEUE_FLAG_BYPASS, &q->queue_flags);
>
>  	init_waitqueue_head(&q->mq_freeze_wq);
> +	mutex_init(&q->mq_freeze_lock);
>
>  	if (blkcg_init_queue(q))
>  		goto fail_bdi;
> diff --git a/block/blk-mq-sysfs.c b/block/blk-mq-sysfs.c
> index 79096a6..f63b464 100644
> --- a/block/blk-mq-sysfs.c
> +++ b/block/blk-mq-sysfs.c
> @@ -409,7 +409,9 @@ static void blk_mq_sysfs_init(struct request_queue *q)
>  /* see blk_register_queue() */
>  void blk_mq_finish_init(struct request_queue *q)
>  {
> +	mutex_lock(&q->mq_freeze_lock);
>  	percpu_ref_switch_to_percpu(&q->mq_usage_counter);
> +	mutex_unlock(&q->mq_freeze_lock);
>  }
>
>  int blk_mq_register_disk(struct gendisk *disk)
> diff --git a/block/blk-mq.c b/block/blk-mq.c
> index d861c70..b931e38 100644
> --- a/block/blk-mq.c
> +++ b/block/blk-mq.c
> @@ -115,11 +115,15 @@ void blk_mq_freeze_queue_start(struct request_queue *q)
>  {
>  	int freeze_depth;
>
> +	mutex_lock(&q->mq_freeze_lock);
> +
>  	freeze_depth = atomic_inc_return(&q->mq_freeze_depth);
>  	if (freeze_depth == 1) {
>  		percpu_ref_kill(&q->mq_usage_counter);
>  		blk_mq_run_hw_queues(q, false);
>  	}
> +
> +	mutex_unlock(&q->mq_freeze_lock);
>  }
>  EXPORT_SYMBOL_GPL(blk_mq_freeze_queue_start);
>
> @@ -143,12 +147,16 @@ void blk_mq_unfreeze_queue(struct request_queue *q)
>  {
>  	int freeze_depth;
>
> +	mutex_lock(&q->mq_freeze_lock);
> +
>  	freeze_depth = atomic_dec_return(&q->mq_freeze_depth);
>  	WARN_ON_ONCE(freeze_depth < 0);
>  	if (!freeze_depth) {
>  		percpu_ref_reinit(&q->mq_usage_counter);
>  		wake_up_all(&q->mq_freeze_wq);
>  	}
> +
> +	mutex_unlock(&q->mq_freeze_lock);
>  }
>  EXPORT_SYMBOL_GPL(blk_mq_unfreeze_queue);
>
> diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
> index b02c90b..b867c32 100644
> --- a/include/linux/blkdev.h
> +++ b/include/linux/blkdev.h
> @@ -457,6 +457,12 @@ struct request_queue {
>  #endif
>  	struct rcu_head		rcu_head;
>  	wait_queue_head_t	mq_freeze_wq;
> +	/*
> +	 * Protect concurrent access to mq_usage_counter by
> +	 * percpu_ref_switch_to_percpu(), percpu_ref_kill(), and
> +	 * percpu_ref_reinit().
> +	 */
> +	struct mutex		mq_freeze_lock;
>  	struct percpu_ref	mq_usage_counter;
>  	struct list_head	all_q_node;
>
> --
> 1.9.1
>
--
Ming Lei