Message-Id: <1434894751-6877-5-git-send-email-akinobu.mita@gmail.com>
Date:	Sun, 21 Jun 2015 22:52:31 +0900
From:	Akinobu Mita <akinobu.mita@...il.com>
To:	linux-kernel@...r.kernel.org, Jens Axboe <axboe@...nel.dk>
Cc:	Akinobu Mita <akinobu.mita@...il.com>
Subject: [PATCH 4/4] blk-mq: fix mq_usage_counter race when switching to percpu mode

percpu_ref_switch_to_percpu() and percpu_ref_kill() must not be
executed at the same time, because the following scenario is possible:

1. q->mq_usage_counter is initialized in atomic mode.
   (atomic counter: 1)

2. After the disk registration, a process such as systemd-udev starts
   accessing the disk and successfully increases the refcount via
   percpu_ref_tryget_live() in blk_mq_queue_enter().
   (atomic counter: 2)

3. In the final stage of initialization, q->mq_usage_counter is being
   switched to percpu mode by percpu_ref_switch_to_percpu() in
   blk_mq_finish_init().  But if CONFIG_PREEMPT_VOLUNTARY is enabled,
   the process can be rescheduled in the middle of the switch, when it
   calls wait_event() in __percpu_ref_switch_to_percpu().
   (atomic counter: 2)

4. CPU hotplug handling for blk-mq calls percpu_ref_kill() to freeze
   the request queue.  q->mq_usage_counter is decreased and marked as
   DEAD, and the handler waits until all requests have finished.
   (atomic counter: 1)

5. The process rescheduled in step 3 is resumed and finishes all
   remaining work in __percpu_ref_switch_to_percpu().  The bias value
   is added to the atomic counter of q->mq_usage_counter.
   (atomic counter: PERCPU_COUNT_BIAS + 1)

6. The request issued in step 2 finishes and q->mq_usage_counter is
   decreased by blk_mq_queue_exit().  Since q->mq_usage_counter is
   DEAD, only the atomic counter is decreased and no release handler
   is called.
   (atomic counter: PERCPU_COUNT_BIAS)

7. The CPU hotplug handling in step 4 waits forever, as
   q->mq_usage_counter never reaches zero.
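
For illustration only (not part of this patch), the interleaving in
steps 3-7 can be modeled by a standalone userspace program.  All names
below (ref_switch, ref_kill, BIAS) are artificial stand-ins for the
kernel behavior described above, and a sleep substitutes for the
involuntary reschedule:

#include <pthread.h>
#include <stdatomic.h>
#include <stdio.h>
#include <unistd.h>

#define BIAS (1UL << 31)	/* stand-in for PERCPU_COUNT_BIAS */

static atomic_ulong count = 2;	/* step 2: initial ref + one request */
static atomic_int dead;

/* models steps 3 and 5: preempted mid-switch, then adds the bias */
static void *ref_switch(void *arg)
{
	(void)arg;
	usleep(10000);			/* "rescheduled" in wait_event() */
	atomic_fetch_add(&count, BIAS);	/* step 5: bias added after kill */
	return NULL;
}

/* models step 4: mark the ref DEAD and drop the initial reference */
static void *ref_kill(void *arg)
{
	(void)arg;
	atomic_store(&dead, 1);
	atomic_fetch_sub(&count, 1);
	return NULL;
}

int main(void)
{
	pthread_t s, k;

	pthread_create(&s, NULL, ref_switch, NULL);
	pthread_create(&k, NULL, ref_kill, NULL);
	pthread_join(s, NULL);
	pthread_join(k, NULL);

	atomic_fetch_sub(&count, 1);	/* step 6: request completes */

	/* step 7: count is stuck at BIAS, so a freeze waits forever */
	printf("dead=%d count=%lu (nonzero => hang)\n",
	       atomic_load(&dead), atomic_load(&count));
	return 0;
}

Taking a single mutex around the switch/kill/reinit calls, as the
patch below does, closes the window in which the bias can be added to
a DEAD counter.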

Likewise, percpu_ref_reinit() and percpu_ref_kill() must not be
executed at the same time, because both functions could call
__percpu_ref_switch_to_percpu(), which adds the bias value and
initializes the percpu counter.

Fix these races by serializing the calls with a mutex.

Signed-off-by: Akinobu Mita <akinobu.mita@...il.com>
Cc: Jens Axboe <axboe@...nel.dk>
---
 block/blk-mq-sysfs.c   | 2 ++
 block/blk-mq.c         | 6 ++++++
 include/linux/blkdev.h | 6 ++++++
 3 files changed, 14 insertions(+)

diff --git a/block/blk-mq-sysfs.c b/block/blk-mq-sysfs.c
index d8ef3a3..af3e126 100644
--- a/block/blk-mq-sysfs.c
+++ b/block/blk-mq-sysfs.c
@@ -407,7 +407,9 @@ static void blk_mq_sysfs_init(struct request_queue *q)
 /* see blk_register_queue() */
 void blk_mq_finish_init(struct request_queue *q)
 {
+	mutex_lock(&q->mq_usage_lock);
 	percpu_ref_switch_to_percpu(&q->mq_usage_counter);
+	mutex_unlock(&q->mq_usage_lock);
 }
 
 int blk_mq_register_disk(struct gendisk *disk)
diff --git a/block/blk-mq.c b/block/blk-mq.c
index 64d93e4..62d0ef1 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -119,7 +119,9 @@ void blk_mq_freeze_queue_start(struct request_queue *q)
 	spin_unlock_irq(q->queue_lock);
 
 	if (freeze) {
+		mutex_lock(&q->mq_usage_lock);
 		percpu_ref_kill(&q->mq_usage_counter);
+		mutex_unlock(&q->mq_usage_lock);
 		blk_mq_run_hw_queues(q, false);
 	}
 }
@@ -150,7 +152,9 @@ void blk_mq_unfreeze_queue(struct request_queue *q)
 	WARN_ON_ONCE(q->mq_freeze_depth < 0);
 	spin_unlock_irq(q->queue_lock);
 	if (wake) {
+		mutex_lock(&q->mq_usage_lock);
 		percpu_ref_reinit(&q->mq_usage_counter);
+		mutex_unlock(&q->mq_usage_lock);
 		wake_up_all(&q->mq_freeze_wq);
 	}
 }
@@ -1961,6 +1965,8 @@ struct request_queue *blk_mq_init_allocated_queue(struct blk_mq_tag_set *set,
 		hctxs[i]->queue_num = i;
 	}
 
+	mutex_init(&q->mq_usage_lock);
+
 	/*
 	 * Init percpu_ref in atomic mode so that it's faster to shutdown.
 	 * See blk_register_queue() for details.
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 5d93a66..c5bf534 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -484,6 +484,12 @@ struct request_queue {
 	struct rcu_head		rcu_head;
 	wait_queue_head_t	mq_freeze_wq;
 	struct percpu_ref	mq_usage_counter;
+	/*
+	 * Serialize percpu_ref_switch_to_percpu against percpu_ref_kill, and
+	 * percpu_ref_switch_to_percpu against
+	 * percpu_ref_reinit.
+	 */
+	struct mutex		mq_usage_lock;
 	struct list_head	all_q_node;
 
 	struct blk_mq_tag_set	*tag_set;
-- 
1.9.1
