Message-ID: <lsq.1496331795.135805863@decadent.org.uk>
Date:   Thu, 01 Jun 2017 16:43:15 +0100
From:   Ben Hutchings <ben@...adent.org.uk>
To:     linux-kernel@...r.kernel.org, stable@...r.kernel.org
CC:     akpm@...ux-foundation.org, "Jens Axboe" <axboe@...com>,
        "Omar Sandoval" <osandov@...com>,
        "Martin Raiber" <martin@...ackup.org>
Subject: [PATCH 3.16 057/212] sbitmap: fix wakeup hang after sbq resize

3.16.44-rc1 review patch.  If anyone has any objections, please let me know.

------------------

From: Omar Sandoval <osandov@...com>

commit 6c0ca7ae292adea09b8bdd33a524bb9326c3e989 upstream.

When we resize a struct sbitmap_queue, we update the wakeup batch size,
but we don't update the wait count in the struct sbq_wait_states. If we
resized down from a size which could use a bigger batch size, these
counts could be too large and cause us to miss necessary wakeups. To fix
this, update the wait counts when we resize (ensuring some careful
memory ordering so that it's safe w.r.t. concurrent clears).

This also fixes a theoretical issue where two threads could end up
bumping the wait count up by the batch size, which could also
potentially lead to hangs.
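
The double bump described above can be replayed deterministically in userspace. The following is only a sketch using C11 atomics, not kernel code: dec_return(), inc_return(), replay_old() and replay_new() are hypothetical names introduced for illustration, and the interleaving of the two callers "A" and "B" is fixed by hand rather than produced by real threads.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical stand-ins for atomic_dec_return()/atomic_inc_return(). */
int dec_return(atomic_int *v) { return atomic_fetch_sub(v, 1) - 1; }
int inc_return(atomic_int *v) { return atomic_fetch_add(v, 1) + 1; }

/*
 * Old logic, replayed for two callers A and B that both clear a tag
 * while the wait count is 1. Returns the final wait count.
 */
int replay_old(int wake_batch)
{
	atomic_int cnt;
	int a, b;

	assert(wake_batch > 0);
	atomic_init(&cnt, 1);

	a = dec_return(&cnt);		/* A sees 0 */
	b = dec_return(&cnt);		/* B sees -1 */
	if (b < 0)
		b = inc_return(&cnt);	/* B undoes its decrement, sees 0 */
	if (a == 0)
		atomic_fetch_add(&cnt, wake_batch);	/* A bumps */
	if (b == 0)
		atomic_fetch_add(&cnt, wake_batch);	/* B bumps again */
	return atomic_load(&cnt);
}

/* Patched logic for the same interleaving: only one cmpxchg can win. */
int replay_new(int wake_batch)
{
	atomic_int cnt;
	int a, b, expected;

	assert(wake_batch > 0);
	atomic_init(&cnt, 1);

	a = dec_return(&cnt);		/* A sees 0 */
	b = dec_return(&cnt);		/* B sees -1 */
	if (a <= 0) {
		expected = a;		/* count is already -1, so this fails */
		atomic_compare_exchange_strong(&cnt, &expected, a + wake_batch);
	}
	if (b <= 0) {
		expected = b;		/* last decrementer wins the re-arm */
		atomic_compare_exchange_strong(&cnt, &expected, b + wake_batch);
	}
	return atomic_load(&cnt);
}
```

With a batch size of 8, replay_old() leaves the count at 16 (two bumps), while replay_new() leaves it at 7 (one bump, applied by the caller that took the count below zero).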

Reported-by: Martin Raiber <martin@...ackup.org>
Fixes: e3a2b3f931f5 ("blk-mq: allow changing of queue depth through sysfs")
Fixes: 2971c35f3588 ("blk-mq: bitmap tag: fix race on blk_mq_bitmap_tags::wake_cnt")
Signed-off-by: Omar Sandoval <osandov@...com>
Signed-off-by: Jens Axboe <axboe@...com>
[bwh: Backported to 3.16:
 - Adjust filename
 - Rename almost everything
 - Use ACCESS_ONCE() instead of {READ,WRITE}_ONCE()]
Signed-off-by: Ben Hutchings <ben@...adent.org.uk>
---
--- a/block/blk-mq-tag.c
+++ b/block/blk-mq-tag.c
@@ -340,6 +340,7 @@ static void bt_clear_tag(struct blk_mq_b
 {
 	const int index = TAG_TO_INDEX(bt, tag);
 	struct bt_wait_state *bs;
+	unsigned int wake_batch;
 	int wait_cnt;
 
 	clear_bit(TAG_TO_BIT(bt, tag), &bt->map[index].word);
@@ -352,10 +353,22 @@ static void bt_clear_tag(struct blk_mq_b
 		return;
 
 	wait_cnt = atomic_dec_return(&bs->wait_cnt);
-	if (unlikely(wait_cnt < 0))
-		wait_cnt = atomic_inc_return(&bs->wait_cnt);
-	if (wait_cnt == 0) {
-		atomic_add(bt->wake_cnt, &bs->wait_cnt);
+	if (wait_cnt <= 0) {
+		wake_batch = ACCESS_ONCE(bt->wake_cnt);
+		/*
+		 * Pairs with the memory barrier in bt_update_count() to
+		 * ensure that we see the batch size update before the wait
+		 * count is reset.
+		 */
+		smp_mb__before_atomic();
+		/*
+		 * If there are concurrent callers to bt_clear_tag(), the last
+		 * one to decrement the wait count below zero will bump it back
+		 * up. If there is a concurrent resize, the count reset will
+		 * either cause the cmpxchg to fail or overwrite after the
+		 * cmpxchg.
+		 */
+		atomic_cmpxchg(&bs->wait_cnt, wait_cnt, wait_cnt + wake_batch);
 		bt_index_atomic_inc(&bt->wake_index);
 		wake_up(&bs->wait);
 	}
@@ -450,20 +463,30 @@ static void bt_update_count(struct blk_m
 {
 	unsigned int tags_per_word = 1U << bt->bits_per_word;
 	unsigned int map_depth = depth;
+	unsigned int wake_batch;
+	int i;
 
 	if (depth) {
-		int i;
-
 		for (i = 0; i < bt->map_nr; i++) {
 			bt->map[i].depth = min(map_depth, tags_per_word);
 			map_depth -= bt->map[i].depth;
 		}
 	}
 
-	bt->wake_cnt = BT_WAIT_BATCH;
-	if (bt->wake_cnt > depth / BT_WAIT_QUEUES)
-		bt->wake_cnt = max(1U, depth / BT_WAIT_QUEUES);
-
+	wake_batch = BT_WAIT_BATCH;
+	if (wake_batch > depth / BT_WAIT_QUEUES)
+		wake_batch = max(1U, depth / BT_WAIT_QUEUES);
+
+	if (bt->wake_cnt != wake_batch) {
+		ACCESS_ONCE(bt->wake_cnt) = wake_batch;
+		/*
+		 * Pairs with the memory barrier in bt_clear_tag() to ensure
+		 * that the batch size is updated before the wait counts.
+		 */
+		smp_mb__before_atomic();
+		for (i = 0; i < BT_WAIT_QUEUES; i++)
+			atomic_set(&bt->bs[i].wait_cnt, 1);
+	}
 	bt->depth = depth;
 }
 

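As an illustration of the resize path in the bt_update_count() hunk above, here is a rough userspace model using C11 atomics. It is a sketch under stated assumptions, not the kernel implementation: struct tag_model, compute_wake_batch(), model_resize() and wait_cnt_after_resize() are hypothetical names, WAIT_QUEUES and WAIT_BATCH are assumed to be 8 (the patch does not show the values of BT_WAIT_QUEUES and BT_WAIT_BATCH), and a sequentially consistent fence stands in for smp_mb__before_atomic().

```c
#include <stdatomic.h>

/* Assumed values; the real constants are not shown in the patch. */
enum { WAIT_QUEUES = 8, WAIT_BATCH = 8 };

/* Hypothetical model of the fields the patch touches. */
struct tag_model {
	_Atomic unsigned int wake_cnt;		/* wakeup batch size */
	atomic_int wait_cnt[WAIT_QUEUES];	/* per-queue wait counts */
};

/* Same arithmetic as the wake_batch computation in the patch. */
unsigned int compute_wake_batch(unsigned int depth)
{
	unsigned int batch = WAIT_BATCH;

	if (batch > depth / WAIT_QUEUES) {
		batch = depth / WAIT_QUEUES;
		if (batch < 1)
			batch = 1;
	}
	return batch;
}

/* Publish the new batch size first, then reset every wait count to 1. */
void model_resize(struct tag_model *m, unsigned int depth)
{
	unsigned int batch = compute_wake_batch(depth);
	int i;

	if (atomic_load_explicit(&m->wake_cnt, memory_order_relaxed) != batch) {
		atomic_store_explicit(&m->wake_cnt, batch, memory_order_relaxed);
		/* Stand-in for the barrier that pairs with the clear path. */
		atomic_thread_fence(memory_order_seq_cst);
		for (i = 0; i < WAIT_QUEUES; i++)
			atomic_store(&m->wait_cnt[i], 1);
	}
}

/* Usage helper: wait count of queue 0 after resizing old -> new depth. */
int wait_cnt_after_resize(unsigned int old_depth, unsigned int new_depth)
{
	struct tag_model m;
	unsigned int batch = compute_wake_batch(old_depth);
	int i;

	atomic_init(&m.wake_cnt, batch);
	for (i = 0; i < WAIT_QUEUES; i++)
		atomic_init(&m.wait_cnt[i], (int)batch);
	model_resize(&m, new_depth);
	return atomic_load(&m.wait_cnt[0]);
}
```

Under these assumptions, shrinking from depth 64 to depth 8 changes the batch size from 8 to 1 and resets the wait counts to 1, so the next clear triggers a wakeup. Growing from 64 to 128 leaves the batch size at 8, so the counts are not touched.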