Message-Id: <6ff04ade7e2ae10b19e3f5ce8f9966e83e0bdcec.1402582256.git.agordeev@redhat.com>
Date: Thu, 12 Jun 2014 17:05:37 +0200
From: Alexander Gordeev <agordeev@...hat.com>
To: linux-kernel@...r.kernel.org
Cc: Alexander Gordeev <agordeev@...hat.com>,
Ming Lei <tom.leiming@...il.com>, Jens Axboe <axboe@...nel.dk>
Subject: [PATCH 2/3] blk-mq: bitmap tag: fix race on blk_mq_bitmap_tags::wake_cnt
This piece of code in bt_clear_tag() function is racy:
	bs = bt_wake_ptr(bt);
	if (bs && atomic_dec_and_test(&bs->wait_cnt)) {
		atomic_set(&bs->wait_cnt, bt->wake_cnt);
		wake_up(&bs->wait);
	}
Since nothing prevents bt_wake_ptr() from returning the very
same 'bs' address on multiple CPUs, the following scenario is
possible:
    CPU1                                        CPU2
    ----                                        ----
0.  bs = bt_wake_ptr(bt);                       bs = bt_wake_ptr(bt);
1.  atomic_dec_and_test(&bs->wait_cnt)
2.                                              atomic_dec_and_test(&bs->wait_cnt)
3.  atomic_set(&bs->wait_cnt, bt->wake_cnt);
If the decrement in [1] yields zero, then until [3] runs the decrement
in [2] leaves wait_cnt negative (or overflowed), which is not expected.
The follow-up assignment in [3] overwrites the invalid value with the
batch value, which likely keeps the issue from being severe, but the
result is still incorrect: the counter should end up lower than that.
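For illustration only, the scenario above can be replayed in userspace
with C11 atomics. This is a minimal single-threaded sketch, not kernel
code: replay_race() and struct race_trace are hypothetical names, a
local 'wait_cnt' stands in for bs->wait_cnt, the 'wake_cnt' argument
stands in for bt->wake_cnt, and the counter is assumed to start at 1 so
that the decrement in [1] yields zero.

```c
#include <assert.h>
#include <stdatomic.h>

/* Counter values observed after steps [1], [2] and [3] of the scenario. */
struct race_trace {
	int after_step1;
	int after_step2;
	int after_step3;
};

/*
 * Replay the interleaving on a single thread, in the order given in the
 * scenario. Hypothetical helper: 'wait_cnt' models bs->wait_cnt and
 * 'wake_cnt' models bt->wake_cnt.
 */
struct race_trace replay_race(int wake_cnt)
{
	atomic_int wait_cnt;
	struct race_trace t;

	assert(wake_cnt > 0);
	atomic_init(&wait_cnt, 1);	/* assumed starting value */

	/* [1] CPU1: the decrement reaches zero */
	t.after_step1 = atomic_fetch_sub(&wait_cnt, 1) - 1;

	/* [2] CPU2: decrements the same counter before it is reset */
	t.after_step2 = atomic_fetch_sub(&wait_cnt, 1) - 1;

	/* [3] CPU1: plain store of the batch value, CPU2's decrement is lost */
	atomic_store(&wait_cnt, wake_cnt);
	t.after_step3 = atomic_load(&wait_cnt);

	return t;
}
```

With a batch value of 8, the modelled counter goes to 0 after [1], to
-1 after [2], and is then overwritten with 8 in [3].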
Cc: Ming Lei <tom.leiming@...il.com>
Cc: Jens Axboe <axboe@...nel.dk>
Signed-off-by: Alexander Gordeev <agordeev@...hat.com>
---
block/blk-mq-tag.c | 14 ++++++++++++--
1 files changed, 12 insertions(+), 2 deletions(-)
diff --git a/block/blk-mq-tag.c b/block/blk-mq-tag.c
index efe9419..5579fae 100644
--- a/block/blk-mq-tag.c
+++ b/block/blk-mq-tag.c
@@ -320,6 +320,7 @@ static void bt_clear_tag(struct blk_mq_bitmap_tags *bt, unsigned int tag)
 {
 	const int index = TAG_TO_INDEX(bt, tag);
 	struct bt_wait_state *bs;
+	int wait_cnt;
 	/*
 	 * The unlock memory barrier need to order access to req in free
@@ -328,10 +329,19 @@ static void bt_clear_tag(struct blk_mq_bitmap_tags *bt, unsigned int tag)
 	clear_bit_unlock(TAG_TO_BIT(bt, tag), &bt->map[index].word);
 	bs = bt_wake_ptr(bt);
-	if (bs && atomic_dec_and_test(&bs->wait_cnt)) {
-		atomic_set(&bs->wait_cnt, bt->wake_cnt);
+	if (!bs)
+		return;
+
+	wait_cnt = atomic_dec_return(&bs->wait_cnt);
+	if (wait_cnt == 0) {
+wake:
+		atomic_add(bt->wake_cnt, &bs->wait_cnt);
 		bt_index_atomic_inc(&bt->wake_index);
 		wake_up(&bs->wait);
+	} else if (wait_cnt < 0) {
+		wait_cnt = atomic_inc_return(&bs->wait_cnt);
+		if (!wait_cnt)
+			goto wake;
 	}
 }
--
1.7.7.6
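
For readers who want to experiment with the counter handling outside
the kernel, here is a rough userspace sketch of the logic the patch
introduces, written with C11 atomics. It is an approximation under
stated assumptions, not the kernel implementation:
clear_tag_should_wake() is a hypothetical name, 'wait_cnt' stands in
for bs->wait_cnt, 'wake_cnt' for bt->wake_cnt, and the
bt_index_atomic_inc() and wake_up() calls are reduced to a boolean
return value.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/*
 * Model of the patched counter handling (hypothetical helper).
 * Returns true where the patch would advance wake_index and call
 * wake_up(), false otherwise.
 */
bool clear_tag_should_wake(atomic_int *wait_cnt, int wake_cnt)
{
	int cnt;

	assert(wake_cnt > 0);

	/* Userspace stand-in for atomic_dec_return() */
	cnt = atomic_fetch_sub(wait_cnt, 1) - 1;

	if (cnt < 0) {
		/* Went below zero: undo our decrement (atomic_inc_return()) */
		cnt = atomic_fetch_add(wait_cnt, 1) + 1;
	}

	if (cnt != 0)
		return false;

	/* "wake:" path: add the batch value instead of overwriting */
	atomic_fetch_add(wait_cnt, wake_cnt);
	return true;
}
```

Called with the counter at 1 and a batch value of 8, the sketch reports
a wakeup and leaves the counter at 8; called with the counter at 3, it
reports no wakeup and leaves the counter at 2.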