lists.openwall.net
Open Source and information security mailing list archives
Date: Thu, 29 Oct 2015 22:55:19 +0800
From: Ming Lei <tom.leiming@...il.com>
To: Jeff Moyer <jmoyer@...hat.com>
Cc: Jens Axboe <axboe@...nel.dk>, Jason Luo <zhangqing.luo@...cle.com>,
	Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
	Guru Anbalagane <guru.anbalagane@...cle.com>, Feng Jin <joe.jin@...cle.com>,
	Tejun Heo <tj@...nel.org>
Subject: Re: [patch] blk-mq: avoid excessive boot delays with large lun counts

On Thu, Oct 29, 2015 at 10:20 PM, Jeff Moyer <jmoyer@...hat.com> wrote:
> Hi Ming,
>
> Thanks for taking a look.
>
> Ming Lei <tom.leiming@...il.com> writes:
>
>> On Fri, Oct 23, 2015 at 10:57 PM, Jeff Moyer <jmoyer@...hat.com> wrote:
>>> Hi,
>>>
>>> Zhangqing Luo reported long boot times on a system with thousands of
>>> LUNs when scsi-mq was enabled. He narrowed the problem down to
>>> blk_mq_add_queue_tag_set, where every queue is frozen in order to set
>>> the BLK_MQ_F_TAG_SHARED flag. Each added device will freeze all queues
>>> added before it in sequence, which involves waiting for an RCU grace
>>> period for each one. We don't need to do this. After the second queue
>>> is added, only new queues need to be initialized with the shared tag.
>>> We can do that by percolating the flag up to the blk_mq_tag_set, since
>>> new hctxs already inherit the tag_set's flags. That's what the below
>>> patch does.
>>
>> Looks like the idea is correct.
>>
>>>
>>> The re-check in blk_mq_add_queue_tag_set is done because we could have
>>> two queues racing to be the second queue added for the tag set. In such
>>
>> That isn't possible because of set->tag_list_lock.
>
> Yes, it is possible. blk_mq_init_allocated_queue calls
> blk_mq_init_hw_queues without the set->tag_list_lock. Consider a single
> queue is present for the tag_set, and thus the BLK_MQ_F_TAG_SHARED flag
> is not set. Then, two more async scan events are running on separate
> CPUs.
> Each will go through this loop in blk_mq_init_hw_queues:
>
>	queue_for_each_hw_ctx(q, hctx, i) {
>		if (blk_mq_init_hctx(q, set, hctx, i))
>			break;
>	}
>
> and blk_mq_init_hctx copies the flags:
>
>	hctx->flags = set->flags;

It looks like we should have cleared the TAG_SHARED flag during
blk_mq_init_hctx() and just let blk_mq_update_tag_set_depth() deal with
it; then the race can be avoided.

>
> At this point, neither queue's hctxs have the shared flag set. Next,
> both will race to get the tag_list_lock for the tag_set inside of
> blk_mq_add_queue_tag_set. Only one will win and mark the initial
> queue's hctxs as shared (as well as its own). Then, when the second
> queue gets the lock, it will find that the shared flag is already set,
> and assume that it doesn't have to do anything. But, because its

As I suggested, we can always set it when TAG_SHARED is set in
set->flags, because we know the queue isn't ready yet at that point.

> hctx->flags were copied before the BLK_MQ_F_TAG_SHARED flag was set, it
> now will operate as if it's the only queue in the tag set.

The shared flag would be set here because the flag has already been set
in set->flags, so there is no such issue, is there?

>
> So, yes, there is a race, and that extra check is necessary. Hopefully
> I've explained it well enough. If not, let me know and I'll try again.
>
> Cheers,
> Jeff
>
>>
>>> a case, it's possible that not all of the hctxs within the loser will be
>>> initialized. Because this initialization is done sequentially, we can
>>> simply check for the BLK_MQ_F_TAG_SHARED flag in the first hctx.
>>>
>>> This problem was introduced by commit 0d2602ca30e41 (blk-mq: improve
>>> support for shared tags maps).
>>>
>>> Reported-and-tested-by: Jason Luo <zhangqing.luo@...cle.com>
>>> Signed-off-by: Jeff Moyer <jmoyer@...hat.com>
>>>
>>> diff --git a/block/blk-mq.c b/block/blk-mq.c
>>> index 7785ae9..8b4c484 100644
>>> --- a/block/blk-mq.c
>>> +++ b/block/blk-mq.c
>>> @@ -1860,27 +1860,26 @@ static void blk_mq_map_swqueue(struct request_queue *q,
>>>  	}
>>>  }
>>>
>>> -static void blk_mq_update_tag_set_depth(struct blk_mq_tag_set *set)
>>> +static void queue_set_hctx_shared(struct request_queue *q, bool shared)
>>>  {
>>>  	struct blk_mq_hw_ctx *hctx;
>>> -	struct request_queue *q;
>>> -	bool shared;
>>>  	int i;
>>>
>>> -	if (set->tag_list.next == set->tag_list.prev)
>>> -		shared = false;
>>> -	else
>>> -		shared = true;
>>> +	queue_for_each_hw_ctx(q, hctx, i) {
>>> +		if (shared)
>>> +			hctx->flags |= BLK_MQ_F_TAG_SHARED;
>>> +		else
>>> +			hctx->flags &= ~BLK_MQ_F_TAG_SHARED;
>>> +	}
>>> +}
>>> +
>>> +static void blk_mq_update_tag_set_depth(struct blk_mq_tag_set *set, bool shared)
>>> +{
>>> +	struct request_queue *q;
>>>
>>>  	list_for_each_entry(q, &set->tag_list, tag_set_list) {
>>>  		blk_mq_freeze_queue(q);
>>> -
>>> -		queue_for_each_hw_ctx(q, hctx, i) {
>>> -			if (shared)
>>> -				hctx->flags |= BLK_MQ_F_TAG_SHARED;
>>> -			else
>>> -				hctx->flags &= ~BLK_MQ_F_TAG_SHARED;
>>> -		}
>>> +		queue_set_hctx_shared(q, shared);
>>>  		blk_mq_unfreeze_queue(q);
>>>  	}
>>>  }
>>> @@ -1891,7 +1890,13 @@ static void blk_mq_del_queue_tag_set(struct request_queue *q)
>>>
>>>  	mutex_lock(&set->tag_list_lock);
>>>  	list_del_init(&q->tag_set_list);
>>> -	blk_mq_update_tag_set_depth(set);
>>> +
>>> +	if (set->tag_list.next == set->tag_list.prev) {
>>> +		/* just transitioned to unshared */
>>> +		set->flags &= ~BLK_MQ_F_TAG_SHARED;
>>> +		/* update existing queue */
>>> +		blk_mq_update_tag_set_depth(set, false);
>>> +	}
>>>  	mutex_unlock(&set->tag_list_lock);
>>>  }
>>>
>>> @@ -1902,7 +1907,24 @@ static void blk_mq_add_queue_tag_set(struct blk_mq_tag_set *set,
>>>
>>>  	mutex_lock(&set->tag_list_lock);
>>>  	list_add_tail(&q->tag_set_list, &set->tag_list);
>>> -	blk_mq_update_tag_set_depth(set);
>>> +
>>> +	if (set->tag_list.next != set->tag_list.prev) {
>>> +		/*
>>> +		 * Only update the tag set state if the state has
>>> +		 * actually changed.
>>> +		 */
>>> +		if (!(set->flags & BLK_MQ_F_TAG_SHARED)) {
>>> +			/* just transitioned to shared tags */
>>> +			set->flags |= BLK_MQ_F_TAG_SHARED;
>>> +			blk_mq_update_tag_set_depth(set, true);
>>> +		} else {
>>> +			/* ensure we didn't race with another addition */
>>> +			struct blk_mq_hw_ctx *hctx = queue_first_hw_ctx(q);
>>> +			if ((hctx->flags & BLK_MQ_F_TAG_SHARED) !=
>>> +			    BLK_MQ_F_TAG_SHARED)
>>> +				queue_set_hctx_shared(q, true);
>>
>> The above check isn't necessary because the queue can't be up now, so
>> it is OK to just call 'queue_set_hctx_shared(q, true)' directly.
>>
>>> +		}
>>> +	}
>>>  	mutex_unlock(&set->tag_list_lock);
>>>  }
>>>
>>> diff --git a/include/linux/blk-mq.h b/include/linux/blk-mq.h
>>> index 5e7d43a..12ffc40 100644
>>> --- a/include/linux/blk-mq.h
>>> +++ b/include/linux/blk-mq.h
>>> @@ -254,6 +254,9 @@ static inline void *blk_mq_rq_to_pdu(struct request *rq)
>>>  	for ((i) = 0; (i) < (hctx)->nr_ctx &&			\
>>>  	     ({ ctx = (hctx)->ctxs[(i)]; 1; }); (i)++)
>>>
>>> +#define queue_first_hw_ctx(q)					\
>>> +	(q)->queue_hw_ctx[0]
>>> +
>>>  #define blk_ctx_sum(q, sum)					\
>>>  ({								\
>>>  	struct blk_mq_ctx *__x;					\

--
Ming Lei
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/