Message-ID: <0d36c16b-cb4b-6088-fdf3-2fe5d8f33cd7@oracle.com>
Date: Tue, 16 Jan 2018 18:12:09 +0800
From: "jianchao.wang" <jianchao.w.wang@...cle.com>
To: Ming Lei <ming.lei@...hat.com>, Jens Axboe <axboe@...com>,
linux-block@...r.kernel.org, Christoph Hellwig <hch@...radead.org>
Cc: Christian Borntraeger <borntraeger@...ibm.com>,
Stefan Haberland <sth@...ux.vnet.ibm.com>,
Thomas Gleixner <tglx@...utronix.de>,
linux-kernel@...r.kernel.org, Christoph Hellwig <hch@....de>
Subject: Re: [PATCH 2/2] blk-mq: simplify queue mapping & schedule with each
possisble CPU

Hi Ming,

On 01/12/2018 10:53 AM, Ming Lei wrote:
> From: Christoph Hellwig <hch@....de>
>
> The previous patch assigns interrupt vectors to all possible CPUs, so
> now hctx can be mapped to possible CPUs, this patch applies this fact
> to simplify queue mapping & schedule so that we don't need to handle
> CPU hotplug for dealing with physical CPU plug & unplug. With this
> simplication, we can work well on physical CPU plug & unplug, which
> is a normal use case for VM at least.
>
> Make sure we allocate blk_mq_ctx structures for all possible CPUs, and
> set hctx->numa_node for possible CPUs which are mapped to this hctx. And
> only choose the online CPUs for schedule.
>
> Reported-by: Christian Borntraeger <borntraeger@...ibm.com>
> Tested-by: Christian Borntraeger <borntraeger@...ibm.com>
> Tested-by: Stefan Haberland <sth@...ux.vnet.ibm.com>
> Cc: Thomas Gleixner <tglx@...utronix.de>
> Signed-off-by: Christoph Hellwig <hch@....de>
> (merged the three into one because any single one may not work, and fix
> selecting online CPUs for scheduler)
> Signed-off-by: Ming Lei <ming.lei@...hat.com>
> ---
> block/blk-mq.c | 19 ++++++++-----------
> 1 file changed, 8 insertions(+), 11 deletions(-)
>
> diff --git a/block/blk-mq.c b/block/blk-mq.c
> index 8000ba6db07d..ef9beca2d117 100644
> --- a/block/blk-mq.c
> +++ b/block/blk-mq.c
> @@ -440,7 +440,7 @@ struct request *blk_mq_alloc_request_hctx(struct request_queue *q,
> blk_queue_exit(q);
> return ERR_PTR(-EXDEV);
> }
> - cpu = cpumask_first(alloc_data.hctx->cpumask);
> + cpu = cpumask_first_and(alloc_data.hctx->cpumask, cpu_online_mask);
> alloc_data.ctx = __blk_mq_get_ctx(q, cpu);
>
> rq = blk_mq_get_request(q, NULL, op, &alloc_data);
> @@ -1323,9 +1323,10 @@ static int blk_mq_hctx_next_cpu(struct blk_mq_hw_ctx *hctx)
> if (--hctx->next_cpu_batch <= 0) {
> int next_cpu;
>
> - next_cpu = cpumask_next(hctx->next_cpu, hctx->cpumask);
> + next_cpu = cpumask_next_and(hctx->next_cpu, hctx->cpumask,
> + cpu_online_mask);
> if (next_cpu >= nr_cpu_ids)
> - next_cpu = cpumask_first(hctx->cpumask);
> + next_cpu = cpumask_first_and(hctx->cpumask,cpu_online_mask);

The next_cpu here can still end up >= nr_cpu_ids when none of the CPUs in
hctx->cpumask is online. This can be reproduced on NVMe with a patch that
holds some rqs on ctx->rq_list while a script onlines and offlines the
CPUs; a panic then occurs in __queue_work().

Maybe cpu_possible_mask could be used here instead: the workers in the
pool of an offlined CPU have been unbound, so it should be OK to queue
the work on them, e.g. something like the sketch below.
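
A minimal, untested sketch of what I have in mind (the final fallback to a
possible CPU is my own assumption of a fix, not part of this patch):

	next_cpu = cpumask_next_and(hctx->next_cpu, hctx->cpumask,
				    cpu_online_mask);
	if (next_cpu >= nr_cpu_ids)
		next_cpu = cpumask_first_and(hctx->cpumask, cpu_online_mask);
	/*
	 * No online CPU is left in hctx->cpumask: fall back to any
	 * possible CPU mapped to this hctx. The offlined CPU's workers
	 * have been unbound, so the queued work can still run.
	 */
	if (next_cpu >= nr_cpu_ids)
		next_cpu = cpumask_first(hctx->cpumask);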
Thanks
Jianchao
> hctx->next_cpu = next_cpu;
> hctx->next_cpu_batch = BLK_MQ_CPU_WORK_BATCH;
> @@ -2219,16 +2220,11 @@ static void blk_mq_init_cpu_queues(struct request_queue *q,
> INIT_LIST_HEAD(&__ctx->rq_list);
> __ctx->queue = q;
>
> - /* If the cpu isn't present, the cpu is mapped to first hctx */
> - if (!cpu_present(i))
> - continue;
> -
> - hctx = blk_mq_map_queue(q, i);
> -
> /*
> * Set local node, IFF we have more than one hw queue. If
> * not, we remain on the home node of the device
> */
> + hctx = blk_mq_map_queue(q, i);
> if (nr_hw_queues > 1 && hctx->numa_node == NUMA_NO_NODE)
> hctx->numa_node = local_memory_node(cpu_to_node(i));
> }
> @@ -2285,7 +2281,7 @@ static void blk_mq_map_swqueue(struct request_queue *q)
> *
> * If the cpu isn't present, the cpu is mapped to first hctx.
> */
> - for_each_present_cpu(i) {
> + for_each_possible_cpu(i) {
> hctx_idx = q->mq_map[i];
> /* unmapped hw queue can be remapped after CPU topo changed */
> if (!set->tags[hctx_idx] &&
> @@ -2339,7 +2335,8 @@ static void blk_mq_map_swqueue(struct request_queue *q)
> /*
> * Initialize batch roundrobin counts
> */
> - hctx->next_cpu = cpumask_first(hctx->cpumask);
> + hctx->next_cpu = cpumask_first_and(hctx->cpumask,
> + cpu_online_mask);
> hctx->next_cpu_batch = BLK_MQ_CPU_WORK_BATCH;
> }
> }
>