Message-ID: <7e142559-1c96-8d84-081a-378c1f6d1306@huawei.com>
Date: Wed, 13 Oct 2021 16:13:11 +0100
From: John Garry <john.garry@...wei.com>
To: Ming Lei <ming.lei@...hat.com>
CC: "axboe@...nel.dk" <axboe@...nel.dk>,
"linux-block@...r.kernel.org" <linux-block@...r.kernel.org>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
"kashyap.desai@...adcom.com" <kashyap.desai@...adcom.com>,
"hare@...e.de" <hare@...e.de>
Subject: Re: [PATCH] blk-mq: Fix blk_mq_tagset_busy_iter() for shared tags
On 13/10/2021 15:29, Ming Lei wrote:
>> As I understand, Kashyap mentioned no throughput regression with my series,
>> but just higher cpu usage in blk_mq_find_and_get_req().
>>
>> I'll see if I can see such a thing in my setup.
>>
>> But could it be that since we only have a single set of requests per
>> tagset, and not a set of requests per HW queue, there is more contention on
>> the common set of requests in the refcount_inc_not_zero() call ***, below:
>>
>> static struct request *blk_mq_find_and_get_req(struct blk_mq_tags *tags,
>> 					       unsigned int bitnr)
>> {
>> 	...
>>
>> 	rq = tags->rqs[bitnr];
>> 	if (... || !refcount_inc_not_zero(&rq->ref)) ***
>> 	...
>> }
> Kashyap's log shows that contention on tags->lock is increased, which
> should be caused by the nr_hw_queues iteration.
If the lock contention on tags->lock increases, then I am not totally
surprised. For shared sbitmap, each HW queue had its own tags (and tags
lock). Now with shared tags, we have a single lock over the whole tagset,
and so we would see more contention. That's on the basis that we have many
parallel callers of blk_mq_queue_tag_busy_iter().
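For reference, blk_mq_find_and_get_req() in mainline looks roughly like
this (a trimmed sketch from my reading of blk-mq-tag.c, so the exact
checks may differ). Note that tags->lock is taken for every set bit, so
with a single tagset-wide tags struct, every hw queue's walk hits the
same lock:

static struct request *blk_mq_find_and_get_req(struct blk_mq_tags *tags,
					       unsigned int bitnr)
{
	struct request *rq;
	unsigned long flags;

	/* serialise against the request pool being freed under us */
	spin_lock_irqsave(&tags->lock, flags);
	rq = tags->rqs[bitnr];
	if (!rq || rq->tag != bitnr || !refcount_inc_not_zero(&rq->ref))
		rq = NULL;
	spin_unlock_irqrestore(&tags->lock, flags);
	return rq;
}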
> blk_mq_find_and_get_req()
> will be run nr_hw_queues times compared with pre-shared-sbitmap, since it
> is done before checking rq->mq_hctx.
Isn't shared sbitmap older than blk_mq_find_and_get_req()?
Anyway, for the 5.14 shared sbitmap support, we iterate nr_hw_queues
times. And now, for shared tags, we still do that. I don't see what's
changed in that regard.
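To illustrate, the iterating loop has the same shape before and after
(a trimmed sketch of mainline blk_mq_queue_tag_busy_iter(), details
approximate):

void blk_mq_queue_tag_busy_iter(struct request_queue *q, busy_iter_fn *fn,
		void *priv)
{
	struct blk_mq_hw_ctx *hctx;
	int i;

	if (!percpu_ref_tryget(&q->q_usage_counter))
		return;

	queue_for_each_hw_ctx(q, hctx, i) {
		struct blk_mq_tags *tags = hctx->tags;

		if (!blk_mq_hw_queue_mapped(hctx))
			continue;

		/* with shared tags, each pass walks the same bitmap */
		if (tags->nr_reserved_tags)
			bt_for_each(hctx, &tags->breserved_tags, fn, priv, true);
		bt_for_each(hctx, &tags->bitmap_tags, fn, priv, false);
	}
	blk_queue_exit(q);
}

So each of the nr_hw_queues passes walks the full shared bitmap, same
as the shared sbitmap code did.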
>
>> But I wonder why this function is even called so often...
>>
>>>> There is also blk_mq_all_tag_iter():
>>>>
>>>> void blk_mq_all_tag_iter(struct blk_mq_tags *tags, busy_tag_iter_fn *fn,
>>>> 		void *priv)
>>>> {
>>>> 	__blk_mq_all_tag_iter(tags, fn, priv, BT_TAG_ITER_STATIC_RQS);
>>>> }
>>>>
>>>> But then the only user is blk_mq_hctx_has_requests():
>>>>
>>>> static bool blk_mq_hctx_has_requests(struct blk_mq_hw_ctx *hctx)
>>>> {
>>>> 	struct blk_mq_tags *tags = hctx->sched_tags ?
>>>> 			hctx->sched_tags : hctx->tags;
>>>> 	struct rq_iter_data data = {
>>>> 		.hctx = hctx,
>>>> 	};
>>>>
>>>> 	blk_mq_all_tag_iter(tags, blk_mq_has_request, &data);
>>>> 	return data.has_rq;
>>>> }
>>> This above one only iterates over the specified hctx/tags, it won't be
>>> affected.
>>>
>>>> But, again like bt_iter(), blk_mq_has_request() will check that the hctx matches:
>>> I don't see what matters wrt. checking hctx.
>> I'm just saying that something like the following would be broken for shared
>> tags:
>>
>> static bool blk_mq_has_request(struct request *rq, void *data, bool reserved)
>> {
>> 	struct rq_iter_data *iter_data = data;
>>
>> 	iter_data->has_rq = true;
>> 	return true;
>> }
>>
>> static bool blk_mq_hctx_has_requests(struct blk_mq_hw_ctx *hctx)
>> {
>> 	struct rq_iter_data data = {
>> 	};
>>
>> 	blk_mq_all_tag_iter(tags, blk_mq_has_request, &data);
>> 	return data.has_rq;
>> }
>>
>> As it ignores that we want to check for a specific hctx.
> No, that isn't what I meant; the change I suggested follows:
I didn't mean that this was your suggestion. I am just saying that we
need to be careful when iterating tags now that they are shared, as in
that example.
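For comparison, the real helper does keep the hctx check, which is
what makes it safe (again from my reading of mainline blk-mq.c,
trimmed):

static bool blk_mq_has_request(struct request *rq, void *data, bool reserved)
{
	struct rq_iter_data *iter_data = data;

	/* with shared tags, rq may belong to another hw queue */
	if (rq->mq_hctx != iter_data->hctx)
		return true;	/* not ours - keep scanning */
	iter_data->has_rq = true;
	return false;		/* found one - stop the iteration */
}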
>
>
> diff --git a/block/blk-mq-tag.c b/block/blk-mq-tag.c
> index 72a2724a4eee..2a2ad6dfcc33 100644
> --- a/block/blk-mq-tag.c
> +++ b/block/blk-mq-tag.c
> @@ -232,8 +232,9 @@ static bool bt_iter(struct sbitmap *bitmap, unsigned int bitnr, void *data)
> 	if (!rq)
> 		return true;
>
> -	if (rq->q == hctx->queue && rq->mq_hctx == hctx)
> -		ret = iter_data->fn(hctx, rq, iter_data->data, reserved);
> +	if (rq->q == hctx->queue && (rq->mq_hctx == hctx ||
> +			blk_mq_is_shared_tags(hctx->flags)))
> +		ret = iter_data->fn(rq->mq_hctx, rq, iter_data->data, reserved);
> 	blk_mq_put_rq_ref(rq);
> 	return ret;
> }
> @@ -460,6 +461,9 @@ void blk_mq_queue_tag_busy_iter(struct request_queue *q, busy_iter_fn *fn,
> 		if (tags->nr_reserved_tags)
> 			bt_for_each(hctx, &tags->breserved_tags, fn, priv, true);
> 		bt_for_each(hctx, &tags->bitmap_tags, fn, priv, false);
> +
> +		if (blk_mq_is_shared_tags(hctx->flags))
> +			break;
> 	}
> 	blk_queue_exit(q);
> }
>
I suppose that is ok, and it means that we iterate only once.
However, I have to ask: where is the heavy use of
blk_mq_queue_tag_busy_iter() coming from? I saw this in Kashyap's mail:
> 1.31%  1.31%  kworker/57:1H-k  [kernel.vmlinux]
>        native_queued_spin_lock_slowpath
>        ret_from_fork
>        kthread
>        worker_thread
>        process_one_work
>        blk_mq_timeout_work
>        blk_mq_queue_tag_busy_iter
>        bt_iter
>        blk_mq_find_and_get_req
>        _raw_spin_lock_irqsave
>        native_queued_spin_lock_slowpath
How or why is blk_mq_timeout_work() the big caller here?
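The call site itself is plain enough - the timeout handler walks every
busy tag (a trimmed sketch from mainline blk-mq.c):

static void blk_mq_timeout_work(struct work_struct *work)
{
	struct request_queue *q =
		container_of(work, struct request_queue, timeout_work);
	unsigned long next = 0;

	if (!percpu_ref_tryget(&q->q_usage_counter))
		return;

	/* ends up in bt_iter() -> blk_mq_find_and_get_req() per set bit */
	blk_mq_queue_tag_busy_iter(q, blk_mq_check_expired, &next);
	...
}

But I don't see why it should run often enough to contend like this.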
Thanks,
John