[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <d32d44ef-a964-430a-a735-f63f2fa5e7ed@kernel.org>
Date: Wed, 23 Jul 2025 11:42:17 +0900
From: Damien Le Moal <dlemoal@...nel.org>
To: Yu Kuai <yukuai1@...weicloud.com>, hare@...e.de, tj@...nel.org,
josef@...icpanda.com, axboe@...nel.dk
Cc: cgroups@...r.kernel.org, linux-block@...r.kernel.org,
linux-kernel@...r.kernel.org, yi.zhang@...wei.com, yangerkun@...wei.com,
johnny.chenyi@...wei.com, "yukuai (C)" <yukuai3@...wei.com>
Subject: Re: [PATCH 4/6] elevator: factor elevator lock out of
dispatch_request method
On 7/23/25 11:17 AM, Yu Kuai wrote:
> Hi,
>
> 在 2025/07/23 9:59, Damien Le Moal 写道:
>> On 7/22/25 4:24 PM, Yu Kuai wrote:
>>> From: Yu Kuai <yukuai3@...wei.com>
>>>
>>> Currently, both mq-deadline and bfq have global spin lock that will be
>>> grabbed inside elevator methods like dispatch_request, insert_requests,
>>> and bio_merge. And the global lock is the main reason mq-deadline and
>>> bfq can't scale very well.
>>>
>>> For dispatch_request method, current behavior is dispatching one request at
>>
>> s/current/the current
>>
>>> a time. In the case of multiple dispatching contexts, this behavior will
>>> cause huge lock contention and messing up the requests dispatching
>>
>> s/messing up/change
>>
>>> order. And folloiwng patches will support requests batch dispatching to
>>
>> s/folloiwng/following
>>
>>> fix thoses problems.
>>>
>>> While dispatching request, blk_mq_get_disatpch_budget() and
>>> blk_mq_get_driver_tag() must be called, and they are not ready to be
>>> called inside elevator methods, hence introduce a new method like
>>> dispatch_requests is not possible.
>>>
>>> In conclusion, this patch factor the global lock out of dispatch_request
>>> method, and following patches will support request batch dispatch by
>>> calling the methods multiple time while holding the lock.
>>
>> You are creating a bisect problem here. This patch breaks the schedulers
>> dispatch atomicity without the changes to the calls to the elevator methods in
>> the block layer.
>
> I'm not sure why there will be bisect problem, I think git checkout to
> any patch in this set should work just fine. Can you please explain a
> bit more?
If you apply this patch, stop here without applying the following patches, and
test the changes up to this point, things will break since there is no locking
during dispatch.
So you need to organize the patches so that you first have the elevator level
common locking in place and then have one patch for bfq and one patch for
mq-deadline that switch to using that new lock. Hence the suggestion to reverse
the order of your changes: change the block layer first, then have bfq and
mq-deadline use that new locking.
>>
>> So maybe reorganize these patches to have the block layer changes first, and
>> move patch 1 and 3 after these to switch mq-deadline and bfq to using the
>> higher level lock correctly, removing the locking from bfq_dispatch_request()
>> and dd_dispatch_request().
>
> Sure, I can to the reorganize.
>
> Thanks,
> Kuai
>
>>
>>>
>>> Signed-off-by: Yu Kuai <yukuai3@...wei.com>
>>> ---
>>> block/bfq-iosched.c | 3 ---
>>> block/blk-mq-sched.c | 6 ++++++
>>> block/mq-deadline.c | 5 +----
>>> 3 files changed, 7 insertions(+), 7 deletions(-)
>>>
>>> diff --git a/block/bfq-iosched.c b/block/bfq-iosched.c
>>> index 11b81b11242c..9f8a256e43f2 100644
>>> --- a/block/bfq-iosched.c
>>> +++ b/block/bfq-iosched.c
>>> @@ -5307,8 +5307,6 @@ static struct request *bfq_dispatch_request(struct
>>> blk_mq_hw_ctx *hctx)
>>> struct bfq_queue *in_serv_queue;
>>> bool waiting_rq, idle_timer_disabled = false;
>>> - spin_lock_irq(bfqd->lock);
>>> -
>>> in_serv_queue = bfqd->in_service_queue;
>>> waiting_rq = in_serv_queue && bfq_bfqq_wait_request(in_serv_queue);
>>> @@ -5318,7 +5316,6 @@ static struct request *bfq_dispatch_request(struct
>>> blk_mq_hw_ctx *hctx)
>>> waiting_rq && !bfq_bfqq_wait_request(in_serv_queue);
>>> }
>>> - spin_unlock_irq(bfqd->lock);
>>> bfq_update_dispatch_stats(hctx->queue, rq,
>>> idle_timer_disabled ? in_serv_queue : NULL,
>>> idle_timer_disabled);
>>> diff --git a/block/blk-mq-sched.c b/block/blk-mq-sched.c
>>> index 55a0fd105147..82c4f4eef9ed 100644
>>> --- a/block/blk-mq-sched.c
>>> +++ b/block/blk-mq-sched.c
>>> @@ -98,6 +98,7 @@ static int __blk_mq_do_dispatch_sched(struct blk_mq_hw_ctx
>>> *hctx)
>>> max_dispatch = hctx->queue->nr_requests;
>>> do {
>>> + bool sq_sched = blk_queue_sq_sched(q);
>>> struct request *rq;
>>> int budget_token;
>>> @@ -113,7 +114,12 @@ static int __blk_mq_do_dispatch_sched(struct
>>> blk_mq_hw_ctx *hctx)
>>> if (budget_token < 0)
>>> break;
>>> + if (sq_sched)
>>> + spin_lock_irq(&e->lock);
>>> rq = e->type->ops.dispatch_request(hctx);
>>> + if (sq_sched)
>>> + spin_unlock_irq(&e->lock);
>>> +
>>> if (!rq) {
>>> blk_mq_put_dispatch_budget(q, budget_token);
>>> /*
>>> diff --git a/block/mq-deadline.c b/block/mq-deadline.c
>>> index e31da6de7764..a008e41bc861 100644
>>> --- a/block/mq-deadline.c
>>> +++ b/block/mq-deadline.c
>>> @@ -466,10 +466,9 @@ static struct request *dd_dispatch_request(struct
>>> blk_mq_hw_ctx *hctx)
>>> struct request *rq;
>>> enum dd_prio prio;
>>> - spin_lock(dd->lock);
>>> rq = dd_dispatch_prio_aged_requests(dd, now);
>>> if (rq)
>>> - goto unlock;
>>> + return rq;
>>> /*
>>> * Next, dispatch requests in priority order. Ignore lower priority
>>> @@ -481,8 +480,6 @@ static struct request *dd_dispatch_request(struct
>>> blk_mq_hw_ctx *hctx)
>>> break;
>>> }
>>> -unlock:
>>> - spin_unlock(dd->lock);
>>> return rq;
>>> }
>>>
>>
>>
>
--
Damien Le Moal
Western Digital Research
Powered by blists - more mailing lists