linux-kernel - Re: [PATCH 4/6] elevator: factor elevator lock out of dispatch

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <cc6f72cb-3782-4426-57c2-4d54fc4f38f2@huaweicloud.com>
Date: Wed, 23 Jul 2025 10:17:16 +0800
From: Yu Kuai <yukuai1@...weicloud.com>
To: Damien Le Moal <dlemoal@...nel.org>, Yu Kuai <yukuai1@...weicloud.com>,
 hare@...e.de, tj@...nel.org, josef@...icpanda.com, axboe@...nel.dk
Cc: cgroups@...r.kernel.org, linux-block@...r.kernel.org,
 linux-kernel@...r.kernel.org, yi.zhang@...wei.com, yangerkun@...wei.com,
 johnny.chenyi@...wei.com, "yukuai (C)" <yukuai3@...wei.com>
Subject: Re: [PATCH 4/6] elevator: factor elevator lock out of
 dispatch_request method

Hi,

在 2025/07/23 9:59, Damien Le Moal 写道:
> On 7/22/25 4:24 PM, Yu Kuai wrote:
>> From: Yu Kuai <yukuai3@...wei.com>
>>
>> Currently, both mq-deadline and bfq have global spin lock that will be
>> grabbed inside elevator methods like dispatch_request, insert_requests,
>> and bio_merge. And the global lock is the main reason mq-deadline and
>> bfq can't scale very well.
>>
>> For dispatch_request method, current behavior is dispatching one request at
> 
> s/current/the current
> 
>> a time. In the case of multiple dispatching contexts, this behavior will
>> cause huge lock contention and messing up the requests dispatching
> 
> s/messing up/change
> 
>> order. And folloiwng patches will support requests batch dispatching to
> 
> s/folloiwng/following
> 
>> fix thoses problems.
>>
>> While dispatching request, blk_mq_get_disatpch_budget() and
>> blk_mq_get_driver_tag() must be called, and they are not ready to be
>> called inside elevator methods, hence introduce a new method like
>> dispatch_requests is not possible.
>>
>> In conclusion, this patch factor the global lock out of dispatch_request
>> method, and following patches will support request batch dispatch by
>> calling the methods multiple time while holding the lock.
> 
> You are creating a bisect problem here. This patch breaks the schedulers
> dispatch atomicity without the changes to the calls to the elevator methods in
> the block layer.

I'm not sure why there will be bisect problem, I think git checkout to
any patch in this set should work just fine. Can you please explain a
bit more?
> 
> So maybe reorganize these patches to have the block layer changes first, and
> move patch 1 and 3 after these to switch mq-deadline and bfq to using the
> higher level lock correctly, removing the locking from bfq_dispatch_request()
> and dd_dispatch_request().

Sure, I can to the reorganize.

Thanks,
Kuai

> 
>>
>> Signed-off-by: Yu Kuai <yukuai3@...wei.com>
>> ---
>>   block/bfq-iosched.c  | 3 ---
>>   block/blk-mq-sched.c | 6 ++++++
>>   block/mq-deadline.c  | 5 +----
>>   3 files changed, 7 insertions(+), 7 deletions(-)
>>
>> diff --git a/block/bfq-iosched.c b/block/bfq-iosched.c
>> index 11b81b11242c..9f8a256e43f2 100644
>> --- a/block/bfq-iosched.c
>> +++ b/block/bfq-iosched.c
>> @@ -5307,8 +5307,6 @@ static struct request *bfq_dispatch_request(struct blk_mq_hw_ctx *hctx)
>>   	struct bfq_queue *in_serv_queue;
>>   	bool waiting_rq, idle_timer_disabled = false;
>>   
>> -	spin_lock_irq(bfqd->lock);
>> -
>>   	in_serv_queue = bfqd->in_service_queue;
>>   	waiting_rq = in_serv_queue && bfq_bfqq_wait_request(in_serv_queue);
>>   
>> @@ -5318,7 +5316,6 @@ static struct request *bfq_dispatch_request(struct blk_mq_hw_ctx *hctx)
>>   			waiting_rq && !bfq_bfqq_wait_request(in_serv_queue);
>>   	}
>>   
>> -	spin_unlock_irq(bfqd->lock);
>>   	bfq_update_dispatch_stats(hctx->queue, rq,
>>   			idle_timer_disabled ? in_serv_queue : NULL,
>>   				idle_timer_disabled);
>> diff --git a/block/blk-mq-sched.c b/block/blk-mq-sched.c
>> index 55a0fd105147..82c4f4eef9ed 100644
>> --- a/block/blk-mq-sched.c
>> +++ b/block/blk-mq-sched.c
>> @@ -98,6 +98,7 @@ static int __blk_mq_do_dispatch_sched(struct blk_mq_hw_ctx *hctx)
>>   		max_dispatch = hctx->queue->nr_requests;
>>   
>>   	do {
>> +		bool sq_sched = blk_queue_sq_sched(q);
>>   		struct request *rq;
>>   		int budget_token;
>>   
>> @@ -113,7 +114,12 @@ static int __blk_mq_do_dispatch_sched(struct blk_mq_hw_ctx *hctx)
>>   		if (budget_token < 0)
>>   			break;
>>   
>> +		if (sq_sched)
>> +			spin_lock_irq(&e->lock);
>>   		rq = e->type->ops.dispatch_request(hctx);
>> +		if (sq_sched)
>> +			spin_unlock_irq(&e->lock);
>> +
>>   		if (!rq) {
>>   			blk_mq_put_dispatch_budget(q, budget_token);
>>   			/*
>> diff --git a/block/mq-deadline.c b/block/mq-deadline.c
>> index e31da6de7764..a008e41bc861 100644
>> --- a/block/mq-deadline.c
>> +++ b/block/mq-deadline.c
>> @@ -466,10 +466,9 @@ static struct request *dd_dispatch_request(struct blk_mq_hw_ctx *hctx)
>>   	struct request *rq;
>>   	enum dd_prio prio;
>>   
>> -	spin_lock(dd->lock);
>>   	rq = dd_dispatch_prio_aged_requests(dd, now);
>>   	if (rq)
>> -		goto unlock;
>> +		return rq;
>>   
>>   	/*
>>   	 * Next, dispatch requests in priority order. Ignore lower priority
>> @@ -481,8 +480,6 @@ static struct request *dd_dispatch_request(struct blk_mq_hw_ctx *hctx)
>>   			break;
>>   	}
>>   
>> -unlock:
>> -	spin_unlock(dd->lock);
>>   	return rq;
>>   }
>>   
> 
>