linux-kernel - Re: [PATCH 4/6] elevator: factor elevator lock out of dispatch


Message-ID: <08c989bd-20d8-476c-af99-c9eb8065349d@kernel.org>
Date: Wed, 23 Jul 2025 10:59:18 +0900
From: Damien Le Moal <dlemoal@...nel.org>
To: Yu Kuai <yukuai1@...weicloud.com>, hare@...e.de, tj@...nel.org,
 josef@...icpanda.com, axboe@...nel.dk, yukuai3@...wei.com
Cc: cgroups@...r.kernel.org, linux-block@...r.kernel.org,
 linux-kernel@...r.kernel.org, yi.zhang@...wei.com, yangerkun@...wei.com,
 johnny.chenyi@...wei.com
Subject: Re: [PATCH 4/6] elevator: factor elevator lock out of
 dispatch_request method

On 7/22/25 4:24 PM, Yu Kuai wrote:
> From: Yu Kuai <yukuai3@...wei.com>
> 
> Currently, both mq-deadline and bfq have global spin lock that will be
> grabbed inside elevator methods like dispatch_request, insert_requests,
> and bio_merge. And the global lock is the main reason mq-deadline and
> bfq can't scale very well.
> 
> For dispatch_request method, current behavior is dispatching one request at

s/current/the current

> a time. In the case of multiple dispatching contexts, this behavior will
> cause huge lock contention and messing up the requests dispatching

s/messing up/change

> order. And folloiwng patches will support requests batch dispatching to

s/folloiwng/following

> fix thoses problems.
> 
> While dispatching request, blk_mq_get_dispatch_budget() and
> blk_mq_get_driver_tag() must be called, and they are not ready to be
> called inside elevator methods, hence introduce a new method like
> dispatch_requests is not possible.
> 
> In conclusion, this patch factor the global lock out of dispatch_request
> method, and following patches will support request batch dispatch by
> calling the methods multiple time while holding the lock.

You are creating a bisect problem here. Applied without the block layer changes
to the calls to the elevator methods, this patch breaks the schedulers'
dispatch atomicity.

So maybe reorganize these patches to have the block layer changes first, and
move patches 1 and 3 after these to switch mq-deadline and bfq to using the
higher-level lock correctly, removing the locking from bfq_dispatch_request()
and dd_dispatch_request().
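
To make the intended split concrete, here is a minimal user-space sketch of
the end state, not kernel code. All names (toy_elevator, toy_dispatch_request,
toy_do_dispatch) are hypothetical, and a C11 atomic_flag stands in for the
elevator spinlock. The method assumes the lock is held, and the caller takes
it, so the two halves need to land in an order that never leaves the method
running unlocked at any point in the series.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical user-space stand-in for an elevator queue; not kernel code. */
struct toy_elevator {
	atomic_flag lock;	/* plays the role of the higher-level elevator lock */
	bool sq_sched;		/* true for single-queue schedulers (mq-deadline, bfq) */
	int next;		/* id of the next request to hand out */
	int nr_queued;		/* requests still queued */
};

static void toy_elevator_init(struct toy_elevator *e, int nr_queued)
{
	atomic_flag_clear(&e->lock);
	e->sq_sched = true;
	e->next = 0;
	e->nr_queued = nr_queued;
}

/* Simple test-and-set spinlock, assumed here only for illustration. */
static void toy_lock(struct toy_elevator *e)
{
	while (atomic_flag_test_and_set_explicit(&e->lock, memory_order_acquire))
		;
}

static void toy_unlock(struct toy_elevator *e)
{
	atomic_flag_clear_explicit(&e->lock, memory_order_release);
}

/*
 * Scheduler-side method: takes no lock itself and relies on the caller
 * holding it. Returns a request id, or -1 if nothing is queued.
 */
static int toy_dispatch_request(struct toy_elevator *e)
{
	if (e->nr_queued == 0)
		return -1;
	e->nr_queued--;
	return e->next++;
}

/* Block-layer-side caller: wraps the method call with the lock. */
static int toy_do_dispatch(struct toy_elevator *e)
{
	int rq;

	if (e->sq_sched)
		toy_lock(e);
	rq = toy_dispatch_request(e);
	if (e->sq_sched)
		toy_unlock(e);
	return rq;
}
```

In this model, dropping the lock from toy_dispatch_request() before
toy_do_dispatch() takes it would leave a window where the method runs with no
lock at all, which is the kind of intermediate state a bisect could land on.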

> 
> Signed-off-by: Yu Kuai <yukuai3@...wei.com>
> ---
>  block/bfq-iosched.c  | 3 ---
>  block/blk-mq-sched.c | 6 ++++++
>  block/mq-deadline.c  | 5 +----
>  3 files changed, 7 insertions(+), 7 deletions(-)
> 
> diff --git a/block/bfq-iosched.c b/block/bfq-iosched.c
> index 11b81b11242c..9f8a256e43f2 100644
> --- a/block/bfq-iosched.c
> +++ b/block/bfq-iosched.c
> @@ -5307,8 +5307,6 @@ static struct request *bfq_dispatch_request(struct blk_mq_hw_ctx *hctx)
>  	struct bfq_queue *in_serv_queue;
>  	bool waiting_rq, idle_timer_disabled = false;
>  
> -	spin_lock_irq(bfqd->lock);
> -
>  	in_serv_queue = bfqd->in_service_queue;
>  	waiting_rq = in_serv_queue && bfq_bfqq_wait_request(in_serv_queue);
>  
> @@ -5318,7 +5316,6 @@ static struct request *bfq_dispatch_request(struct blk_mq_hw_ctx *hctx)
>  			waiting_rq && !bfq_bfqq_wait_request(in_serv_queue);
>  	}
>  
> -	spin_unlock_irq(bfqd->lock);
>  	bfq_update_dispatch_stats(hctx->queue, rq,
>  			idle_timer_disabled ? in_serv_queue : NULL,
>  				idle_timer_disabled);
> diff --git a/block/blk-mq-sched.c b/block/blk-mq-sched.c
> index 55a0fd105147..82c4f4eef9ed 100644
> --- a/block/blk-mq-sched.c
> +++ b/block/blk-mq-sched.c
> @@ -98,6 +98,7 @@ static int __blk_mq_do_dispatch_sched(struct blk_mq_hw_ctx *hctx)
>  		max_dispatch = hctx->queue->nr_requests;
>  
>  	do {
> +		bool sq_sched = blk_queue_sq_sched(q);
>  		struct request *rq;
>  		int budget_token;
>  
> @@ -113,7 +114,12 @@ static int __blk_mq_do_dispatch_sched(struct blk_mq_hw_ctx *hctx)
>  		if (budget_token < 0)
>  			break;
>  
> +		if (sq_sched)
> +			spin_lock_irq(&e->lock);
>  		rq = e->type->ops.dispatch_request(hctx);
> +		if (sq_sched)
> +			spin_unlock_irq(&e->lock);
> +
>  		if (!rq) {
>  			blk_mq_put_dispatch_budget(q, budget_token);
>  			/*
> diff --git a/block/mq-deadline.c b/block/mq-deadline.c
> index e31da6de7764..a008e41bc861 100644
> --- a/block/mq-deadline.c
> +++ b/block/mq-deadline.c
> @@ -466,10 +466,9 @@ static struct request *dd_dispatch_request(struct blk_mq_hw_ctx *hctx)
>  	struct request *rq;
>  	enum dd_prio prio;
>  
> -	spin_lock(dd->lock);
>  	rq = dd_dispatch_prio_aged_requests(dd, now);
>  	if (rq)
> -		goto unlock;
> +		return rq;
>  
>  	/*
>  	 * Next, dispatch requests in priority order. Ignore lower priority
> @@ -481,8 +480,6 @@ static struct request *dd_dispatch_request(struct blk_mq_hw_ctx *hctx)
>  			break;
>  	}
>  
> -unlock:
> -	spin_unlock(dd->lock);
>  	return rq;
>  }
>  


-- 
Damien Le Moal
Western Digital Research