linux-kernel - Re: [PATCH v3 1/5] blk-mq-sched: introduce high level elevator lock

lists.openwall.net: Open Source and information security mailing list archives

Message-ID: <70789114-81ad-1226-c99c-b35e152b7769@huaweicloud.com>
Date: Mon, 11 Aug 2025 09:01:37 +0800
From: Yu Kuai <yukuai1@...weicloud.com>
To: Damien Le Moal <dlemoal@...nel.org>, Yu Kuai <yukuai1@...weicloud.com>,
 hare@...e.de, jack@...e.cz, bvanassche@....org, tj@...nel.org,
 josef@...icpanda.com, axboe@...nel.dk
Cc: cgroups@...r.kernel.org, linux-block@...r.kernel.org,
 linux-kernel@...r.kernel.org, yi.zhang@...wei.com, yangerkun@...wei.com,
 johnny.chenyi@...wei.com, "yukuai (C)" <yukuai3@...wei.com>
Subject: Re: [PATCH v3 1/5] blk-mq-sched: introduce high level elevator lock

Hi,

On 2025/08/11 8:44, Damien Le Moal wrote:
> On 8/6/25 17:57, Yu Kuai wrote:
>> From: Yu Kuai <yukuai3@...wei.com>
>>
>> Currently, both mq-deadline and bfq have a global spin lock that is
>> grabbed inside elevator methods like dispatch_request, insert_requests,
>> and bio_merge. This global lock is the main reason mq-deadline and
>> bfq can't scale very well.
>>
>> While dispatching a request, blk_mq_get_dispatch_budget() and
>> blk_mq_get_driver_tag() must be called, and they are not ready to be called
>> inside elevator methods; hence introducing a new method like
>> dispatch_requests is not possible.
>>
>> Hence introduce a new high-level elevator lock; currently it protects
>> dispatch_request only. Following patches will convert mq-deadline and bfq
>> to use this lock and finally support request batch dispatching by calling
>> the method multiple times while holding the lock.
>>
>> Signed-off-by: Yu Kuai <yukuai3@...wei.com>
>> ---
>>   block/blk-mq-sched.c |  9 ++++++++-
>>   block/elevator.c     |  1 +
>>   block/elevator.h     | 14 ++++++++++++--
>>   3 files changed, 21 insertions(+), 3 deletions(-)
>>
>> diff --git a/block/blk-mq-sched.c b/block/blk-mq-sched.c
>> index 55a0fd105147..1a2da5edbe13 100644
>> --- a/block/blk-mq-sched.c
>> +++ b/block/blk-mq-sched.c
>> @@ -113,7 +113,14 @@ static int __blk_mq_do_dispatch_sched(struct blk_mq_hw_ctx *hctx)
>>   		if (budget_token < 0)
>>   			break;
>>   
>> -		rq = e->type->ops.dispatch_request(hctx);
>> +		if (blk_queue_sq_sched(q)) {
>> +			elevator_lock(e);
>> +			rq = e->type->ops.dispatch_request(hctx);
>> +			elevator_unlock(e);
> 
> I do not think this is safe for bfq since bfq uses the irqsave/irqrestore spin
> lock variant. If it is safe, this needs a big comment block explaining why
> and/or the rules regarding the scheduler use of this lock.

It's correct. However, this patch doesn't change bfq yet, and the sequence currently looks like:

elevator_lock
spin_lock_irq(&bfqd->lock)
spin_unlock_irq(&bfqd->lock)
elevator_unlock

Patch 3 removes bfqd->lock and converts this to:

elevator_lock_irq
elevator_unlock_irq

Thanks,
Kuai

> 
>> +		} else {
>> +			rq = e->type->ops.dispatch_request(hctx);
>> +		}
>> +
>>   		if (!rq) {
>>   			blk_mq_put_dispatch_budget(q, budget_token);
>>   			/*
>> diff --git a/block/elevator.c b/block/elevator.c
>> index 88f8f36bed98..45303af0ca73 100644
>> --- a/block/elevator.c
>> +++ b/block/elevator.c
>> @@ -144,6 +144,7 @@ struct elevator_queue *elevator_alloc(struct request_queue *q,
>>   	eq->type = e;
>>   	kobject_init(&eq->kobj, &elv_ktype);
>>   	mutex_init(&eq->sysfs_lock);
>> +	spin_lock_init(&eq->lock);
>>   	hash_init(eq->hash);
>>   
>>   	return eq;
>> diff --git a/block/elevator.h b/block/elevator.h
>> index a07ce773a38f..81f7700b0339 100644
>> --- a/block/elevator.h
>> +++ b/block/elevator.h
>> @@ -110,12 +110,12 @@ struct request *elv_rqhash_find(struct request_queue *q, sector_t offset);
>>   /*
>>    * each queue has an elevator_queue associated with it
>>    */
>> -struct elevator_queue
>> -{
>> +struct elevator_queue {
>>   	struct elevator_type *type;
>>   	void *elevator_data;
>>   	struct kobject kobj;
>>   	struct mutex sysfs_lock;
>> +	spinlock_t lock;
>>   	unsigned long flags;
>>   	DECLARE_HASHTABLE(hash, ELV_HASH_BITS);
>>   };
>> @@ -124,6 +124,16 @@ struct elevator_queue
>>   #define ELEVATOR_FLAG_DYING		1
>>   #define ELEVATOR_FLAG_ENABLE_WBT_ON_EXIT	2
>>   
>> +#define elevator_lock(e)		spin_lock(&(e)->lock)
>> +#define elevator_unlock(e)		spin_unlock(&(e)->lock)
>> +#define elevator_lock_irq(e)		spin_lock_irq(&(e)->lock)
>> +#define elevator_unlock_irq(e)		spin_unlock_irq(&(e)->lock)
>> +#define elevator_lock_irqsave(e, flags) \
>> +	spin_lock_irqsave(&(e)->lock, flags)
>> +#define elevator_unlock_irqrestore(e, flags) \
>> +	spin_unlock_irqrestore(&(e)->lock, flags)
>> +#define elevator_lock_assert_held(e)	lockdep_assert_held(&(e)->lock)
>> +
>>   /*
>>    * block elevator interface
>>    */
> 
>