Message-ID: <2b48b0eb-7294-c4e1-8b84-ce2e860f3a75@huaweicloud.com>
Date: Wed, 23 Jul 2025 10:51:39 +0800
From: Yu Kuai <yukuai1@...weicloud.com>
To: Damien Le Moal <dlemoal@...nel.org>, Yu Kuai <yukuai1@...weicloud.com>,
 hare@...e.de, tj@...nel.org, josef@...icpanda.com, axboe@...nel.dk
Cc: cgroups@...r.kernel.org, linux-block@...r.kernel.org,
 linux-kernel@...r.kernel.org, yi.zhang@...wei.com, yangerkun@...wei.com,
 johnny.chenyi@...wei.com, "yukuai (C)" <yukuai3@...wei.com>
Subject: Re: [PATCH 4/6] elevator: factor elevator lock out of
 dispatch_request method

Hi,

On 2025/07/23 10:42, Damien Le Moal wrote:
> On 7/23/25 11:17 AM, Yu Kuai wrote:
>> Hi,
>>
>> On 2025/07/23 9:59, Damien Le Moal wrote:
>>> On 7/22/25 4:24 PM, Yu Kuai wrote:
>>>> From: Yu Kuai <yukuai3@...wei.com>
>>>>
>>>> Currently, both mq-deadline and bfq have global spin lock that will be
>>>> grabbed inside elevator methods like dispatch_request, insert_requests,
>>>> and bio_merge. And the global lock is the main reason mq-deadline and
>>>> bfq can't scale very well.
>>>>
>>>> For dispatch_request method, current behavior is dispatching one request at
>>>
>>> s/current/the current
>>>
>>>> a time. In the case of multiple dispatching contexts, this behavior will
>>>> cause huge lock contention and messing up the requests dispatching
>>>
>>> s/messing up/change
>>>
>>>> order. And folloiwng patches will support requests batch dispatching to
>>>
>>> s/folloiwng/following
>>>
>>>> fix thoses problems.
>>>>
>>>> While dispatching request, blk_mq_get_disatpch_budget() and
>>>> blk_mq_get_driver_tag() must be called, and they are not ready to be
>>>> called inside elevator methods, hence introduce a new method like
>>>> dispatch_requests is not possible.
>>>>
>>>> In conclusion, this patch factor the global lock out of dispatch_request
>>>> method, and following patches will support request batch dispatch by
>>>> calling the methods multiple time while holding the lock.
>>>
>>> You are creating a bisect problem here. This patch breaks the schedulers
>>> dispatch atomicity without the changes to the calls to the elevator methods in
>>> the block layer.
>>
>> I'm not sure why there would be a bisect problem; I think git checkout
>> at any patch in this set should work just fine. Can you please explain
>> a bit more?
> 
> If you apply this patch, stop here without applying the following patches, and
> test the changes up to this point, things will break since there is no locking
> during dispatch.

Did you miss the following change in this patch? Dispatch does switch to
the new lock, so I don't see why there would be no locking.

@@ -113,7 +114,12 @@ static int __blk_mq_do_dispatch_sched(struct 
blk_mq_hw_ctx *hctx)
  		if (budget_token < 0)
  			break;

+		if (sq_sched)
+			spin_lock_irq(&e->lock);
  		rq = e->type->ops.dispatch_request(hctx);
+		if (sq_sched)
+			spin_unlock_irq(&e->lock);
+
  		if (!rq) {
  			blk_mq_put_dispatch_budget(q, budget_token);
  			/*
> 
> So you need to organize the patches so that you first have the elevator level
> common locking in place and then have one patch for bfq and one patch for
> mq-deadline that switch to using that new lock. Hence the suggestion to reverse
> the order of your changes: change the block layer first, then have bfq and
> mq-deadline use that new locking.

I think I understand what you mean. Just to be sure, the reordered
series would be:

1. patch 5 in this set
2. patch to introduce the high-level lock, and grab it during dispatch
in the block layer
3. changes in ioc
4. changes in bfq
5. changes in mq-deadline
6. patch 6 in this set

Thanks,
Kuai

> 
>>>
>>> So maybe reorganize these patches to have the block layer changes first, and
>>> move patch 1 and 3 after these to switch mq-deadline and bfq to using the
>>> higher level lock correctly, removing the locking from bfq_dispatch_request()
>>> and dd_dispatch_request().
>>
>> Sure, I can do the reorganization.
>>
>> Thanks,
>> Kuai
>>
>>>
>>>>
>>>> Signed-off-by: Yu Kuai <yukuai3@...wei.com>
>>>> ---
>>>>    block/bfq-iosched.c  | 3 ---
>>>>    block/blk-mq-sched.c | 6 ++++++
>>>>    block/mq-deadline.c  | 5 +----
>>>>    3 files changed, 7 insertions(+), 7 deletions(-)
>>>>
>>>> diff --git a/block/bfq-iosched.c b/block/bfq-iosched.c
>>>> index 11b81b11242c..9f8a256e43f2 100644
>>>> --- a/block/bfq-iosched.c
>>>> +++ b/block/bfq-iosched.c
>>>> @@ -5307,8 +5307,6 @@ static struct request *bfq_dispatch_request(struct
>>>> blk_mq_hw_ctx *hctx)
>>>>        struct bfq_queue *in_serv_queue;
>>>>        bool waiting_rq, idle_timer_disabled = false;
>>>>    -    spin_lock_irq(bfqd->lock);
>>>> -
>>>>        in_serv_queue = bfqd->in_service_queue;
>>>>        waiting_rq = in_serv_queue && bfq_bfqq_wait_request(in_serv_queue);
>>>>    @@ -5318,7 +5316,6 @@ static struct request *bfq_dispatch_request(struct
>>>> blk_mq_hw_ctx *hctx)
>>>>                waiting_rq && !bfq_bfqq_wait_request(in_serv_queue);
>>>>        }
>>>>    -    spin_unlock_irq(bfqd->lock);
>>>>        bfq_update_dispatch_stats(hctx->queue, rq,
>>>>                idle_timer_disabled ? in_serv_queue : NULL,
>>>>                    idle_timer_disabled);
>>>> diff --git a/block/blk-mq-sched.c b/block/blk-mq-sched.c
>>>> index 55a0fd105147..82c4f4eef9ed 100644
>>>> --- a/block/blk-mq-sched.c
>>>> +++ b/block/blk-mq-sched.c
>>>> @@ -98,6 +98,7 @@ static int __blk_mq_do_dispatch_sched(struct blk_mq_hw_ctx
>>>> *hctx)
>>>>            max_dispatch = hctx->queue->nr_requests;
>>>>          do {
>>>> +        bool sq_sched = blk_queue_sq_sched(q);
>>>>            struct request *rq;
>>>>            int budget_token;
>>>>    @@ -113,7 +114,12 @@ static int __blk_mq_do_dispatch_sched(struct
>>>> blk_mq_hw_ctx *hctx)
>>>>            if (budget_token < 0)
>>>>                break;
>>>>    +        if (sq_sched)
>>>> +            spin_lock_irq(&e->lock);
>>>>            rq = e->type->ops.dispatch_request(hctx);
>>>> +        if (sq_sched)
>>>> +            spin_unlock_irq(&e->lock);
>>>> +
>>>>            if (!rq) {
>>>>                blk_mq_put_dispatch_budget(q, budget_token);
>>>>                /*
>>>> diff --git a/block/mq-deadline.c b/block/mq-deadline.c
>>>> index e31da6de7764..a008e41bc861 100644
>>>> --- a/block/mq-deadline.c
>>>> +++ b/block/mq-deadline.c
>>>> @@ -466,10 +466,9 @@ static struct request *dd_dispatch_request(struct
>>>> blk_mq_hw_ctx *hctx)
>>>>        struct request *rq;
>>>>        enum dd_prio prio;
>>>>    -    spin_lock(dd->lock);
>>>>        rq = dd_dispatch_prio_aged_requests(dd, now);
>>>>        if (rq)
>>>> -        goto unlock;
>>>> +        return rq;
>>>>          /*
>>>>         * Next, dispatch requests in priority order. Ignore lower priority
>>>> @@ -481,8 +480,6 @@ static struct request *dd_dispatch_request(struct
>>>> blk_mq_hw_ctx *hctx)
>>>>                break;
>>>>        }
>>>>    -unlock:
>>>> -    spin_unlock(dd->lock);
>>>>        return rq;
>>>>    }
>>>>    
>>>
>>>
>>
> 
> 

