Message-Id: <20250722072431.610354-1-yukuai1@huaweicloud.com>
Date: Tue, 22 Jul 2025 15:24:25 +0800
From: Yu Kuai <yukuai1@...weicloud.com>
To: dlemoal@...nel.org,
hare@...e.de,
tj@...nel.org,
josef@...icpanda.com,
axboe@...nel.dk,
yukuai3@...wei.com
Cc: cgroups@...r.kernel.org,
linux-block@...r.kernel.org,
linux-kernel@...r.kernel.org,
yukuai1@...weicloud.com,
yi.zhang@...wei.com,
yangerkun@...wei.com,
johnny.chenyi@...wei.com
Subject: [PATCH 0/6] blk-mq-sched: support request batch dispatching for sq elevator
From: Yu Kuai <yukuai3@...wei.com>
Currently, both mq-deadline and bfq have a global spin lock that is
grabbed inside elevator methods like dispatch_request, insert_requests,
and bio_merge. This global lock is the main reason mq-deadline and
bfq can't scale very well.
For the dispatch_request method, the current behavior is to dispatch one
request at a time. With multiple dispatching contexts, this behavior, on
the one hand, introduces intense lock contention:
t1:                     t2:                     t3:
lock                    lock                    lock
// grab lock
ops.dispatch_request
unlock
                        // grab lock
                        ops.dispatch_request
                        unlock
                                                // grab lock
                                                ops.dispatch_request
                                                unlock
on the other hand, it messes up the request dispatching order:
t1:                             t2:
lock
rq1 = ops.dispatch_request
unlock
                                lock
                                rq2 = ops.dispatch_request
                                unlock
lock
rq3 = ops.dispatch_request
unlock
                                lock
                                rq4 = ops.dispatch_request
                                unlock
// rq1, rq3 issue to disk
                                // rq2, rq4 issue to disk
In this case, the elevator dispatch order is rq 1-2-3-4; however, the
order seen by the disk is rq 1-3-2-4: the order of rq2 and rq3 is inverted.
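The reordering above can be reproduced with a small user-space model. This
is only a sketch under stated assumptions: toy_elevator,
toy_dispatch_request() and simulate_interleaved() are hypothetical names
invented for illustration, not kernel code. The model assumes two contexts
that strictly alternate single dispatch calls and then each issue what they
collected, with t1 issuing first.

```c
#include <assert.h>
#include <stddef.h>

#define NR_CTX 2	/* two dispatching contexts: t1 and t2 */
#define MAX_RQ 16

/* Hypothetical toy elevator: hands out request ids 1, 2, 3, ... in order. */
struct toy_elevator {
	int next_rq;
};

/* One dispatch call returns exactly one request, as in the current code. */
static int toy_dispatch_request(struct toy_elevator *e)
{
	return e->next_rq++;
}

/*
 * The contexts alternate single dispatch calls (t1, t2, t1, t2, ...),
 * each collecting the requests it got. Afterwards each context issues
 * everything it collected, t1 first. The resulting device-side order is
 * written to issued[] and the number of requests is returned.
 */
static size_t simulate_interleaved(int nr_rq, int issued[MAX_RQ])
{
	struct toy_elevator e = { .next_rq = 1 };
	int got[NR_CTX][MAX_RQ];
	size_t cnt[NR_CTX] = { 0, 0 };
	size_t n = 0;

	assert(nr_rq >= 0 && nr_rq <= MAX_RQ);

	for (int i = 0; i < nr_rq; i++) {
		int ctx = i % NR_CTX;

		got[ctx][cnt[ctx]++] = toy_dispatch_request(&e);
	}

	for (int ctx = 0; ctx < NR_CTX; ctx++)
		for (size_t j = 0; j < cnt[ctx]; j++)
			issued[n++] = got[ctx][j];

	return n;
}
```

Calling simulate_interleaved(4, issued) fills issued[] with 1, 3, 2, 4,
matching the inversion of rq2 and rq3 described above.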
While dispatching a request, blk_mq_get_dispatch_budget() and
blk_mq_get_driver_tag() must be called, and they are not ready to be
called inside elevator methods, hence introducing a new method like
dispatch_requests is not possible.
In conclusion, this set factors the global lock out of the
dispatch_request method, and supports request batch dispatching by
calling the method multiple times while holding the lock.
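The idea can be sketched in user space as follows. This is a minimal model,
not the code in this series: toy_lock, toy_elevator,
toy_dispatch_request() and toy_dispatch_batch() are hypothetical names, the
lock is a single-threaded stand-in that only counts acquisitions, and budget
and driver tag handling are left out entirely.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the elevator lock; counts acquisitions. */
struct toy_lock {
	bool held;
	unsigned long acquisitions;
};

static void toy_lock_acquire(struct toy_lock *l)
{
	assert(!l->held);
	l->held = true;
	l->acquisitions++;
}

static void toy_lock_release(struct toy_lock *l)
{
	assert(l->held);
	l->held = false;
}

struct toy_elevator {
	struct toy_lock lock;
	int next_rq;	/* next request id to hand out */
	int last_rq;	/* last queued request id */
};

/*
 * Dispatch method with the lock factored out: it does no locking itself
 * and relies on the caller holding e->lock. Returns 0 when empty.
 */
static int toy_dispatch_request(struct toy_elevator *e)
{
	assert(e->lock.held);
	if (e->next_rq > e->last_rq)
		return 0;
	return e->next_rq++;
}

/*
 * Batch dispatch: take the lock once and call the dispatch method up to
 * max_batch times, so one context gets a contiguous run of requests.
 */
static size_t toy_dispatch_batch(struct toy_elevator *e, int *out,
				 size_t max_batch)
{
	size_t n = 0;

	toy_lock_acquire(&e->lock);
	while (n < max_batch) {
		int rq = toy_dispatch_request(e);

		if (!rq)
			break;
		out[n++] = rq;
	}
	toy_lock_release(&e->lock);

	return n;
}
```

With six queued requests and max_batch set to 4, the first call returns
rq 1-4 and the second rq 5-6, with two lock acquisitions in total instead
of six.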
nullblk setup:
modprobe null_blk nr_devices=0 &&
udevadm settle &&
cd /sys/kernel/config/nullb &&
mkdir nullb0 &&
cd nullb0 &&
echo 0 > completion_nsec &&
echo 512 > blocksize &&
echo 0 > home_node &&
echo 0 > irqmode &&
echo 128 > submit_queues &&
echo 1024 > hw_queue_depth &&
echo 1024 > size &&
echo 0 > memory_backed &&
echo 2 > queue_mode &&
echo 1 > power ||
exit $?
Test script:
fio -filename=/dev/$disk -name=test -rw=randwrite -bs=4k -iodepth=32 \
-numjobs=16 --iodepth_batch_submit=8 --iodepth_batch_complete=8 \
-direct=1 -ioengine=io_uring -group_reporting -time_based -runtime=30
Test result (IOPS):
| | deadline | bfq |
| --------------- | -------- | -------- |
| before this set | 263k | 124k |
| after this set | 475k | 292k |
Yu Kuai (6):
mq-deadline: switch to use high layer elevator lock
block, bfq: don't grab queue_lock from io path
block, bfq: switch to use elevator lock
elevator: factor elevator lock out of dispatch_request method
blk-mq-sched: refactor __blk_mq_do_dispatch_sched()
blk-mq-sched: support request batch dispatching for sq elevator
block/bfq-cgroup.c | 4 +-
block/bfq-iosched.c | 73 ++++++-------
block/bfq-iosched.h | 2 +-
block/blk-ioc.c | 43 +++++++-
block/blk-mq-sched.c | 240 ++++++++++++++++++++++++++++++-------------
block/blk-mq.h | 21 ++++
block/blk.h | 2 +-
block/elevator.c | 1 +
block/elevator.h | 4 +-
block/mq-deadline.c | 58 +++++------
10 files changed, 293 insertions(+), 155 deletions(-)
--
2.39.2