Message-Id: <20250806085720.4040507-1-yukuai1@huaweicloud.com>
Date: Wed, 6 Aug 2025 16:57:15 +0800
From: Yu Kuai <yukuai1@...weicloud.com>
To: dlemoal@...nel.org,
hare@...e.de,
jack@...e.cz,
bvanassche@....org,
tj@...nel.org,
josef@...icpanda.com,
axboe@...nel.dk,
yukuai3@...wei.com
Cc: cgroups@...r.kernel.org,
linux-block@...r.kernel.org,
linux-kernel@...r.kernel.org,
yukuai1@...weicloud.com,
yi.zhang@...wei.com,
yangerkun@...wei.com,
johnny.chenyi@...wei.com
Subject: [PATCH v3 0/5] blk-mq-sched: support request batch dispatching for sq elevator
From: Yu Kuai <yukuai3@...wei.com>
Changes from v2:
- add elevator lock/unlock macros in patch 1;
- improve coding style and commit messages;
- retest with a new environment;
- add test for scsi HDD and nvme;
Changes from v1:
- the ioc changes are sent separately;
- change the patch 1-3 order as suggested by Damien;
Currently, both mq-deadline and bfq have a global spin lock that is
grabbed inside elevator methods such as dispatch_request, insert_requests,
and bio_merge. This global lock is the main reason mq-deadline and
bfq can't scale well.
The dispatch_request method currently dispatches one request at a time.
With multiple dispatching contexts, this behavior on the one hand
introduces intense lock contention:
t1:                     t2:                     t3:
lock                    lock                    lock
// grab lock
ops.dispatch_request
unlock
                        // grab lock
                        ops.dispatch_request
                        unlock
                                                // grab lock
                                                ops.dispatch_request
                                                unlock
and on the other hand messes up the request dispatching order:
t1:                             t2:
lock
rq1 = ops.dispatch_request
unlock
                                lock
                                rq2 = ops.dispatch_request
                                unlock
lock
rq3 = ops.dispatch_request
unlock
                                lock
                                rq4 = ops.dispatch_request
                                unlock
// rq1, rq3 issue to disk
                                // rq2, rq4 issue to disk
In this case, the elevator dispatch order is rq 1-2-3-4, but the order
on disk is rq 1-3-2-4: the order of rq2 and rq3 is inverted.
While dispatching a request, blk_mq_get_dispatch_budget() and
blk_mq_get_driver_tag() must be called, and they are not ready to be
called inside elevator methods; hence introducing a new method like
dispatch_requests is not possible.
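The inversion described above can be reproduced with a small
deterministic userspace model. This is only a sketch and not kernel
code: elevator_model, dispatch_one() and simulate() are hypothetical
names, the "lock" is modelled by letting one context run at a time, and
the two contexts are assumed to alternate and to issue their requests
only after all dispatching is done.

```c
#include <assert.h>
#include <stddef.h>

#define NR_CTX  2   /* two dispatching contexts, like t1 and t2 above */
#define MAX_RQS 16  /* upper bound on requests in this toy model */

/* Hypothetical elevator: hands out requests 1, 2, 3, ... in order. */
struct elevator_model {
	int next_rq;
};

static int dispatch_one(struct elevator_model *e)
{
	return e->next_rq++;
}

/*
 * Contexts take the "lock" in turn and dispatch up to `batch` requests
 * per lock hold. Each context then issues what it collected, t1 first.
 * The resulting on-disk order is written to disk_order; the number of
 * requests issued is returned.
 */
static size_t simulate(int nr_rqs, int batch, int *disk_order)
{
	int held[NR_CTX][MAX_RQS];
	size_t count[NR_CTX] = { 0 };
	struct elevator_model e = { .next_rq = 1 };
	int remaining = nr_rqs;
	int ctx = 0;
	size_t n = 0;

	assert(nr_rqs >= 0 && nr_rqs <= MAX_RQS && batch > 0);

	while (remaining > 0) {
		/* "lock": only ctx dispatches until the batch is done */
		for (int i = 0; i < batch && remaining > 0; i++, remaining--)
			held[ctx][count[ctx]++] = dispatch_one(&e);
		/* "unlock": the other context wins the lock next */
		ctx = (ctx + 1) % NR_CTX;
	}

	/* Each context issues its own requests to the disk. */
	for (ctx = 0; ctx < NR_CTX; ctx++)
		for (size_t i = 0; i < count[ctx]; i++)
			disk_order[n++] = held[ctx][i];

	return n;
}
```

In this model, simulate(4, 1, order) yields 1-3-2-4, matching the
diagram, while simulate(4, 2, order) yields 1-2-3-4 because each context
receives a contiguous run of requests.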
In conclusion, this set factors the global lock out of the
dispatch_request method, and supports request batch dispatching by
calling the method multiple times while holding the lock.
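As a rough userspace illustration of that idea (not the code in this
set), the sketch below takes a lock once and calls a single-request
dispatch method repeatedly under it, with the budget check kept in the
caller rather than inside the method. toy_elevator,
toy_dispatch_request(), toy_dispatch_batch() and the integer budget are
all hypothetical stand-ins; the real series works on the elevator lock
and the blk-mq budget/tag helpers.

```c
#include <pthread.h>
#include <stddef.h>

/* Hypothetical elevator holding requests next_rq..last_rq, in order. */
struct toy_elevator {
	pthread_mutex_t lock;   /* stands in for the elevator lock */
	int next_rq;
	int last_rq;
	int lock_acquisitions;  /* bookkeeping for the example only */
};

/* Single-request method; the caller must hold e->lock. 0 means empty. */
static int toy_dispatch_request(struct toy_elevator *e)
{
	if (e->next_rq > e->last_rq)
		return 0;
	return e->next_rq++;
}

/*
 * Dispatch up to `max` requests under one lock hold. The budget is
 * checked by the caller-side loop before each call, so the method
 * itself never has to acquire budget. Returns the number dispatched.
 */
static size_t toy_dispatch_batch(struct toy_elevator *e, int *budget,
				 int *out, size_t max)
{
	size_t n = 0;

	pthread_mutex_lock(&e->lock);
	e->lock_acquisitions++;
	while (n < max && *budget > 0) {
		int rq = toy_dispatch_request(e);

		if (!rq)
			break;          /* elevator is empty */
		(*budget)--;            /* this request consumes one unit */
		out[n++] = rq;
	}
	pthread_mutex_unlock(&e->lock);

	return n;
}
```

With five queued requests and a budget of 3, one call returns requests
1-3 after a single lock acquisition; a one-request-per-lock loop would
have taken the lock three times for the same work.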
Test Environment:
arm64 Kunpeng-920, with 4 nodes 128 cores
nvme: HWE52P431T6M005N
scsi HDD: MG04ACA600E attached to hisi_sas_v3
null_blk set up:
modprobe null_blk nr_devices=0 &&
udevadm settle &&
cd /sys/kernel/config/nullb &&
mkdir nullb0 &&
cd nullb0 &&
echo 0 > completion_nsec &&
echo 512 > blocksize &&
echo 0 > home_node &&
echo 0 > irqmode &&
echo 128 > submit_queues &&
echo 1024 > hw_queue_depth &&
echo 1024 > size &&
echo 0 > memory_backed &&
echo 2 > queue_mode &&
echo 1 > power ||
exit $?
null_blk and nvme test script:
[global]
filename=/dev/{nullb0,nvme0n1}
rw=randwrite
bs=4k
iodepth=32
iodepth_batch_submit=8
iodepth_batch_complete=8
direct=1
ioengine=io_uring
time_based
[write]
numjobs=16
runtime=60
scsi HDD test script (note that this test aims to check whether batch
dispatching affects IO merging):
[global]
filename=/dev/sda
rw=write
bs=4k
iodepth=32
iodepth_batch_submit=1
direct=1
ioengine=libaio
[write]
offset_increment=1g
numjobs=128
Test Result:
1) nullblk: iops test with high IO pressure
| | deadline | bfq |
| --------------- | -------- | -------- |
| before this set | 256k | 153k |
| after this set | 594k | 283k |
2) nvme: iops test with high IO pressure
| | deadline | bfq |
| --------------- | -------- | -------- |
| before this set | 258k | 142k |
| after this set | 568k | 214k |
3) scsi HDD: io merge test, elevator is deadline
| | w/s | %wrqm | wareq-sz | aqu-sz |
| --------------- | ----- | ----- | -------- | ------ |
| before this set | 92.25 | 96.88 | 128 | 129 |
| after this set | 92.63 | 96.88 | 128 | 129 |
Yu Kuai (5):
blk-mq-sched: introduce high level elevator lock
mq-deadline: switch to use elevator lock
block, bfq: switch to use elevator lock
blk-mq-sched: refactor __blk_mq_do_dispatch_sched()
blk-mq-sched: support request batch dispatching for sq elevator
block/bfq-cgroup.c | 6 +-
block/bfq-iosched.c | 53 +++++-----
block/bfq-iosched.h | 2 -
block/blk-mq-sched.c | 246 ++++++++++++++++++++++++++++++-------------
block/blk-mq.h | 21 ++++
block/elevator.c | 1 +
block/elevator.h | 14 ++-
block/mq-deadline.c | 60 +++++------
8 files changed, 263 insertions(+), 140 deletions(-)
--
2.39.2