Message-Id: <20220218184157.176457-6-melanieplageman@gmail.com>
Date: Fri, 18 Feb 2022 18:41:57 +0000
From: "Melanie Plageman (Microsoft)" <melanieplageman@...il.com>
To: mikelley@...rosoft.com, jejb@...ux.ibm.com, kys@...rosoft.com,
martin.petersen@...cle.com, mst@...hat.com,
benh@...nel.crashing.org, decui@...rosoft.com,
don.brace@...rochip.com, R-QLogic-Storage-Upstream@...vell.com,
haiyangz@...rosoft.com, jasowang@...hat.com, john.garry@...wei.com,
kashyap.desai@...adcom.com, mpe@...erman.id.au,
njavali@...vell.com, pbonzini@...hat.com, paulus@...ba.org,
sathya.prakash@...adcom.com,
shivasharan.srikanteshwara@...adcom.com,
sreekanth.reddy@...adcom.com, stefanha@...hat.com,
sthemmin@...rosoft.com, suganath-prabu.subramani@...adcom.com,
sumit.saxena@...adcom.com, tyreld@...ux.ibm.com,
wei.liu@...nel.org, linuxppc-dev@...ts.ozlabs.org,
megaraidlinux.pdl@...adcom.com, mpi3mr-linuxdrv.pdl@...adcom.com,
storagedev@...rochip.com,
virtualization@...ts.linux-foundation.org,
linux-hyperv@...r.kernel.org, linux-kernel@...r.kernel.org,
linux-scsi@...r.kernel.org, MPT-FusionLinux.pdl@...adcom.com
Cc: andres@...razel.de
Subject: [PATCH RFC v1 5/5] scsi: storvsc: Hardware queues share blk_mq_tags

Decouple the number of tags available from the number of hardware queues
by sharing a single blk_mq_tags amongst all hardware queues.

When storage latency is relatively high, having too many tags available
can harm the performance of mixed workloads.

By sharing blk_mq_tags amongst hardware queues, nr_requests can be set
to the appropriate number of tags for the device.

Signed-off-by: Melanie Plageman <melanieplageman@...il.com>
---

As an example, consider a 16-core VM attached to a 1 TiB storage device,
with a combined (VM + disk) cap of 200 MB/s bandwidth and 5000 IOPS,
configured with 16 hardware queues, nr_requests set to 56, and queue_depth
set to 15. The following fio job file illustrates the benefit of hardware
queues sharing blk_mq_tags:

[global]
time_based=1
ioengine=io_uring
direct=1
runtime=60
[read_hogs]
bs=16k
iodepth=10000
rw=randread
filesize=10G
numjobs=15
directory=/mnt/test
[wal]
bs=8k
iodepth=3
filesize=4G
rw=write
numjobs=1
directory=/mnt/test

With hctx_share_tags set, the "wal" job does 271 IOPS with an average
completion latency of 13120 usec, and the "read_hogs" jobs average around
4700 IOPS.

Without hctx_share_tags set, the "wal" job does 85 IOPS with an average
completion latency of around 45308 usec, and the "read_hogs" jobs average
around 4900 IOPS.

Note that reducing nr_requests to a number small enough to increase WAL
IOPS results in unacceptably low IOPS for the random reads when only one
random read job is running.
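
For context, here is a minimal sketch (an assumption for illustration, not
part of this patch) of how a host-template flag such as hctx_share_tags
could be mapped onto the block layer's existing BLK_MQ_F_TAG_HCTX_SHARED
flag when the tag set is built, analogous to how scsi_mq_setup_tags()
handles host_tagset in mainline. The hctx_share_tags field and the helper
below are illustrative only; this series actually builds on the per-device
tag set introduced earlier in the series.

#include <linux/blk-mq.h>
#include <scsi/scsi_host.h>

/*
 * Hypothetical helper, called from wherever the blk_mq_tag_set is
 * populated (compare scsi_mq_setup_tags() in drivers/scsi/scsi_lib.c).
 * With BLK_MQ_F_TAG_HCTX_SHARED, blk-mq gives all hardware queues one
 * shared tag space of tag_set->queue_depth tags instead of queue_depth
 * tags per hardware queue.
 */
static void example_apply_hctx_share_tags(struct Scsi_Host *shost,
					   struct blk_mq_tag_set *tag_set)
{
	/* hctx_share_tags is the field proposed by this RFC series. */
	if (shost->hostt->hctx_share_tags)
		tag_set->flags |= BLK_MQ_F_TAG_HCTX_SHARED;
}

With the tags shared across hardware queues, nr_requests bounds the total
number of in-flight requests for the device rather than per hardware
queue, which is what makes the tuning described above possible.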

 drivers/scsi/storvsc_drv.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/scsi/storvsc_drv.c b/drivers/scsi/storvsc_drv.c
index 0ed764bcabab..5048e7fcf959 100644
--- a/drivers/scsi/storvsc_drv.c
+++ b/drivers/scsi/storvsc_drv.c
@@ -1997,6 +1997,7 @@ static struct scsi_host_template scsi_driver = {
 	.track_queue_depth = 1,
 	.change_queue_depth = storvsc_change_queue_depth,
 	.per_device_tag_set = 1,
+	.hctx_share_tags = 1,
 };
 
 enum {
--
2.25.1