Message-ID: <18bdfa01-9e08-60b1-3eb8-cb395d639935@interlog.com>
Date: Thu, 3 Sep 2020 15:28:46 -0400
From: Douglas Gilbert <dgilbert@...erlog.com>
To: John Garry <john.garry@...wei.com>, axboe@...nel.dk,
jejb@...ux.ibm.com, martin.petersen@...cle.com,
don.brace@...rosemi.com, kashyap.desai@...adcom.com,
ming.lei@...hat.com, bvanassche@....org, paolo.valente@...aro.org,
hare@...e.de, hch@....de
Cc: sumit.saxena@...adcom.com, linux-block@...r.kernel.org,
linux-kernel@...r.kernel.org, linux-scsi@...r.kernel.org,
esc.storagedev@...rosemi.com, megaraidlinux.pdl@...adcom.com,
chenxiang66@...ilicon.com, luojiaxing@...wei.com
Subject: Re: [PATCH v8 00/18] blk-mq/scsi: Provide hostwide shared tags for
SCSI HBAs
On 2020-08-19 11:20 a.m., John Garry wrote:
> Hi all,
>
> Here is v8 of the patchset.
>
> In this version of the series, we keep the shared sbitmap for driver tags,
> and introduce changes to fix up the tag budgeting across request queues.
> We also have a change to count requests per-hctx for when an elevator is
> enabled, as an optimisation. I also dropped the debugfs changes - more on
> that below.
>
> Some performance figures:
>
> Using 12x SAS SSDs on hisi_sas v3 hw. mq-deadline results are included,
> but it is not always an appropriate scheduler to use.
>
> Tag depth              4000 (default)    260**
>
> Baseline (v5.9-rc1):
> none sched:            2094K IOPS        513K
> mq-deadline sched:     2145K IOPS        1336K
>
> Final, host_tagset=0 in LLDD *, ***:
> none sched:            2120K IOPS        550K
> mq-deadline sched:     2121K IOPS        1309K
>
> Final ***:
> none sched:            2132K IOPS        1185
> mq-deadline sched:     2145K IOPS        2097
>
> * this is relevant as this is the performance in supporting but not
> enabling the feature
> ** depth=260 is relevant as a point where we are regularly waiting for
> tags to be available. Figures were a bit unstable here.
> *** Included "[PATCH V4] scsi: core: only re-run queue in
> scsi_end_request() if device queue is busy"
>
> A copy of the patches can be found here:
> https://github.com/hisilicon/kernel-dev/tree/private-topic-blk-mq-shared-tags-v8
>
> The hpsa patch depends on:
> https://lore.kernel.org/linux-scsi/20200430131904.5847-1-hare@suse.de/
>
> And the smartpqi patch is not to be accepted.
>
> Comments (and testing) welcome, thanks!

I tested this v8 patchset on MKP's 5.10/scsi-queue branch, together with
my rewritten sg driver, on my laptop and on a Ryzen 5 3600 machine. Since I
don't have the same hardware, I used the scsi_debug driver as the target:

   modprobe scsi_debug dev_size_mb=1024 sector_size=512 add_host=7 \
       per_host_store=1 ndelay=1000 random=1 submit_queues=12

My test is a script which runs these three commands many times with
differing parameters:

sg_mrq_dd iflag=random bs=512 of=/dev/sg8 thr=64 time=2
   time to transfer data was 0.312705 secs, 3433.72 MB/sec
   2097152+0 records in
   2097152+0 records out

sg_mrq_dd bpt=256 thr=64 mrq=36 time=2 if=/dev/sg8 bs=512 of=/dev/sg9
   time to transfer data was 0.212090 secs, 5062.67 MB/sec
   2097152+0 records in
   2097152+0 records out

sg_mrq_dd --verify if=/dev/sg8 of=/dev/sg9 bs=512 bpt=256 thr=64 mrq=36 time=2
   Doing verify/cmp rather than copy
   time to transfer data was 0.184563 secs, 5817.75 MB/sec
   2097152+0 records in
   2097152+0 records verified

The above is the output from the last section of my script, run on the
Ryzen 5 machine. So the three steps are:
   1) produce random data on /dev/sg8
   2) copy /dev/sg8 to /dev/sg9
   3) verify that /dev/sg8 and /dev/sg9 are the same.

The last step is done with a sequence of READ (on /dev/sg8) and
VERIFY(BYTCHK=1) (on /dev/sg9) commands. The "mrq" stands for multiple
requests in one invocation; the bsg driver did that before its write(2)
interface was removed.
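
For anyone wanting to reproduce the fill/copy/verify sequence, here is a
minimal sketch of how the three invocations could be generated. It only
builds the command lines and does not run them. The function name
build_test_commands and the choice of which values to parameterise are
assumptions for illustration (this is not the actual test script); the
operands themselves are the ones shown in the output above, and the sketch
assumes that dd-style operands may be given in any order.

```python
import shlex


def build_test_commands(src="/dev/sg8", dst="/dev/sg9",
                        bs=512, bpt=256, thr=64, mrq=36, secs=2):
    """Return the three sg_mrq_dd invocations as argv lists.

    Hypothetical helper: nothing is executed, the lists are only built.
    """
    common = [f"bs={bs}", f"thr={thr}", f"time={secs}"]
    # Step 1: fill the source device with random data.
    fill = ["sg_mrq_dd", "iflag=random", f"of={src}"] + common
    # Step 2: copy source to destination using multiple requests (mrq).
    copy = ["sg_mrq_dd", f"if={src}", f"of={dst}",
            f"bpt={bpt}", f"mrq={mrq}"] + common
    # Step 3: same operands as the copy, but compare instead of writing.
    verify = ["sg_mrq_dd", "--verify"] + copy[1:]
    return [fill, copy, verify]


if __name__ == "__main__":
    for argv in build_test_commands():
        print(shlex.join(argv))
```

Varying thr, mrq or bpt between calls would give the "differing
parameters" runs; which of them the real script varies is not stated here.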
The SCSI devices on the Ryzen 5 machine are:
# lsscsi -gs
[2:0:0:0] disk IBM-207x HUSMM8020ASS20 J4B6 /dev/sda /dev/sg0 200GB
[2:0:1:0] disk SEAGATE ST200FM0073 0007 /dev/sdb /dev/sg1 200GB
[2:0:2:0] enclosu Areca Te ARC-802801.37.69 0137 - /dev/sg2 -
[3:0:0:0] disk Linux scsi_debug 0190 /dev/sdc /dev/sg3 1.07GB
[4:0:0:0] disk Linux scsi_debug 0190 /dev/sdd /dev/sg4 1.07GB
[5:0:0:0] disk Linux scsi_debug 0190 /dev/sde /dev/sg5 1.07GB
[6:0:0:0] disk Linux scsi_debug 0190 /dev/sdf /dev/sg6 1.07GB
[7:0:0:0] disk Linux scsi_debug 0190 /dev/sdg /dev/sg7 1.07GB
[8:0:0:0] disk Linux scsi_debug 0190 /dev/sdh /dev/sg8 1.07GB
[9:0:0:0] disk Linux scsi_debug 0190 /dev/sdi /dev/sg9 1.07GB
[N:0:1:1] disk WDC WDS250G2B0C-00PXH0__1 /dev/nvme0n1 - 250GB
My script took 17m12 and the highest throughput (on a copy) was 7.5 GB/sec.
Then I reloaded the scsi_debug module, this time with an additional
'host_max_queue=128' parameter. The script run time was 5 seconds shorter
and the maximum throughput was around 7.6 GB/sec. [Average throughput is
around 4 GB/sec.]
For comparison:
# time liburing/examples/io_uring-cp /dev/sdh /dev/sdi
real 0m1.542s
user 0m0.004s
sys 0m1.027s
Umm, that's less than 1 GB/sec. In its defence, io_uring-cp is an
extremely simple, single-threaded, proof-of-concept copy program,
at least compared to sg_mrq_dd. As used by sg_mrq_dd, the rewritten
sg driver avoids moving that 1 GB to and from _user_ space during the
copy and verify steps above.
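
The throughput figures quoted above can be checked with simple arithmetic.
The sketch below assumes that "MB" means 10^6 bytes (which is what makes
the sg_mrq_dd numbers come out) and that io_uring-cp copied the whole
1024 MiB scsi_debug device; its "real" time also includes process start-up,
so the last figure is only approximate. The function name is made up for
this example.

```python
RECORDS = 2097152      # "records in" reported by sg_mrq_dd
BLOCK_SIZE = 512       # bs=512
TOTAL_BYTES = RECORDS * BLOCK_SIZE   # 1073741824 bytes, i.e. 1 GiB


def throughput_mb_per_sec(total_bytes, seconds):
    """Throughput in MB/sec, taking 1 MB = 10**6 bytes (an assumption)."""
    return total_bytes / seconds / 1e6


# Elapsed times copied from the output above.
runs = [
    ("random fill", 0.312705),
    ("copy", 0.212090),
    ("verify", 0.184563),
    ("io_uring-cp (real time)", 1.542),
]

if __name__ == "__main__":
    for label, secs in runs:
        rate = throughput_mb_per_sec(TOTAL_BYTES, secs)
        print(f"{label}: {rate:.2f} MB/sec")
```

The first three results agree with the MB/sec values printed by sg_mrq_dd,
and the io_uring-cp result lands well under 1000 MB/sec.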
So:
Tested-by: Douglas Gilbert <dgilbert@...erlog.com>
> Differences to v7:
> - Add null_blk and scsi_debug support
> - Drop debugfs tags patch - it's too difficult to be the same between
> hostwide and non-hostwide, as discussed:
> https://lore.kernel.org/linux-scsi/1591810159-240929-1-git-send-email-john.garry@huawei.com/T/#mb3eb462d8be40273718505989abd12f8228c15fd
> And from commit 6bf0eb550452 ("sbitmap: Consider cleared bits in
> sbitmap_bitmap_show()"), I guess not many used this anyway...
> - Add elevator per-hctx request count for optimisation
> - Break up "blk-mq: rename blk_mq_update_tag_set_depth()" into 2x patches
> - Pass flags to avoid per-hq queue tags init/free for hostwide tags
> - Add Don's reviewed-tag and tested-by tags to appropriate patches
> - (@Don, please let me know if issue with how I did this)
> - Add "scsi: core: Show nr_hw_queues in sysfs"
> - Rework megaraid SAS patch to have module param (Kashyap)
> - rebase
>
> V7 is here for more info:
> https://lore.kernel.org/linux-scsi/1591810159-240929-1-git-send-email-john.garry@huawei.com/T/#t
>
> Hannes Reinecke (5):
> blk-mq: Rename blk_mq_update_tag_set_depth()
> blk-mq: Free tags in blk_mq_init_tags() upon error
> scsi: Add host and host template flag 'host_tagset'
> hpsa: enable host_tagset and switch to MQ
> smartpqi: enable host tagset
>
> John Garry (10):
> blk-mq: Pass flags for tag init/free
> blk-mq: Use pointers for blk_mq_tags bitmap tags
> blk-mq: Facilitate a shared sbitmap per tagset
> blk-mq: Relocate hctx_may_queue()
> blk-mq: Record nr_active_requests per queue for when using shared
> sbitmap
> blk-mq: Record active_queues_shared_sbitmap per tag_set for when using
> shared sbitmap
> null_blk: Support shared tag bitmap
> scsi: core: Show nr_hw_queues in sysfs
> scsi: hisi_sas: Switch v3 hw to MQ
> scsi: scsi_debug: Support host tagset
>
> Kashyap Desai (2):
> blk-mq, elevator: Count requests per hctx to improve performance
> scsi: megaraid_sas: Added support for shared host tagset for
> cpuhotplug
>
> Ming Lei (1):
> blk-mq: Rename BLK_MQ_F_TAG_SHARED as BLK_MQ_F_TAG_QUEUE_SHARED
>
> block/bfq-iosched.c | 9 +-
> block/blk-core.c | 2 +
> block/blk-mq-debugfs.c | 10 +-
> block/blk-mq-sched.c | 13 +-
> block/blk-mq-tag.c | 149 ++++++++++++++------
> block/blk-mq-tag.h | 56 +++-----
> block/blk-mq.c | 81 +++++++----
> block/blk-mq.h | 76 +++++++++-
> block/kyber-iosched.c | 4 +-
> block/mq-deadline.c | 6 +
> drivers/block/null_blk_main.c | 6 +
> drivers/block/rnbd/rnbd-clt.c | 2 +-
> drivers/scsi/hisi_sas/hisi_sas.h | 3 +-
> drivers/scsi/hisi_sas/hisi_sas_main.c | 36 ++---
> drivers/scsi/hisi_sas/hisi_sas_v3_hw.c | 87 +++++-------
> drivers/scsi/hosts.c | 1 +
> drivers/scsi/hpsa.c | 44 +-----
> drivers/scsi/hpsa.h | 1 -
> drivers/scsi/megaraid/megaraid_sas_base.c | 39 +++++
> drivers/scsi/megaraid/megaraid_sas_fusion.c | 29 ++--
> drivers/scsi/scsi_debug.c | 28 ++--
> drivers/scsi/scsi_lib.c | 2 +
> drivers/scsi/scsi_sysfs.c | 11 ++
> drivers/scsi/smartpqi/smartpqi_init.c | 45 ++++--
> include/linux/blk-mq.h | 13 +-
> include/linux/blkdev.h | 3 +
> include/scsi/scsi_host.h | 9 +-
> 27 files changed, 484 insertions(+), 281 deletions(-)
>