Date:   Thu, 3 Sep 2020 15:28:46 -0400
From:   Douglas Gilbert <dgilbert@...erlog.com>
To:     John Garry <john.garry@...wei.com>, axboe@...nel.dk,
        jejb@...ux.ibm.com, martin.petersen@...cle.com,
        don.brace@...rosemi.com, kashyap.desai@...adcom.com,
        ming.lei@...hat.com, bvanassche@....org, paolo.valente@...aro.org,
        hare@...e.de, hch@....de
Cc:     sumit.saxena@...adcom.com, linux-block@...r.kernel.org,
        linux-kernel@...r.kernel.org, linux-scsi@...r.kernel.org,
        esc.storagedev@...rosemi.com, megaraidlinux.pdl@...adcom.com,
        chenxiang66@...ilicon.com, luojiaxing@...wei.com
Subject: Re: [PATCH v8 00/18] blk-mq/scsi: Provide hostwide shared tags for
 SCSI HBAs

On 2020-08-19 11:20 a.m., John Garry wrote:
> Hi all,
> 
> Here is v8 of the patchset.
> 
> In this version of the series, we keep the shared sbitmap for driver tags,
> and introduce changes to fix up the tag budgeting across request queues.
> We also have a change to count requests per-hctx for when an elevator is
> enabled, as an optimisation. I also dropped the debugfs changes - more on
> that below.
> 
> Some performance figures:
> 
> Using 12x SAS SSDs on hisi_sas v3 hw. mq-deadline results are included,
> but it is not always an appropriate scheduler to use.
> 
> Tag depth 		4000 (default)			260**
> 
> Baseline (v5.9-rc1):
> none sched:		2094K IOPS			513K
> mq-deadline sched:	2145K IOPS			1336K
> 
> Final, host_tagset=0 in LLDD *, ***:
> none sched:		2120K IOPS			550K
> mq-deadline sched:	2121K IOPS			1309K
> 
> Final ***:
> none sched:		2132K IOPS			1185			
> mq-deadline sched:	2145K IOPS			2097	
> 
> * this is relevant as it shows the performance when supporting but not
>    enabling the feature
> ** depth=260 is relevant as a point where we are regularly waiting for
>     tags to be available. Figures were a bit unstable here.
> *** Included "[PATCH V4] scsi: core: only re-run queue in
>      scsi_end_request() if device queue is busy"
> 
> A copy of the patches can be found here:
> https://github.com/hisilicon/kernel-dev/tree/private-topic-blk-mq-shared-tags-v8
> 
> The hpsa patch depends on:
> https://lore.kernel.org/linux-scsi/20200430131904.5847-1-hare@suse.de/
> 
> And the smartpqi patch is not to be accepted.
> 
> Comments (and testing) welcome, thanks!

I tested this v8 patchset on MKP's 5.10/scsi-queue branch together with
my rewritten sg driver, on my laptop and on a Ryzen 5 3600 machine. Since I
don't have the same hardware, I used the scsi_debug driver as the target:

    modprobe scsi_debug dev_size_mb=1024 sector_size=512 add_host=7 \
        per_host_store=1 ndelay=1000 random=1 submit_queues=12

My test is a script which runs these three commands many times with
differing parameters:

sg_mrq_dd iflag=random bs=512 of=/dev/sg8 thr=64 time=2
time to transfer data was 0.312705 secs, 3433.72 MB/sec
2097152+0 records in
2097152+0 records out

sg_mrq_dd bpt=256 thr=64 mrq=36 time=2 if=/dev/sg8 bs=512 of=/dev/sg9
time to transfer data was 0.212090 secs, 5062.67 MB/sec
2097152+0 records in
2097152+0 records out

sg_mrq_dd --verify if=/dev/sg8 of=/dev/sg9 bs=512 bpt=256 thr=64 mrq=36 time=2
Doing verify/cmp rather than copy
time to transfer data was 0.184563 secs, 5817.75 MB/sec
2097152+0 records in
2097152+0 records verified

The above is the output from the last section of my script, run on the Ryzen 5.

So the three steps (sketched as a script below) are:
    1) produce random data on /dev/sg8
    2) copy /dev/sg8 to /dev/sg9
    3) verify /dev/sg8 and /dev/sg9 are the same.
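
A minimal sketch of one pass of such a script (the sg_mrq_dd invocations
are the ones shown above; the bs/thr values in the loops are illustrative
rather than my actual parameter sets):

    #!/bin/sh
    # One pass of the produce/copy/verify sequence against two
    # scsi_debug targets (illustrative parameter sweep).
    for bs in 512 4096; do
        for thr in 16 64; do
            # 1) fill /dev/sg8 with random data
            sg_mrq_dd iflag=random bs=$bs of=/dev/sg8 thr=$thr time=2
            # 2) copy /dev/sg8 to /dev/sg9 (multiple requests per invocation)
            sg_mrq_dd if=/dev/sg8 of=/dev/sg9 bs=$bs bpt=256 thr=$thr mrq=36 time=2
            # 3) verify that /dev/sg8 and /dev/sg9 hold the same data
            sg_mrq_dd --verify if=/dev/sg8 of=/dev/sg9 bs=$bs bpt=256 thr=$thr mrq=36 time=2
        done
    done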

The last step is done with a sequence of READ (on /dev/sg8) and
VERIFY (BYTCHK=1, on /dev/sg9) commands. The "mrq" stands for multiple
requests in one invocation; the bsg driver did that before its write(2)
interface was removed.
The SCSI devices on the Ryzen 5 machine are:

# lsscsi -gs
[2:0:0:0]  disk    IBM-207x HUSMM8020ASS20   J4B6  /dev/sda   /dev/sg0   200GB
[2:0:1:0]  disk    SEAGATE  ST200FM0073      0007  /dev/sdb   /dev/sg1   200GB
[2:0:2:0]  enclosu Areca Te ARC-802801.37.69 0137  -          /dev/sg2       -
[3:0:0:0]  disk    Linux    scsi_debug       0190  /dev/sdc   /dev/sg3  1.07GB
[4:0:0:0]  disk    Linux    scsi_debug       0190  /dev/sdd   /dev/sg4  1.07GB
[5:0:0:0]  disk    Linux    scsi_debug       0190  /dev/sde   /dev/sg5  1.07GB
[6:0:0:0]  disk    Linux    scsi_debug       0190  /dev/sdf   /dev/sg6  1.07GB
[7:0:0:0]  disk    Linux    scsi_debug       0190  /dev/sdg   /dev/sg7  1.07GB
[8:0:0:0]  disk    Linux    scsi_debug       0190  /dev/sdh   /dev/sg8  1.07GB
[9:0:0:0]  disk    Linux    scsi_debug       0190  /dev/sdi   /dev/sg9  1.07GB
[N:0:1:1]  disk    WDC WDS250G2B0C-00PXH0__1       /dev/nvme0n1  -      250GB

My script took 17m12s and the highest throughput (on a copy) was 7.5 GB/sec.
Then I reloaded the scsi_debug module, this time with an additional
'host_max_queue=128' parameter. The script run time was 5 seconds shorter
and the maximum throughput was around 7.6 GB/sec. [Average throughput is
around 4 GB/sec.]
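
That reload was, in effect (reconstructed from the parameters above rather
than a copy-paste of the exact command):

    modprobe -r scsi_debug
    modprobe scsi_debug dev_size_mb=1024 sector_size=512 add_host=7 \
        per_host_store=1 ndelay=1000 random=1 submit_queues=12 \
        host_max_queue=128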

For comparison:

# time liburing/examples/io_uring-cp /dev/sdh /dev/sdi
real	0m1.542s
user	0m0.004s
sys	0m1.027s

Umm, that's less than 1 GB/sec (the scsi_debug device holds 1 GiB, copied
in about 1.5 seconds of real time). In its defence, io_uring-cp is an
extremely simple, single-threaded, proof-of-concept copy program, at least
compared to sg_mrq_dd. As used by sg_mrq_dd, the rewritten sg driver
avoids moving that 1 GB to and from _user_ space while doing the above
copy and verify steps.

So:

Tested-by: Douglas Gilbert <dgilbert@...erlog.com>

> Differences to v7:
> - Add null_blk and scsi_debug support
> - Drop debugfs tags patch - it's too difficult to be the same between
> hostwide and non-hostwide, as discussed:
> https://lore.kernel.org/linux-scsi/1591810159-240929-1-git-send-email-john.garry@huawei.com/T/#mb3eb462d8be40273718505989abd12f8228c15fd
> And from commit 6bf0eb550452 ("sbitmap: Consider cleared bits in
> sbitmap_bitmap_show()"), I guess not many used this anyway...
> - Add elevator per-hctx request count for optimisation
> - Break up "blk-mq: rename blk_mq_update_tag_set_depth()" into 2x patches
> - Pass flags to avoid per-hq queue tags init/free for hostwide tags
> - Add Don's reviewed-by and tested-by tags to appropriate patches
> 	- (@Don, please let me know if there is an issue with how I did this)
> - Add "scsi: core: Show nr_hw_queues in sysfs"
> - Rework megaraid SAS patch to have module param (Kashyap)
> - rebase
> 
> V7 is here for more info:
> https://lore.kernel.org/linux-scsi/1591810159-240929-1-git-send-email-john.garry@huawei.com/T/#t
> 
> Hannes Reinecke (5):
>    blk-mq: Rename blk_mq_update_tag_set_depth()
>    blk-mq: Free tags in blk_mq_init_tags() upon error
>    scsi: Add host and host template flag 'host_tagset'
>    hpsa: enable host_tagset and switch to MQ
>    smartpqi: enable host tagset
> 
> John Garry (10):
>    blk-mq: Pass flags for tag init/free
>    blk-mq: Use pointers for blk_mq_tags bitmap tags
>    blk-mq: Facilitate a shared sbitmap per tagset
>    blk-mq: Relocate hctx_may_queue()
>    blk-mq: Record nr_active_requests per queue for when using shared
>      sbitmap
>    blk-mq: Record active_queues_shared_sbitmap per tag_set for when using
>      shared sbitmap
>    null_blk: Support shared tag bitmap
>    scsi: core: Show nr_hw_queues in sysfs
>    scsi: hisi_sas: Switch v3 hw to MQ
>    scsi: scsi_debug: Support host tagset
> 
> Kashyap Desai (2):
>    blk-mq, elevator: Count requests per hctx to improve performance
>    scsi: megaraid_sas: Added support for shared host tagset for
>      cpuhotplug
> 
> Ming Lei (1):
>    blk-mq: Rename BLK_MQ_F_TAG_SHARED as BLK_MQ_F_TAG_QUEUE_SHARED
> 
>   block/bfq-iosched.c                         |   9 +-
>   block/blk-core.c                            |   2 +
>   block/blk-mq-debugfs.c                      |  10 +-
>   block/blk-mq-sched.c                        |  13 +-
>   block/blk-mq-tag.c                          | 149 ++++++++++++++------
>   block/blk-mq-tag.h                          |  56 +++-----
>   block/blk-mq.c                              |  81 +++++++----
>   block/blk-mq.h                              |  76 +++++++++-
>   block/kyber-iosched.c                       |   4 +-
>   block/mq-deadline.c                         |   6 +
>   drivers/block/null_blk_main.c               |   6 +
>   drivers/block/rnbd/rnbd-clt.c               |   2 +-
>   drivers/scsi/hisi_sas/hisi_sas.h            |   3 +-
>   drivers/scsi/hisi_sas/hisi_sas_main.c       |  36 ++---
>   drivers/scsi/hisi_sas/hisi_sas_v3_hw.c      |  87 +++++-------
>   drivers/scsi/hosts.c                        |   1 +
>   drivers/scsi/hpsa.c                         |  44 +-----
>   drivers/scsi/hpsa.h                         |   1 -
>   drivers/scsi/megaraid/megaraid_sas_base.c   |  39 +++++
>   drivers/scsi/megaraid/megaraid_sas_fusion.c |  29 ++--
>   drivers/scsi/scsi_debug.c                   |  28 ++--
>   drivers/scsi/scsi_lib.c                     |   2 +
>   drivers/scsi/scsi_sysfs.c                   |  11 ++
>   drivers/scsi/smartpqi/smartpqi_init.c       |  45 ++++--
>   include/linux/blk-mq.h                      |  13 +-
>   include/linux/blkdev.h                      |   3 +
>   include/scsi/scsi_host.h                    |   9 +-
>   27 files changed, 484 insertions(+), 281 deletions(-)
> 
