linux-kernel - Re: [PATCH] blk-mq: allow hardware queue to get more tag while sharing a tag set


Message-ID: <b124b91b-7474-fa27-b78c-01b7e7396a17@huawei.com>
Date:   Tue, 3 Aug 2021 10:57:05 +0800
From:   "yukuai (C)" <yukuai3@...wei.com>
To:     Bart Van Assche <bvanassche@....org>, <axboe@...nel.dk>,
        <ming.lei@...hat.com>
CC:     <linux-block@...r.kernel.org>, <linux-kernel@...r.kernel.org>,
        <yi.zhang@...wei.com>
Subject: Re: [PATCH] blk-mq: allow hardware queue to get more tag while
 sharing a tag set

On 2021/08/03 0:17, Bart Van Assche wrote:
> On 8/2/21 6:34 AM, yukuai (C) wrote:
>> I ran a test on both null_blk and nvme; the results show no
>> performance degradation:
>>
>> test platform: x86
>> test cpu: 2 nodes, total 72
>> test scheduler: none
>> test device: null_blk / nvme
>>
>> test cmd: fio -filename=/dev/xxx -name=test -ioengine=libaio -direct=1
>> -numjobs=72 -iodepth=16 -bs=4k -rw=write -offset_increment=1G
>> -cpus_allowed=0:71 -cpus_allowed_policy=split -group_reporting
>> -runtime=120
>>
>> test results: iops
>> 1) null_blk before this patch: 280k
>> 2) null_blk after this patch: 282k
>> 3) nvme before this patch: 378k
>> 4) nvme after this patch: 384k
> 
> Please use io_uring for performance tests.
> 
> The null_blk numbers seem way too low to me. If I run a null_blk 
> performance test inside a VM with 6 CPU cores (Xeon W-2135 CPU) I see 
> about 6 million IOPS for synchronous I/O and about 4.4 million IOPS when 
> using libaio. The options I used and that are not in the above command 
> line are: --thread --gtod_reduce=1 --ioscheduler=none.
> 

Hi, Bart

The CPU I'm testing on is an Intel(R) Xeon(R) Gold 6140 CPU @ 2.30GHz.
After switching to io_uring with "--thread --gtod_reduce=1
--ioscheduler=none", the numbers increase to 330k IOPS, which is still
far behind 6000k.

The new atomic operation in the hot path is the atomic_read() in
hctx_may_queue(), and the atomic variable changes in only two
situations:

a. A driver tag allocation fails while dbusy is not set: increment the
variable and set dbusy.
b. The queue switches from busy to idle while dbusy is set: decrement
the variable and clear dbusy.

During one "idle -> busy -> idle" cycle of a device, the new atomic
variable is written at most twice, which means it is almost read-only
in the above test. So I guess the impact on performance is minimal?
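
To illustrate the argument, here is a minimal userspace sketch of the
write pattern described above. It is not the patch itself: it uses C11
atomics instead of the kernel's atomic_t, it is single-threaded, and
the names shared_tags_model, queue_model, pending_queues and the
model_*() helpers are hypothetical. Only dbusy and the two situations
(a) and (b) are taken from the description.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical model of the shared state; not the kernel structures. */
struct shared_tags_model {
	atomic_int pending_queues;	/* assumed name for the new atomic variable */
	int writes;			/* illustration only: counts writes to it */
};

struct queue_model {
	bool dbusy;			/* the per-queue "dbusy" flag */
};

/* Situation (a): getting a driver tag failed. */
static void model_tag_alloc_failed(struct shared_tags_model *t,
				   struct queue_model *q)
{
	if (!q->dbusy) {
		q->dbusy = true;
		atomic_fetch_add(&t->pending_queues, 1);
		t->writes++;
	}
	/* dbusy already set: no write to the shared variable */
}

/* Situation (b): the queue switches from busy to idle. */
static void model_queue_idle(struct shared_tags_model *t,
			     struct queue_model *q)
{
	if (q->dbusy) {
		q->dbusy = false;
		atomic_fetch_sub(&t->pending_queues, 1);
		t->writes++;
	}
}

/* Hot path: a plain atomic load, comparable in spirit to atomic_read(). */
static int model_hot_path_read(struct shared_tags_model *t)
{
	return atomic_load(&t->pending_queues);
}

/*
 * Run one "idle -> busy -> idle" cycle with the given number of failed
 * tag allocations and hot-path reads; return how many times the shared
 * variable was written.
 */
static int model_one_cycle(int reads, int failures)
{
	struct shared_tags_model t;
	struct queue_model q = { .dbusy = false };
	long observed = 0;

	atomic_init(&t.pending_queues, 0);
	t.writes = 0;

	for (int i = 0; i < failures; i++)
		model_tag_alloc_failed(&t, &q);
	for (int i = 0; i < reads; i++)
		observed += model_hot_path_read(&t);
	model_queue_idle(&t, &q);

	(void)observed;
	return t.writes;
}
```

In this model, model_one_cycle(1000, 5) performs 1000 reads but only two
writes, and a cycle without any failed allocation performs no write at
all. Whether the real cache-line behaviour on a 72-CPU machine matches
this simple picture would still have to be measured.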

Thanks!
Kuai