Message-ID: <27235520-2e63-2891-fd0a-ff758f18032e@nvidia.com>
Date:   Wed, 19 Apr 2023 21:11:33 +0000
From:   Chaitanya Kulkarni <chaitanyak@...dia.com>
To:     Sagi Grimberg <sagi@...mberg.me>, Daniel Wagner <dwagner@...e.de>,
        "linux-nvme@...ts.infradead.org" <linux-nvme@...ts.infradead.org>
CC:     "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
        "linux-block@...r.kernel.org" <linux-block@...r.kernel.org>,
        Shin'ichiro Kawasaki <shinichiro@...tmail.com>
Subject: Re: [RFC v1 0/1] nvme testsuite runtime optimization

On 4/19/23 02:50, Sagi Grimberg wrote:
>
>>> While testing the fc transport I got a bit tired of waiting for the
>>> I/O jobs to finish. Thus here are some runtime optimizations.
>>>
>>> With a small/slow VM I got the following values:
>>>
>>> with 'optimizations'
>>>     loop:
>>>       real    4m43.981s
>>>       user    0m17.754s
>>>       sys     2m6.249s
>
> How come loop is doubling the time with this patch?
> The ratio is not the same before and after.
>
>>>
>>>     rdma:
>>>       real    2m35.160s
>>>       user    0m6.264s
>>>       sys     0m56.230s
>>>
>>>     tcp:
>>>       real    2m30.391s
>>>       user    0m5.770s
>>>       sys     0m46.007s
>>>
>>>     fc:
>>>       real    2m19.738s
>>>       user    0m6.012s
>>>       sys     0m42.201s
>>>
>>> base:
>>>     loop:
>>>       real    7m35.061s
>>>       user    0m23.493s
>>>       sys     2m54.866s
>>>
>>>     rdma:
>>>       real    8m29.347s
>>>       user    0m13.078s
>>>       sys     1m53.158s
>>>
>>>     tcp:
>>>       real    8m11.357s
>>>       user    0m13.033s
>>>       sys     2m43.156s
>>>
>>>     fc:
>>>       real    5m46.615s
>>>       user    0m12.819s
>>>       sys     1m46.338s
>>>
>>>
>>
>> Those jobs are meant to be run with at least 1G to establish
>> confidence in the data set and the system under test; SSDs are in
>> TBs nowadays and we don't even get anywhere close to that, and with
>> your suggestion we are going even lower ...
>
> Where does the 1G boundary come from?
>


I wrote these testcases three times: initially they were part of the
nvme-cli tests 7-8 years ago, then of nvmftests 6-7 years ago, and
then they moved to blktests.

At that time some of the testcases would not fail with a small size
such as less than 512MB, especially with verification, but they did
produce errors with 1G. Hence I kept it at 1G.

Now I don't remember why I didn't use a size bigger than 1G; I should
have documented that somewhere ...
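
For reference, a minimal sketch of the kind of 1G verification job
being discussed; the device name and fio parameters here are
assumptions for illustration, not taken from blktests:

  # hypothetical 1G write-and-verify run against an nvme namespace
  fio --name=verify-1g \
      --filename=/dev/nvme0n1 \
      --rw=randwrite --bs=4k --size=1G \
      --ioengine=libaio --iodepth=16 --direct=1 \
      --verify=crc32c --verify_fatal=1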

>> We cannot change the dataset size for slow VMs; instead, add a
>> command line argument and pass it to the tests, e.g.
>> nvme_verification_size=XXX, similar to nvme_trtype, but don't change
>> the default values which we have been testing for years now.
>>
>> Testing is supposed to be time-consuming, especially verification jobs ...
>
> I like the idea, but I think it may need to be the other way around:
> have the shortest possible runs by default.

see above..
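
As a rough sketch of the quoted suggestion, a test could pick up an
nvme_verification_size knob the same way nvme_trtype is picked up,
while keeping 1G as the default. The helper below is hypothetical,
not existing blktests code:

  # default to the current 1G data set unless the caller overrides it,
  # e.g. nvme_verification_size=256M ./check nvme/012
  : "${nvme_verification_size:=1G}"

  run_verification_job() {
          local dev="$1"

          # same verification workload as before, size now configurable
          fio --name=verify \
              --filename="$dev" \
              --rw=randwrite --bs=4k \
              --size="$nvme_verification_size" \
              --verify=crc32c --verify_fatal=1 \
              --ioengine=libaio --iodepth=16 --direct=1
  }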

-ck

