linux-kernel - Re: [RFC v1 0/1] nvme testsuite runtime optimization


Message-ID: <zlavgcdalmmtabiabu76m4s3oo5hyaehckmwcxvqrnu3j6q6xo@5ke6gv5h3j7i>
Date:   Wed, 19 Apr 2023 13:10:52 +0200
From:   Daniel Wagner <dwagner@...e.de>
To:     Sagi Grimberg <sagi@...mberg.me>
Cc:     Chaitanya Kulkarni <chaitanyak@...dia.com>,
        "linux-nvme@...ts.infradead.org" <linux-nvme@...ts.infradead.org>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
        "linux-block@...r.kernel.org" <linux-block@...r.kernel.org>,
        Shin'ichiro Kawasaki <shinichiro@...tmail.com>
Subject: Re: [RFC v1 0/1] nvme testsuite runtime optimization

On Wed, Apr 19, 2023 at 12:50:10PM +0300, Sagi Grimberg wrote:
> 
> > > While testing the fc transport I got a bit tired of waiting for the I/O jobs to
> > > finish. Thus here are some runtime optimizations.
> > > 
> > > With a small/slow VM I got following values:
> > > 
> > > with 'optimizations'
> > >     loop:
> > >       real    4m43.981s
> > >       user    0m17.754s
> > >       sys     2m6.249s
> 
> How come loop is doubling the time with this patch?
> ratio is not the same before and after.

The first run was with loop, the second one with rdma:

nvme/002 (create many subsystems and test discovery)         [not run]
    runtime  82.089s  ...
    nvme_trtype=rdma is not supported in this test

nvme/016 (create/delete many NVMeOF block device-backed ns and test discovery) [not run]
    runtime  39.948s  ...
    nvme_trtype=rdma is not supported in this test
nvme/017 (create/delete many file-ns and test discovery)     [not run]
    runtime  40.237s  ...

nvme/047 (test different queue types for fabric transports)  [passed]
    runtime    ...  13.580s
nvme/048 (Test queue count changes on reconnect)             [passed]
    runtime    ...  6.287s

82 + 40 + 40 - 14 - 6 = 142. So loop runs additional tests. Hmm, though my
optimization didn't work there...
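
As a quick check of the sum above, here is a minimal sketch that uses the rounded runtimes from the output. The variable names are only for illustration:

```shell
# Rounded runtimes in seconds, taken from the test output above.
not_run_with_rdma=$(( 82 + 40 + 40 ))   # nvme/002, nvme/016, nvme/017
counted_for_rdma=$(( 14 + 6 ))          # nvme/047, nvme/048
delta=$(( not_run_with_rdma - counted_for_rdma ))
echo "loop spends roughly ${delta}s more on tests"
```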

> > Those jobs are meant to be run for at least 1G to establish
> > confidence on the data set and the system under test since SSDs
> > are in TBs nowadays and we don't even get anywhere close to that,
> > with your suggestion we are going even lower ...
> 
> Where does the 1G boundary come from?

No idea, it's just the existing hard-coded values. I guess it might be from
efa06fcf3c83 ("loop: test partition scanning") which was the first real test
case (according to the logs).

> > we cannot change the dataset size for slow VMs, instead add
> > a command line argument and pass it to tests e.g.
> > nvme_verification_size=XXX similar to nvme_trtype but don't change
> > the default values which we have been testing for years now
> > 
> > Testing is supposed to be time consuming especially verification jobs..
> 
> I like the idea, but I think it may need to be the other way around.
> Have shortest possible runs by default.

Good point, I'll make it configurable. What is a good small default then? There
are some test cases in loop which allocate a 1M file. That's probably too
small.
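
To illustrate what "configurable" could look like, here is a rough sketch and not the actual patch. It assumes a variable named nvme_verification_size, as suggested above, that can be set in the environment the same way as nvme_trtype. The 32M fallback is only a placeholder since the right default is still open, and size_to_bytes is a hypothetical helper:

```shell
# Assumption: nvme_verification_size is overridable from the environment;
# 32M is a placeholder default, not a recommendation.
: "${nvme_verification_size:=32M}"

# Hypothetical helper: convert sizes such as 1M or 1G into bytes.
size_to_bytes() {
    local size=$1
    case "$size" in
        *K) echo $(( ${size%K} * 1024 )) ;;
        *M) echo $(( ${size%M} * 1024 * 1024 )) ;;
        *G) echo $(( ${size%G} * 1024 * 1024 * 1024 )) ;;
        *)  echo "$size" ;;   # no suffix: treat the value as bytes
    esac
}

echo "verification size: ${nvme_verification_size}" \
     "($(size_to_bytes "$nvme_verification_size") bytes)"
```

Someone who wants the current behaviour would then set nvme_verification_size=1G before running the tests.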