linux-kernel - Re: [btrfs] 96bed17ad9: fio.write

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <ec5733c4-9d02-5057-3040-8dcf566efd83@toxicpanda.com>
Date:   Wed, 4 Nov 2020 11:48:11 -0500
From:   Josef Bacik <josef@...icpanda.com>
To:     kernel test robot <oliver.sang@...el.com>
Cc:     David Sterba <dsterba@...e.com>,
        Nikolay Borisov <nborisov@...e.com>,
        LKML <linux-kernel@...r.kernel.org>,
        Linux Memory Management List <linux-mm@...ck.org>,
        lkp@...ts.01.org, lkp@...el.com, ying.huang@...el.com,
        feng.tang@...el.com, zhengjun.xing@...el.com,
        linux-btrfs@...r.kernel.org
Subject: Re: [btrfs] 96bed17ad9: fio.write_iops -59.7% regression

On 11/4/20 1:16 AM, kernel test robot wrote:
> Greeting,
> 
> FYI, we noticed a -59.7% regression of fio.write_iops due to commit:
> 
> 
> commit: 96bed17ad9d425ff6958a2e6f87179453a3d76f2 ("btrfs: simplify the logic in need_preemptive_flushing")
> https://git.kernel.org/cgit/linux/kernel/git/next/linux-next.git master
> 
> 
> in testcase: fio-basic
> on test machine: 192 threads Intel(R) Xeon(R) CPU @ 2.20GHz with 192G memory
> with following parameters:
> 
> 	disk: 1SSD
> 	fs: btrfs
> 	runtime: 300s
> 	nr_task: 8
> 	rw: write
> 	bs: 4k
> 	ioengine: sync
> 	test_size: 256g
> 	cpufreq_governor: performance
> 	ucode: 0x4002f01
> 
> test-description: Fio is a tool that will spawn a number of threads or processes doing a particular type of I/O action as specified by the user.
> test-url: https://github.com/axboe/fio
> 

I generally ignore these reports, but since it's FIO I figured at least the test 
itself was valid.  However once again I'm unable to reproduce the results

linus master:

task_0: (groupid=0, jobs=8): err= 0: pid=38586: Wed Nov  4 08:13:36 2020
   write: IOPS=168k, BW=655MiB/s (687MB/s)(192GiB/300001msec); 0 zone resets
     clat (usec): min=26, max=786, avg=47.15, stdev= 7.21
      lat (usec): min=26, max=786, avg=47.21, stdev= 7.21
     clat percentiles (nsec):
      |  1.00th=[31872],  5.00th=[35584], 10.00th=[37632], 20.00th=[40704],
      | 30.00th=[43264], 40.00th=[45312], 50.00th=[47360], 60.00th=[48896],
      | 70.00th=[50944], 80.00th=[52992], 90.00th=[56064], 95.00th=[59136],
      | 99.00th=[65280], 99.50th=[68096], 99.90th=[74240], 99.95th=[77312],
      | 99.99th=[88576]
    bw (  KiB/s): min=63752, max=112864, per=12.50%, avg=83810.53, 
stdev=3403.48, samples=4792
    iops        : min=15938, max=28216, avg=20952.61, stdev=850.87, samples=4792
   lat (usec)   : 50=65.73%, 100=34.27%, 250=0.01%, 500=0.01%, 750=0.01%
   lat (usec)   : 1000=0.01%
   cpu          : usr=2.22%, sys=97.77%, ctx=5054, majf=0, minf=63
   IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
      submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
      complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
      issued rwts: total=0,50298940,0,0 short=0,0,0,0 dropped=0,0,0,0
      latency   : target=0, window=0, percentile=100.00%, depth=32

Run status group 0 (all jobs):
   WRITE: bw=655MiB/s (687MB/s), 655MiB/s-655MiB/s (687MB/s-687MB/s), io=192GiB 
(206GB), run=300001-300001msec

kdave/for-next-20201104
task_0: (groupid=0, jobs=8): err= 0: pid=6652: Wed Nov  4 08:41:52 2020
   write: IOPS=180k, BW=705MiB/s (739MB/s)(207GiB/300001msec); 0 zone resets
     clat (usec): min=17, max=10603, avg=43.91, stdev= 9.62
      lat (usec): min=17, max=10603, avg=43.98, stdev= 9.62
     clat percentiles (nsec):
      |  1.00th=[25984],  5.00th=[31104], 10.00th=[33536], 20.00th=[37120],
      | 30.00th=[39168], 40.00th=[41216], 50.00th=[43264], 60.00th=[45824],
      | 70.00th=[47872], 80.00th=[50944], 90.00th=[54528], 95.00th=[57600],
      | 99.00th=[64768], 99.50th=[68096], 99.90th=[74240], 99.95th=[78336],
      | 99.99th=[90624]
    bw (  KiB/s): min=66760, max=123160, per=12.50%, avg=90221.11, 
stdev=9052.52, samples=4792
    iops        : min=16690, max=30790, avg=22555.24, stdev=2263.14, samples=4792
   lat (usec)   : 20=0.01%, 50=77.24%, 100=22.75%, 250=0.01%, 500=0.01%
   lat (usec)   : 750=0.01%, 1000=0.01%
   lat (msec)   : 2=0.01%, 4=0.01%, 20=0.01%
   cpu          : usr=1.67%, sys=98.31%, ctx=4806, majf=0, minf=68
   IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
      submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
      complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
      issued rwts: total=0,54134917,0,0 short=0,0,0,0 dropped=0,0,0,0
      latency   : target=0, window=0, percentile=100.00%, depth=32

Run status group 0 (all jobs):
   WRITE: bw=705MiB/s (739MB/s), 705MiB/s-705MiB/s (739MB/s-739MB/s), io=207GiB 
(222GB), run=300001-300001msec

So instead of -60% iops regression, I'm seeing a 7% iops improvement.  The only 
difference is that my machine doesn't have 192 threads, it has 80.  Thanks,

Josef