Message-ID: <4034AD9F-2A6A-4AE6-B5FC-58FC2BC238F5@michaelmarod.com>
Date:   Thu, 31 Mar 2022 23:22:03 +0000
From:   Michael Marod <michael@...haelmarod.com>
To:     Christoph Hellwig <hch@...radead.org>
Cc:     linux-kernel@...r.kernel.org, linux-block@...r.kernel.org
Subject: Re: NVME performance regression in Linux 5.x due to lack of block level IO queueing

Good call -- it turns out the cache issue is resolved in 5.17. I tried a number of kernels and narrowed the problem down to a window that opens somewhere after 4.9 and before 4.15, and closes somewhere between 5.13 and 5.17. Namely, 4.9 is good, 4.15 is bad, 5.13 is bad, and 5.17 is good. I did not bisect all the way down to the specific versions where the behavior changed.

Device            r/s     w/s     rkB/s     wkB/s   rrqm/s   wrqm/s  %rrqm  %wrqm r_await w_await aqu-sz rareq-sz wareq-sz  svctm  %util
nvme1n1       2758.00 2783.00  11032.00  11132.00     0.00     0.00   0.00   0.00    0.10    0.03   0.36     4.00     4.00   0.18 100.00
nvme0n1       2830.00 2875.00  11320.00  11500.00     0.00     0.00   0.00   0.00    0.10    0.03   0.39     4.00     4.00   0.18 100.00
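
If it would help, I could try narrowing the window further with a plain git bisect of mainline between the known-good and known-bad tags -- roughly the steps below, rerunning the fio job further down on each candidate kernel. This is only a sketch of the usual mainline workflow, not something I have scripted up yet:

git clone https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
cd linux
git bisect start
git bisect good v4.9      # last kernel I saw behave well on the regression side
git bisect bad v4.15      # first kernel I saw behave badly
# at each step: build and boot the candidate kernel, rerun the fio job
# below, then mark the result:
git bisect good           # or: git bisect bad
# (the same procedure with good/bad swapped between v5.13 and v5.17
# would find where it was fixed)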

Regarding the performance difference between 4.4.0 and 5.17: with a single thread, 4.4.0 still performs better than 5.17. With multiple threads, however, 5.17 is significantly better -- so much better that I almost don't believe the results (a 10x improvement!). Is it expected that a single thread would be slower on 5.17, while recent improvements make it possible to run many threads in parallel more efficiently?

# /usr/local/bin/fio -name=randrw -filename=/opt/foo -direct=1 -iodepth=1 -thread -rw=randrw -ioengine=psync -bs=4k -size=10G -numjobs=16 -group_reporting=1 -runtime=120

// Ubuntu 16.04 / Linux 4.4.0:
Run status group 0 (all jobs):
   READ: bw=54.5MiB/s (57.1MB/s), 54.5MiB/s-54.5MiB/s (57.1MB/s-57.1MB/s), io=6537MiB (6854MB), run=120002-120002msec
  WRITE: bw=54.5MiB/s (57.2MB/s), 54.5MiB/s-54.5MiB/s (57.2MB/s-57.2MB/s), io=6544MiB (6862MB), run=120002-120002msec

// Ubuntu 18.04 / Linux 5.4.0:
Run status group 0 (all jobs):
   READ: bw=23.5MiB/s (24.7MB/s), 23.5MiB/s-23.5MiB/s (24.7MB/s-24.7MB/s), io=2821MiB (2959MB), run=120002-120002msec
  WRITE: bw=23.5MiB/s (24.6MB/s), 23.5MiB/s-23.5MiB/s (24.6MB/s-24.6MB/s), io=2819MiB (2955MB), run=120002-120002msec

// Ubuntu 18.04 / Linux 5.17:
Run status group 0 (all jobs):
   READ: bw=244MiB/s (255MB/s), 244MiB/s-244MiB/s (255MB/s-255MB/s), io=28.6GiB (30.7GB), run=120001-120001msec
  WRITE: bw=244MiB/s (256MB/s), 244MiB/s-244MiB/s (256MB/s-256MB/s), io=28.6GiB (30.7GB), run=120001-120001msec
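
For anyone trying to reproduce the single-thread comparison: it is just the same job with numjobs=1. A sketch, not my exact command line:

# /usr/local/bin/fio -name=randrw -filename=/opt/foo -direct=1 -iodepth=1 -thread -rw=randrw -ioengine=psync -bs=4k -size=10G -numjobs=1 -group_reporting=1 -runtime=120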

Thanks,
Michael
