lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [thread-next>] [day] [month] [year] [list]
Message-ID: <51E3A396-F68B-496D-AE36-B0457A3B0968@michaelmarod.com>
Date:   Wed, 30 Mar 2022 14:32:57 +0000
From:   Michael Marod <michael@...haelmarod.com>
To:     linux-kernel@...r.kernel.org
Subject: NVME performance regression in Linux 5.x due to lack of block level IO queueing

It has been noticed in an AWS Cloud environment that the IO performance on an Ubuntu Bionic 18.04 i3 instance with Linux 5.4.0 is significantly worse than that of an Ubuntu Xenial 16.04 i3 instance with Linux 4.4.0. I've been digging into the issue and it seems like block level queueing is not working as well (if at all?) on the Linux 5 instance. When I've run performance tests using fio, iostat reports 0.00 average queue size for the Linux 5.4.0 instance but goes up to 0.5 or higher on the 4.4.0 one.

In my setup, I have 2 NVMe devices (/dev/nvme0n1 and /dev/nvme1n1) configured in a raid0 on /dev/md127 which is mounted on /opt. I've tested devices directly and through the RAID controller and it does not seem to make a difference. In the output below I am writing through /opt which exercises the file system and RAID controller. Also, fio was compiled from source with latest version on both instances.

# fio -name=randrw -filename=/opt/testfile -direct=1 -thread -rw=randrw -ioengine=psync -bs=4k -size=10G -numjobs=1 -group_reporting=1 -runtime=120

Note: these are different versions of iostat which is why the output is different

# Linux 4.4.0-1137-aws:
Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
nvme1n1           0.00     0.00 3248.00 3173.00 12992.00 12692.00     8.00     0.44    0.07    0.10    0.03   0.07  44.00
nvme0n1           0.00     0.00 3229.00 3292.00 12916.00 13168.00     8.00     0.47    0.07    0.10    0.04   0.07  46.80

# Linux 5.4.0-1068-aws:
Device            r/s     w/s     rkB/s     wkB/s   rrqm/s   wrqm/s  %rrqm  %wrqm r_await w_await aqu-sz rareq-sz wareq-sz  svctm  %util
nvme0n1       2842.57 2789.11  11370.30  11156.44     0.00     0.00   0.00   0.00    0.10    0.04   0.00     4.00     4.00   0.18  99.01
nvme1n1       2849.50 2748.51  11398.02  10994.06     0.00     0.00   0.00   0.00    0.10    0.04   0.00     4.00     4.00   0.18  99.01

For another (possibly simpler) demonstration -- I ran blktrace/blkparse on both Linux 4.4.0 and 5.5.0 instances with the same hardware (i3.4xlarge). Then I used fio to do a simple stress test that sends 5 write requests.

# fio -name=write -filename=/opt/testfile -direct=1 -thread -rw=write, -ioengine=psync -bs=4k -size=4k -numjobs=5 -group_reporting=1 -runtime=5

The blkparse output shows that the Linux 4.4.0 instance had 5 "Writes Queued" while the Linux 5.4.0 one had 0. Additionally, there were 0 "IO unplugs" and 5 "Timer unplugs" on the Linux 4.4.0 instance and 5 "Timer unplugs" and 0 "IO unplugs" on the Linux 5.4.0 one.

// On the Bionic machine with Linux 5.4.0-1068-aws #72~18.04.1-Ubuntu SMP Thu Mar 3 08:49:49 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux:
# blktrace -d /dev/nvme0n1 -d /dev/nvme1n1 -o - | blkparse -i -
259,0   12        1     0.000000000 114067  A  WS 401408 + 8 <- (9,127) 274432
259,0   12        2     0.000003433 114067  G  WS 401408 + 8 [fio]
259,0   12        3     0.000004078 114067  P   N [fio]
259,0   12        4     0.000004548 114067  U   N [fio] 1
259,0   12        5     0.000005654 114067  D  WS 401408 + 8 [fio]
259,0    4        1     0.000045379     0  C  WS 401408 + 8 [0]
259,0   14        1     0.000210927     0  C  WS 401408 + 8 [0]
259,0   14        2     0.000254591 114065  A  WS 401408 + 8 <- (9,127) 274432
259,0   14        3     0.000257373 114065  G  WS 401408 + 8 [fio]
259,0   14        4     0.000257853 114065  P   N [fio]
259,0   14        5     0.000258340 114065  U   N [fio] 1
259,0   14        6     0.000259293 114065  D  WS 401408 + 8 [fio]
259,0    8        2     0.000290583     0  C  WS 401408 + 8 [0]
259,0   13        1     0.000080964 114068  A  WS 401408 + 8 <- (9,127) 274432
259,0   13        2     0.000083702 114068  G  WS 401408 + 8 [fio]
259,0   13        3     0.000084247 114068  P   N [fio]
259,0   13        4     0.000084654 114068  U   N [fio] 1
259,0   13        5     0.000085801 114068  D  WS 401408 + 8 [fio]
259,0    3        1 1266874889.709450585 114066  A  WS 401408 + 8 <- (9,127) 274432
259,0    3        2 1266874889.709455280 114066  G  WS 401408 + 8 [fio]
259,0    3        3 1266874889.709456267 114066  P   N [fio]
259,0    3        4 1266874889.709456843 114066  U   N [fio] 1
259,0    3        5 1266874889.709458888 114066  D  WS 401408 + 8 [fio]
259,0   15        1     0.000164738 114069  A  WS 401408 + 8 <- (9,127) 274432
259,0   15        2     0.000168016 114069  G  WS 401408 + 8 [fio]
259,0   15        3     0.000168571 114069  P   N [fio]
259,0   15        4     0.000169037 114069  U   N [fio] 1
259,0   15        5     0.000169993 114069  D  WS 401408 + 8 [fio]
259,0   10        1     0.000128797     0  C  WS 401408 + 8 [0]
^C259,0    8        1 1266874889.709500996     0  C  WS 401408 + 8 [0]
CPU3 (259,0):
 Reads Queued:           0,        0KiB	 Writes Queued:           0,        0KiB
 Read Dispatches:        0,        0KiB	 Write Dispatches:        1,        4KiB
 Reads Requeued:         0		 Writes Requeued:         0
 Reads Completed:        0,        0KiB	 Writes Completed:        0,        0KiB
 Read Merges:            0,        0KiB	 Write Merges:            0,        0KiB
 Read depth:             0        	 Write depth:             3
 IO unplugs:             1        	 Timer unplugs:           0
CPU4 (259,0):
 Reads Queued:           0,        0KiB	 Writes Queued:           0,        0KiB
 Read Dispatches:        0,        0KiB	 Write Dispatches:        0,        0KiB
 Reads Requeued:         0		 Writes Requeued:         0
 Reads Completed:        0,        0KiB	 Writes Completed:        1,        4KiB
 Read Merges:            0,        0KiB	 Write Merges:            0,        0KiB
 Read depth:             0        	 Write depth:             3
 IO unplugs:             0        	 Timer unplugs:           0
CPU8 (259,0):
 Reads Queued:           0,        0KiB	 Writes Queued:           0,        0KiB
 Read Dispatches:        0,        0KiB	 Write Dispatches:        0,        0KiB
 Reads Requeued:         0		 Writes Requeued:         0
 Reads Completed:        0,        0KiB	 Writes Completed:        2,        8KiB
 Read Merges:            0,        0KiB	 Write Merges:            0,        0KiB
 Read depth:             0        	 Write depth:             3
 IO unplugs:             0        	 Timer unplugs:           0
CPU10 (259,0):
 Reads Queued:           0,        0KiB	 Writes Queued:           0,        0KiB
 Read Dispatches:        0,        0KiB	 Write Dispatches:        0,        0KiB
 Reads Requeued:         0		 Writes Requeued:         0
 Reads Completed:        0,        0KiB	 Writes Completed:        1,        4KiB
 Read Merges:            0,        0KiB	 Write Merges:            0,        0KiB
 Read depth:             0        	 Write depth:             3
 IO unplugs:             0        	 Timer unplugs:           0
CPU12 (259,0):
 Reads Queued:           0,        0KiB	 Writes Queued:           0,        0KiB
 Read Dispatches:        0,        0KiB	 Write Dispatches:        1,        4KiB
 Reads Requeued:         0		 Writes Requeued:         0
 Reads Completed:        0,        0KiB	 Writes Completed:        0,        0KiB
 Read Merges:            0,        0KiB	 Write Merges:            0,        0KiB
 Read depth:             0        	 Write depth:             3
 IO unplugs:             1        	 Timer unplugs:           0
CPU13 (259,0):
 Reads Queued:           0,        0KiB	 Writes Queued:           0,        0KiB
 Read Dispatches:        0,        0KiB	 Write Dispatches:        1,        4KiB
 Reads Requeued:         0		 Writes Requeued:         0
 Reads Completed:        0,        0KiB	 Writes Completed:        0,        0KiB
 Read Merges:            0,        0KiB	 Write Merges:            0,        0KiB
 Read depth:             0        	 Write depth:             3
 IO unplugs:             1        	 Timer unplugs:           0
CPU14 (259,0):
 Reads Queued:           0,        0KiB	 Writes Queued:           0,        0KiB
 Read Dispatches:        0,        0KiB	 Write Dispatches:        1,        4KiB
 Reads Requeued:         0		 Writes Requeued:         0
 Reads Completed:        0,        0KiB	 Writes Completed:        1,        4KiB
 Read Merges:            0,        0KiB	 Write Merges:            0,        0KiB
 Read depth:             0        	 Write depth:             3
 IO unplugs:             1        	 Timer unplugs:           0
CPU15 (259,0):
 Reads Queued:           0,        0KiB	 Writes Queued:           0,        0KiB
 Read Dispatches:        0,        0KiB	 Write Dispatches:        1,        4KiB
 Reads Requeued:         0		 Writes Requeued:         0
 Reads Completed:        0,        0KiB	 Writes Completed:        0,        0KiB
 Read Merges:            0,        0KiB	 Write Merges:            0,        0KiB
 Read depth:             0        	 Write depth:             3
 IO unplugs:             1        	 Timer unplugs:           0

Total (259,0):
 Reads Queued:           0,        0KiB	 Writes Queued:           0,        0KiB
 Read Dispatches:        0,        0KiB	 Write Dispatches:        5,       20KiB
 Reads Requeued:         0		 Writes Requeued:         0
 Reads Completed:        0,        0KiB	 Writes Completed:        5,       20KiB
 Read Merges:            0,        0KiB	 Write Merges:            0,        0KiB
 IO unplugs:             5        	 Timer unplugs:           0

Throughput (R/W): 0KiB/s / 0KiB/s
Events (259,0): 30 entries
Skips: 0 forward (0 -   0.0%)

// On the Xenial machine with 4.4.0-1137-aws #151-Ubuntu SMP Tue Mar 1 09:47:43 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux
# blktrace -d /dev/nvme0n1 -d /dev/nvme1n1 -o - | blkparse -i -
259,1    1        1     0.000000000 82519  Q  WS 399368 + 8 [fio]
259,1    1        2     0.000000980 82519  G  WS 399368 + 8 [fio]
259,1    1        3     0.000003468 82519 UT   N [fio] 1
259,1    1        4     0.000003867 82519  I  WS 399368 + 8 [fio]
259,1    1        5     0.000011785  1393  D  WS 399368 + 8 [kworker/1:1H]
259,1    2        1 1266874889.709465021 82517  Q  WS 399368 + 8 [fio]
259,1    2        2 1266874889.709467133 82517  G  WS 399368 + 8 [fio]
259,1    2        3 1266874889.709471568 82517 UT   N [fio] 1
259,1    2        4 1266874889.709471568 82517  I  WS 399368 + 8 [fio]
259,1    2        5 1266874889.709476205   980  D  WS 399368 + 8 [kworker/2:1H]
259,1   15        1     0.000051309     0  C  WS 399368 + 8 [0]
259,1    3        1     0.000079179 82520  Q  WS 399368 + 8 [fio]
259,1    3        2     0.000080079 82520  G  WS 399368 + 8 [fio]
259,1    3        3     0.000082140 82520 UT   N [fio] 1
259,1    3        4     0.000082391 82520  I  WS 399368 + 8 [fio]
259,1    3        5     0.000087819   975  D  WS 399368 + 8 [kworker/3:1H]
259,1   12        1     0.000125455     0  C  WS 399368 + 8 [0]
259,1    5        1     0.000152778 82521  Q  WS 399368 + 8 [fio]
259,1    5        2     0.000153811 82521  G  WS 399368 + 8 [fio]
259,1    5        3     0.000155811 82521 UT   N [fio] 1
259,1    5        4     0.000156314 82521  I  WS 399368 + 8 [fio]
259,1    5        5     0.000168577  1552  D  WS 399368 + 8 [kworker/5:1H]
259,1    7        1     0.000206251     0  C  WS 399368 + 8 [0]
259,1    9        1     0.000234725 82518  Q  WS 399368 + 8 [fio]
259,1    9        2     0.000235496 82518  G  WS 399368 + 8 [fio]
259,1    9        3     0.000237147 82518 UT   N [fio] 1
259,1    9        4     0.000237382 82518  I  WS 399368 + 8 [fio]
259,1    9        5     0.000242425  2001  D  WS 399368 + 8 [kworker/9:1H]
259,1   15        2     0.000290161     0  C  WS 399368 + 8 [0]
259,1   13        1 1266874889.709520875     0  C  WS 399368 + 8 [0]
^CCPU1 (259,1):
 Reads Queued:           0,        0KiB	 Writes Queued:           1,        4KiB
 Read Dispatches:        0,        0KiB	 Write Dispatches:        1,        4KiB
 Reads Requeued:         0		 Writes Requeued:         0
 Reads Completed:        0,        0KiB	 Writes Completed:        0,        0KiB
 Read Merges:            0,        0KiB	 Write Merges:            0,        0KiB
 Read depth:             0        	 Write depth:             2
 IO unplugs:             0        	 Timer unplugs:           1
CPU2 (259,1):
 Reads Queued:           0,        0KiB	 Writes Queued:           1,        4KiB
 Read Dispatches:        0,        0KiB	 Write Dispatches:        1,        4KiB
 Reads Requeued:         0		 Writes Requeued:         0
 Reads Completed:        0,        0KiB	 Writes Completed:        0,        0KiB
 Read Merges:            0,        0KiB	 Write Merges:            0,        0KiB
 Read depth:             0        	 Write depth:             2
 IO unplugs:             0        	 Timer unplugs:           1
CPU3 (259,1):
 Reads Queued:           0,        0KiB	 Writes Queued:           1,        4KiB
 Read Dispatches:        0,        0KiB	 Write Dispatches:        1,        4KiB
 Reads Requeued:         0		 Writes Requeued:         0
 Reads Completed:        0,        0KiB	 Writes Completed:        0,        0KiB
 Read Merges:            0,        0KiB	 Write Merges:            0,        0KiB
 Read depth:             0        	 Write depth:             2
 IO unplugs:             0        	 Timer unplugs:           1
CPU5 (259,1):
 Reads Queued:           0,        0KiB	 Writes Queued:           1,        4KiB
 Read Dispatches:        0,        0KiB	 Write Dispatches:        1,        4KiB
 Reads Requeued:         0		 Writes Requeued:         0
 Reads Completed:        0,        0KiB	 Writes Completed:        0,        0KiB
 Read Merges:            0,        0KiB	 Write Merges:            0,        0KiB
 Read depth:             0        	 Write depth:             2
 IO unplugs:             0        	 Timer unplugs:           1
CPU7 (259,1):
 Reads Queued:           0,        0KiB	 Writes Queued:           0,        0KiB
 Read Dispatches:        0,        0KiB	 Write Dispatches:        0,        0KiB
 Reads Requeued:         0		 Writes Requeued:         0
 Reads Completed:        0,        0KiB	 Writes Completed:        1,        4KiB
 Read Merges:            0,        0KiB	 Write Merges:            0,        0KiB
 Read depth:             0        	 Write depth:             2
 IO unplugs:             0        	 Timer unplugs:           0
CPU9 (259,1):
 Reads Queued:           0,        0KiB	 Writes Queued:           1,        4KiB
 Read Dispatches:        0,        0KiB	 Write Dispatches:        1,        4KiB
 Reads Requeued:         0		 Writes Requeued:         0
 Reads Completed:        0,        0KiB	 Writes Completed:        0,        0KiB
 Read Merges:            0,        0KiB	 Write Merges:            0,        0KiB
 Read depth:             0        	 Write depth:             2
 IO unplugs:             0        	 Timer unplugs:           1
CPU12 (259,1):
 Reads Queued:           0,        0KiB	 Writes Queued:           0,        0KiB
 Read Dispatches:        0,        0KiB	 Write Dispatches:        0,        0KiB
 Reads Requeued:         0		 Writes Requeued:         0
 Reads Completed:        0,        0KiB	 Writes Completed:        1,        4KiB
 Read Merges:            0,        0KiB	 Write Merges:            0,        0KiB
 Read depth:             0        	 Write depth:             2
 IO unplugs:             0        	 Timer unplugs:           0
CPU13 (259,1):
 Reads Queued:           0,        0KiB	 Writes Queued:           0,        0KiB
 Read Dispatches:        0,        0KiB	 Write Dispatches:        0,        0KiB
 Reads Requeued:         0		 Writes Requeued:         0
 Reads Completed:        0,        0KiB	 Writes Completed:        1,        4KiB
 Read Merges:            0,        0KiB	 Write Merges:            0,        0KiB
 Read depth:             0        	 Write depth:             2
 IO unplugs:             0        	 Timer unplugs:           0
CPU15 (259,1):
 Reads Queued:           0,        0KiB	 Writes Queued:           0,        0KiB
 Read Dispatches:        0,        0KiB	 Write Dispatches:        0,        0KiB
 Reads Requeued:         0		 Writes Requeued:         0
 Reads Completed:        0,        0KiB	 Writes Completed:        2,        8KiB
 Read Merges:            0,        0KiB	 Write Merges:            0,        0KiB
 Read depth:             0        	 Write depth:             2
 IO unplugs:             0        	 Timer unplugs:           0

Total (259,1):
 Reads Queued:           0,        0KiB	 Writes Queued:           5,       20KiB
 Read Dispatches:        0,        0KiB	 Write Dispatches:        5,       20KiB
 Reads Requeued:         0		 Writes Requeued:         0
 Reads Completed:        0,        0KiB	 Writes Completed:        5,       20KiB
 Read Merges:            0,        0KiB	 Write Merges:            0,        0KiB
 IO unplugs:             0        	 Timer unplugs:           5

Throughput (R/W): 0KiB/s / 0KiB/s
Events (259,1): 30 entries
Skips: 0 forward (0 -   0.0%)

Please CC me directly as I am not subscribed to LKML.

Thanks,

-- Michael Marod (michael@...haelmarod.com)

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ