Open Source and information security mailing list archives
 
Date:   Tue, 22 Sep 2020 08:51:05 +0000
From:   Abdul Anshad Azeez <aazees@...are.com>
To:     "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
        "x86@...nel.org" <x86@...nel.org>,
        "netdev@...r.kernel.org" <netdev@...r.kernel.org>,
        "linux-fsdevel@...r.kernel.org" <linux-fsdevel@...r.kernel.org>
CC:     "rostedt@...dmis.org" <rostedt@...dmis.org>,
        "tglx@...utronix.de" <tglx@...utronix.de>
Subject: Performance regressions in networking & storage benchmarks in Linux
 kernel 5.8

As part of VMware's performance regression testing of upstream Linux kernel
releases, we compared Linux kernel 5.8 against 5.7. Our evaluation revealed
performance regressions of up to 60%, mostly in networking latency/response-time
benchmarks. Storage throughput & latency benchmarks also regressed by up to 8%.

After bisecting between kernel 5.7 and 5.8, we identified the root cause to be
an interrupt-related change in Thomas Gleixner's commit
633260fa143bbed05e65dc557a492667dfdc45bb ("x86/irq: Convey vector as argument
and not in ptregs"). To confirm this, we backed the commit out of 5.8 and reran
our tests, and found that performance was similar to the 5.7 kernel.
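
For readers who want to automate a similar bisect, here is a minimal sketch of
a pass/fail predicate of the kind that "git bisect run" expects (exit code 0
for good, 1 for bad, 125 for "cannot test this revision"). It is not the
harness used for this report. The baseline, the tolerance, the sample values
and the function names are all hypothetical and chosen only for illustration.

```python
import statistics

# Hypothetical numbers, not taken from the report above.
BASELINE_MEAN_US = 100.0  # mean response time on the known-good kernel
TOLERANCE = 0.10          # treat anything more than 10% slower as a regression


def regression_pct(baseline, candidate):
    """Percent change of candidate relative to baseline (positive = slower)."""
    return (candidate - baseline) / baseline * 100.0


def classify(samples_us, baseline_us=BASELINE_MEAN_US, tolerance=TOLERANCE):
    """Map response-time samples to a git-bisect-run style exit code.

    0   -> good (within tolerance of the baseline)
    1   -> bad (mean response time regressed beyond the tolerance)
    125 -> skip (no usable samples, e.g. the kernel failed to boot)
    """
    if not samples_us:
        return 125
    mean_us = statistics.mean(samples_us)
    if mean_us > baseline_us * (1.0 + tolerance):
        return 1
    return 0
```

A wrapper script could build and boot the candidate kernel, collect
response-time samples from a benchmark such as netperf TCP_RR, and pass the
result of classify() to sys.exit() so that "git bisect run" can mark each
revision automatically.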

Impacted test cases:

Networking:
    - Netperf TCP_RR & TCP_CRR - Response time
    - Ping - Response time
    - Memcache - Response time
    - Netperf TCP_STREAM, small packets (8K socket & 256B message,
      TCP_NODELAY set) - Throughput & CPU utilization (CPU/Gbits)

Storage:
    - FIO:
        - 4K (rand|seq)_(read|write) local-NVMe MultiVM tests - Throughput &
          latency

From our testing, the overall results indicate that the above-mentioned commit
introduced performance regressions in latency-sensitive networking workloads.
For storage, it affected both throughput and latency workloads.

Also, since the Linux 5.9-rc4 kernel was released recently, we repeated the
same experiments on 5.9-rc4. We observed that all regressions were fixed and
that the performance numbers for 5.7 and 5.9-rc4 were similar.

To find the fix, we bisected again between 5.8 and 5.9-rc4 and identified that
the regressions were fixed by a commit from the same author, Thomas Gleixner,
which unbreaks the interrupt affinity setting:
e027fffff799cdd70400c5485b1a54f482255985 ("x86/irq: Unbreak interrupt affinity
setting").

We believe these findings will be useful to the Linux community and wanted to
document them.

Abdul Anshad Azeez
Performance Engineering
VMware, Inc.
