Message-ID: <C7D5F99D-B8DB-462B-B665-AE268CDE90D2@vmware.com>
Date:   Tue, 25 Feb 2020 05:46:41 +0000
From:   Rajender M <manir@...are.com>
To:     "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
        "netdev@...r.kernel.org" <netdev@...r.kernel.org>
CC:     Vincent Guittot <vincent.guittot@...aro.org>,
        Peter Zijlstra <peterz@...radead.org>,
        "David S. Miller" <davem@...emloft.net>,
        Steven Rostedt <rostedt@...dmis.org>
Subject: Performance impact in networking data path tests in Linux 5.5 Kernel

As part of VMware's performance regression testing for upstream Linux kernel
releases, when comparing the Linux 5.5 kernel against the Linux 5.4 kernel we
noticed a 20% improvement in networking throughput at the cost of a 30%
increase in CPU utilization.

After bisecting between 5.4 and 5.5, we identified the root cause of this
behaviour as a scheduling change from Vincent Guittot: commit
2ab4092fc82d ("sched/fair: Spread out tasks evenly when not overloaded").

The impacted test cases are TCP_STREAM SEND & RECV, on both small
(8K socket & 256B message) and large (64K socket & 16K message) packet sizes.

We backed out Vincent's commit and reran our networking tests; the
performance was similar to the 5.4 kernel, and the improvements in the
networking tests were gone.

In our current network performance testing we use an Intel 10G NIC to
evaluate all Linux kernel releases. To confirm that the impact is also seen
with a higher-bandwidth NIC, we repeated the same test cases with an Intel
40G NIC and reproduced the same behaviour: a 25% improvement in throughput
with 10% more CPU consumption.

The overall results indicate that the new scheduler change delivers much
better network throughput at the cost of incremental CPU usage. This can be
seen as expected behaviour: the TCP streams are now spread evenly across all
the CPUs, which pushes more network packets through the stack at the price
of additional CPU consumption.
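
To put the trade-off in perspective, the small calculation below uses the
numbers reported above to compare throughput gained per unit of extra CPU
for the 10G and 40G runs.

  # Throughput gain per unit of CPU cost, using the figures quoted above.
  # A ratio above 1.0 means throughput grew faster than CPU usage did.
  runs = {
      "Intel 10G": {"throughput_gain": 0.20, "cpu_increase": 0.30},
      "Intel 40G": {"throughput_gain": 0.25, "cpu_increase": 0.10},
  }
  for nic, r in runs.items():
      efficiency = (1 + r["throughput_gain"]) / (1 + r["cpu_increase"])
      print(f"{nic}: throughput x{1 + r['throughput_gain']:.2f}, "
            f"CPU x{1 + r['cpu_increase']:.2f}, "
            f"throughput per CPU x{efficiency:.2f}")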


We also confirmed this theory by parsing the ESX per-vCPU stats for the 5.4
and 5.5 kernels in a 4-vCPU VM running 8 TCP streams, as shown below:

5.4 kernel:
  "2132149": {"id": 2132149, "used": 94.37, "ready": 0.01, "cstp": 0.00, "name": "vmx-vcpu-0:rhel7x64-0",
  "2132151": {"id": 2132151, "used": 0.13, "ready": 0.00, "cstp": 0.00, "name": "vmx-vcpu-1:rhel7x64-0",
  "2132152": {"id": 2132152, "used": 9.07, "ready": 0.03, "cstp": 0.00, "name": "vmx-vcpu-2:rhel7x64-0",
  "2132153": {"id": 2132153, "used": 34.77, "ready": 0.01, "cstp": 0.00, "name": "vmx-vcpu-3:rhel7x64-0",

5.5 kernel:
  "2132041": {"id": 2132041, "used": 55.70, "ready": 0.01, "cstp": 0.00, "name": "vmx-vcpu-0:rhel7x64-0",
  "2132043": {"id": 2132043, "used": 47.53, "ready": 0.01, "cstp": 0.00, "name": "vmx-vcpu-1:rhel7x64-0",
  "2132044": {"id": 2132044, "used": 77.81, "ready": 0.00, "cstp": 0.00, "name": "vmx-vcpu-2:rhel7x64-0",
  "2132045": {"id": 2132045, "used": 57.11, "ready": 0.02, "cstp": 0.00, "name": "vmx-vcpu-3:rhel7x64-0",

Note, "used %" in above stats for 5.5 kernel is evenly distributed across all vCPUs. 

On the whole, this change should be seen as a significant improvement for 
most customers.

Rajender M
Performance Engineering
VMware, Inc.
