Message-ID: <PH0PR05MB8448FF2F53343CAA8AF2DD09AF3E9@PH0PR05MB8448.namprd05.prod.outlook.com>
Date:   Fri, 25 Feb 2022 12:48:09 +0000
From:   Manikandan Jagatheesan <mjagatheesan@...are.com>
To:     Mel Gorman <mgorman@...hsingularity.net>,
        "akpm@...ux-foundation.org" <akpm@...ux-foundation.org>,
        "linux-mm@...ck.org" <linux-mm@...ck.org>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>
CC:     "ziy@...dia.com" <ziy@...dia.com>,
        "dave.hansen@...ux.intel.com" <dave.hansen@...ux.intel.com>,
        "mhocko@...nel.org" <mhocko@...nel.org>,
        "brouer@...hat.com" <brouer@...hat.com>,
        Rajender M <manir@...are.com>,
        Abdul Anshad Azeez <aazees@...are.com>,
        Yiu Cho Lau <lauyiuch@...are.com>
Subject: [Linux Kernel 5.14 GA] ESXi Performance degradation

As part of VMware's performance regression testing for Linux
kernel upstream releases, we evaluated the performance of Linux
kernel 5.14 against the 5.13 release and would like to share the
observations below. We noticed performance degradation of up to
25% in ESXi networking workloads and up to 5% in ESXi storage
workloads. On the networking side, the degradation shows up in
the Netperf "TCP_STREAM_RECV large packets" throughput tests
(up to 25%). On the storage side, it is visible only in the CPU
cost metric (up to 5%).

After bisecting between kernel 5.14 and 5.13, we identified the
root cause as a memory-allocation change, Mel's commit
44042b4498728f4376e84bae1ac8016d146d850b ("mm/page_alloc: allow
high-order pages to be stored on the per-cpu lists").

To confirm this, we backed out the above-mentioned commit from
5.14, re-ran our tests, and found that performance was on par
with the 5.13 kernel.
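
For reference, the back-out can be reproduced along these lines
(illustrative commands; our exact build and boot steps are
omitted):

  $ git checkout v5.14
  $ git revert 44042b4498728f4376e84bae1ac8016d146d850b
  # rebuild, boot the patched kernel, and re-run the netperf and
  # storage workloads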

Immediately preceding commit: 43b02ba93b25b1caff7a3457fc5d005485e78da5
Mel's commit: 44042b4498728f4376e84bae1ac8016d146d850b
Mel's commit git URL:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=44042b4498728f4376e84bae1ac8016d146d850b

To analyse the performance degradation further, we collected perf
stats for Mel's commit and for the immediately preceding commit
while running the Netperf benchmark, and observed a much higher
cache-miss rate with Mel's commit. The perf stats from the
netperf TCP_STREAM runs are below.
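
The counters were collected system-wide with perf; an invocation
along the following lines reproduces this kind of measurement
(the 60-second window and the netperf options are illustrative,
not our exact test parameters):

  # on the system under test, while the stream is running:
  $ perf stat -a -e cache-references,cache-misses -- sleep 60

  # driving the stream from the peer (illustrative options):
  $ netperf -H <receiver-ip> -t TCP_STREAM -l 60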

Performance counter stats for 'system wide':
Immediately preceding commit:
cache-references - 5,343,078,363
cache-misses - 26,632,656 (0.498 % of all cache refs)

Mel's commit:
cache-references - 4,930,300,091
cache-misses - 319,495,743 (6.480 % of all cache refs)

We synced up with Mel offline and performed the different
experiments he requested. He identified the root cause of the
degradation and provided us with a patch to validate. We
validated his patch and confirmed that it fixes our degradation;
the numbers are on par with kernel 5.13.

Performance data: 
TCP_STREAM_RECV Throughput: 
Immediately preceding commit: 16.394 Gbps
Mel's commit: 15.465 Gbps 
Mel's patch: 16.461 Gbps

Patch URL:
https://lore.kernel.org/all/20220217002227.5739-1-mgorman@...hsingularity.net/
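
For validating the fix, the raw message can be fetched from lore
and applied on top of v5.14; something like the following should
work (the /raw suffix and the local file name are illustrative):

  $ curl -sL '<patch-URL-above>raw' -o fix.mbox
  $ git am fix.mbox   # apply on top of a v5.14 checkout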

Since we received the fix from Mel offline, we wanted to document
it in this community for reference.

Given the performance degradation caused by this commit
(44042b4498728f4376e84bae1ac8016d146d850b), could you please
backport the patch/fix to the kernel 5.14 release?

Manikandan Jagatheesan
Performance Engineering
VMware, Inc.
