Message-ID: <PH0PR05MB8448FF2F53343CAA8AF2DD09AF3E9@PH0PR05MB8448.namprd05.prod.outlook.com>
Date: Fri, 25 Feb 2022 12:48:09 +0000
From: Manikandan Jagatheesan <mjagatheesan@...are.com>
To: Mel Gorman <mgorman@...hsingularity.net>,
"akpm@...ux-foundation.org" <akpm@...ux-foundation.org>,
"linux-mm@...ck.org" <linux-mm@...ck.org>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>
CC: "ziy@...dia.com" <ziy@...dia.com>,
"dave.hansen@...ux.intel.com" <dave.hansen@...ux.intel.com>,
"mhocko@...nel.org" <mhocko@...nel.org>,
"brouer@...hat.com" <brouer@...hat.com>,
Rajender M <manir@...are.com>,
Abdul Anshad Azeez <aazees@...are.com>,
Yiu Cho Lau <lauyiuch@...are.com>
Subject: [Linux Kernel 5.14 GA] ESXi Performance degradation
As part of VMware's performance regression testing for upstream
Linux kernel releases, we evaluated the performance of Linux
kernel 5.14 against the 5.13 release and would like to share the
observations below. We noticed performance degradation of up to
25% in ESXi networking workloads and up to 5% in ESXi storage
workloads. On the networking side, the degradation of up to 25%
showed up in the Netperf “TCP_STREAM_RECV large packets”
throughput tests. On the storage side, the degradation was
limited to the CPU cost metric, at up to 5%.
After bisecting between kernels 5.13 and 5.14, we identified the
root cause to be a change in memory allocation behavior introduced
by Mel's commit 44042b4498728f4376e84bae1ac8016d146d850b
("mm/page_alloc: allow high-order pages to be stored on the
per-cpu lists").
To confirm this, we backed out the above-mentioned commit from
5.14, re-ran our tests, and found that performance was on par
with the 5.13 kernel.
Immediate before commit: 43b02ba93b25b1caff7a3457fc5d005485e78da5
Mel's commit: 44042b4498728f4376e84bae1ac8016d146d850b
Mel's commit git URL:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=44042b4498728f4376e84bae1ac8016d146d850b
To analyse the performance degradation further, we collected
perf stats for Mel's commit and for the immediate before commit
while running the Netperf benchmark, and observed far more
cache misses with Mel's commit than with the immediate before
commit. The perf stats data from the netperf TCP_STREAM tests
follows.
Performance counter stats for 'system wide':
Immediate before commit:
cache-references - 5,343,078,363
cache-misses - 26,632,656 (0.498 % of all cache refs)
Mel's commit:
cache-references - 4,930,300,091
cache-misses - 319,495,743 (6.480 % of all cache refs)
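For reference, the miss ratios quoted above can be recomputed from
the raw counters. The following is a minimal Python sketch, not part
of our test tooling; the helper name miss_ratio_percent is purely
illustrative.

```python
def miss_ratio_percent(cache_misses: int, cache_references: int) -> float:
    """Return cache misses as a percentage of all cache references."""
    if cache_references <= 0:
        raise ValueError("cache_references must be positive")
    return 100.0 * cache_misses / cache_references


# Counter values copied from the perf stats listed above.
before = miss_ratio_percent(26_632_656, 5_343_078_363)
after = miss_ratio_percent(319_495_743, 4_930_300_091)

print(f"Immediate before commit: {before:.3f} % of all cache refs")
print(f"Mel's commit:            {after:.3f} % of all cache refs")
print(f"Miss ratio grew by a factor of about {after / before:.1f}")
```

With these counters, the miss ratio rises from about 0.498 % to
about 6.480 %, roughly a 13-fold increase.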
We synced up with Mel offline and ran the experiments he
requested. He identified the root cause of the degradation and
provided a patch for us to validate. We validated his patch and
confirmed that it fixes the degradation, bringing the performance
numbers back on par with kernel 5.13.
Performance data:
TCP_STREAM_RECV Throughput:
Immediate before commit: 16.394 Gbps
Mel's commit: 15.465 Gbps
Mel's patch: 16.461 Gbps
Patch URL: https://lore.kernel.org/all/20220217002227.5739-1-mgorman@...hsingularity.net/
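The relative change for the TCP_STREAM_RECV throughput figures above
can be worked out as follows. This is a small Python sketch for
illustration only; percent_change is a hypothetical helper, and the
baseline is taken to be the immediate before commit.

```python
def percent_change(baseline: float, measured: float) -> float:
    """Return the change of `measured` relative to `baseline`, in percent."""
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    return 100.0 * (measured - baseline) / baseline


# Throughput in Gbps, copied from the performance data above.
baseline_gbps = 16.394  # immediate before commit
results_gbps = {"Mel's commit": 15.465, "Mel's patch": 16.461}

changes = {
    label: percent_change(baseline_gbps, gbps)
    for label, gbps in results_gbps.items()
}
for label, change in changes.items():
    print(f"{label}: {change:+.2f} % vs. immediate before commit")
```

For this particular data point, Mel's commit is about 5.67 % below
the baseline and Mel's patch is about 0.41 % above it.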
Since we received the fix for the reported degradation from Mel
offline, we wanted to document it on this list for reference.
Since we observe performance degradation due to this commit
(44042b4498728f4376e84bae1ac8016d146d850b), could you please
backport this fix to the 5.14 kernel release?
Manikandan Jagatheesan
Performance Engineering
VMware, Inc.