Message-ID: <20220402085005.GC32311@shbuild999.sh.intel.com>
Date: Sat, 2 Apr 2022 16:50:05 +0800
From: Feng Tang <feng.tang@...el.com>
To: Linus Torvalds <torvalds@...ux-foundation.org>
Cc: kernel test robot <oliver.sang@...el.com>,
Yang Shi <shy828301@...il.com>,
Baolin Wang <baolin.wang@...ux.alibaba.com>,
Johannes Weiner <hannes@...xchg.org>,
Oscar Salvador <osalvador@...e.de>,
Michal Hocko <mhocko@...e.com>,
Rik van Riel <riel@...riel.com>,
Mel Gorman <mgorman@...hsingularity.net>,
Peter Zijlstra <peterz@...radead.org>,
Dave Hansen <dave.hansen@...ux.intel.com>,
Zi Yan <ziy@...dia.com>, Wei Xu <weixugc@...gle.com>,
Shakeel Butt <shakeelb@...gle.com>,
zhongjiang-ali <zhongjiang-ali@...ux.alibaba.com>,
Randy Dunlap <rdunlap@...radead.org>,
Andrew Morton <akpm@...ux-foundation.org>,
LKML <linux-kernel@...r.kernel.org>, lkp@...ts.01.org,
kernel test robot <lkp@...el.com>,
"Huang, Ying" <ying.huang@...el.com>,
Zhengjun Xing <zhengjun.xing@...ux.intel.com>,
fengwei.yin@...el.com
Subject: Re: [NUMA Balancing] e39bb6be9f: will-it-scale.per_thread_ops 64.4%
improvement
Hi Linus,
On Fri, Apr 01, 2022 at 09:35:24AM -0700, Linus Torvalds wrote:
> On Fri, Apr 1, 2022 at 2:42 AM kernel test robot <oliver.sang@...el.com> wrote:
> >
> > FYI, we noticed a 64.4% improvement of will-it-scale.per_thread_ops due to commit:
> > e39bb6be9f2b ("NUMA Balancing: add page promotion counter")
>
> That looks odd and unlikely.
>
> That commit only modifies some page counting statistics. Sure, it
> could be another cache layout thing, and maybe it's due to the subtle
> change in how NUMA_PAGE_MIGRATE gets counted, but it still looks a bit
> odd.
We did a quick check of possible cache effects by disabling HW cache
prefetch completely (writing 0xf to MSR 0x1a4), and the performance
change is almost gone:
ee97347fe058d020 e39bb6be9f2b39a6dbaeff48436
---------------- ---------------------------
          134793            -1.4%     132867  will-it-scale.per_thread_ops
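For reference, the -1.4% figure is consistent with a plain relative change between the two per_thread_ops values. A minimal sketch, assuming the report uses (new - base) / base; the function name `pct_change` is made up for illustration:

```python
def pct_change(base: float, new: float) -> float:
    """Relative change of `new` versus `base`, in percent."""
    return (new - base) / base * 100.0


# Values from the comparison above (HW prefetch disabled):
# parent commit ee97347fe058 vs. e39bb6be9f2b.
BASE_OPS = 134793
NEW_OPS = 132867

delta = pct_change(BASE_OPS, NEW_OPS)
print(f"{delta:+.1f}%")
```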
The test box is a 4-node Cascade Lake machine. A 2-node machine shows a
similar trend: the commit gives a 55% improvement with HW cache prefetch
enabled, and less than 1% change with it disabled. We still cannot
pinpoint the exact place that is affected, though.
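For anyone who wants to repeat the prefetch check, here is a rough sketch of writing 0xf to MSR 0x1a4 through the Linux msr driver. The bit names follow Intel's public description of the prefetcher control bits (bits 0-3, where a set bit disables that prefetcher); that this layout applies to a given CPU model is an assumption to verify first. The helper names are made up for illustration, and this is not necessarily how the test above was run.

```python
import os
import struct

MSR_PREFETCH_CONTROL = 0x1A4  # register number quoted in the mail

# Assumed bit layout: setting a bit DISABLES that prefetcher.
PREFETCHER_DISABLE_BITS = {
    "l2_hw": 0,             # L2 hardware prefetcher
    "l2_adjacent_line": 1,  # L2 adjacent cache line prefetcher
    "dcu_streamer": 2,      # L1 data cache (DCU) prefetcher
    "dcu_ip": 3,            # L1 data cache IP prefetcher
}


def disable_mask(names):
    """Build the MSR value that disables the named prefetchers."""
    mask = 0
    for name in names:
        mask |= 1 << PREFETCHER_DISABLE_BITS[name]
    return mask


def write_msr(cpu, reg, value, dev_root="/dev/cpu"):
    """Write a 64-bit value to MSR `reg` on one CPU.

    Needs root and the msr kernel module (/dev/cpu/N/msr); the file
    offset selects the register.
    """
    fd = os.open(f"{dev_root}/{cpu}/msr", os.O_WRONLY)
    try:
        os.pwrite(fd, struct.pack("<Q", value), reg)
    finally:
        os.close(fd)


# Disabling all four prefetchers gives the 0xf used in the test.
ALL_PREFETCHERS_OFF = disable_mask(PREFETCHER_DISABLE_BITS)
```

To apply it, call `write_msr(cpu, MSR_PREFETCH_CONTROL, ALL_PREFETCHERS_OFF)` for every online CPU as root; writing 0 again restores the default (all prefetchers enabled).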
Also, in our experience, patches that change vm statistics can easily
trigger strange performance bumps in micro-benchmarks such as
will-it-scale and stress-ng.
Thanks,
Feng
> Linus