Message-ID: <20220406014147.GA64277@shbuild999.sh.intel.com>
Date: Wed, 6 Apr 2022 09:41:47 +0800
From: Feng Tang <feng.tang@...el.com>
To: Linus Torvalds <torvalds@...ux-foundation.org>
Cc: kernel test robot <oliver.sang@...el.com>,
Yang Shi <shy828301@...il.com>,
Baolin Wang <baolin.wang@...ux.alibaba.com>,
Johannes Weiner <hannes@...xchg.org>,
Oscar Salvador <osalvador@...e.de>,
Michal Hocko <mhocko@...e.com>,
Rik van Riel <riel@...riel.com>,
Mel Gorman <mgorman@...hsingularity.net>,
Peter Zijlstra <peterz@...radead.org>,
Dave Hansen <dave.hansen@...ux.intel.com>,
Zi Yan <ziy@...dia.com>, Wei Xu <weixugc@...gle.com>,
Shakeel Butt <shakeelb@...gle.com>,
zhongjiang-ali <zhongjiang-ali@...ux.alibaba.com>,
Randy Dunlap <rdunlap@...radead.org>,
Andrew Morton <akpm@...ux-foundation.org>,
LKML <linux-kernel@...r.kernel.org>, lkp@...ts.01.org,
kernel test robot <lkp@...el.com>,
"Huang, Ying" <ying.huang@...el.com>,
Zhengjun Xing <zhengjun.xing@...ux.intel.com>,
fengwei.yin@...el.com
Subject: Re: [NUMA Balancing] e39bb6be9f: will-it-scale.per_thread_ops 64.4%
improvement
On Sat, Apr 02, 2022 at 04:50:05PM +0800, Feng Tang wrote:
> Hi Linus,
>
> On Fri, Apr 01, 2022 at 09:35:24AM -0700, Linus Torvalds wrote:
> > On Fri, Apr 1, 2022 at 2:42 AM kernel test robot <oliver.sang@...el.com> wrote:
> > >
> > > FYI, we noticed a 64.4% improvement of will-it-scale.per_thread_ops due to commit:
> > > e39bb6be9f2b ("NUMA Balancing: add page promotion counter")
> >
> > That looks odd and unlikely.
> >
> > That commit only modifies some page counting statistics. Sure, it
> > could be another cache layout thing, and maybe it's due to the subtle
> > change in how NUMA_PAGE_MIGRATE gets counted, but it still looks a bit
> > odd.
>
> We did a quick check on the cache-layout theory by disabling HW cache
> prefetching completely (writing 0xf to MSR 0x1a4), and the performance
> change is almost gone:
>
>   ee97347fe058d020  e39bb6be9f2b39a6dbaeff48436
>   ----------------  ---------------------------
>         134793      -1.4%           132867      will-it-scale.per_thread_ops
>
> The test box is a Cascade Lake machine with 4 nodes, and a similar trend
> is seen on a 2-node machine: the commit shows a 55% improvement with
> HW cache prefetch enabled, and less than 1% change when disabled.
>
> Though we still cannot pinpoint the exact place affected.

We did more tests and debugging, and here are some updates:

* For the HW cache prefetchers, we narrowed it down to the 'L2 hardware
  prefetcher', and not the 'L2 adjacent cache line prefetcher'. We can't
  find any documentation on the details of this prefetcher, which makes
  it hard to analyze how the performance is affected. (A small sketch of
  how the prefetchers can be toggled via MSR 0x1a4 follows the debug
  patch below.)
* Debugging shows the change is related to a size change of struct
  'mem_cgroup': with commit ee97347fe058d020 its size is 4096 bytes,
  which grows to 4160 bytes with commit e39bb6be9f2b.
  - commit e39bb6be9f2b adds one counter 'PGPROMOTE_SUCCESS' plus some
    code changes; if we remove the code changes and keep only the
    counter, the 60% improvement remains.
  - reverting e39bb6be9f2b and only adding 16 bytes of padding inside
    'mem_cgroup', the 60% change also remains. The debug patch is below,
    followed by a toy illustration of the layout shift:

diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index a68dce3873fcc..2bd56fb2e5b5f 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -303,6 +303,8 @@ struct mem_cgroup {
 	/* memory.stat */
 	struct memcg_vmstats	vmstats;
 
+	unsigned long padding[2];	/* 16 bytes, purely to shift later fields */
+
 	/* memory.events */
 	atomic_long_t		memory_events[MEMCG_NR_MEMORY_EVENTS];
 	atomic_long_t		memory_events_local[MEMCG_NR_MEMORY_EVENTS];
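
To illustrate why 16 bytes of padding can matter at all, here is a toy
userspace sketch; the structs are stand-ins with made-up sizes, not the
real 'mem_cgroup' layout, and it only shows how inserting 16 bytes
shifts the cache-line alignment of every field that follows:

/*
 * Toy illustration (stand-in structs, not the real mem_cgroup layout):
 * inserting 16 bytes of padding shifts the offset, and hence the
 * 64-byte cache-line alignment, of all later fields.
 */
#include <stddef.h>
#include <stdio.h>

struct no_pad {
	char vmstats[4032];		/* stand-in for struct memcg_vmstats */
	long memory_events[9];		/* stand-in for the events arrays */
};

struct with_pad {
	char vmstats[4032];
	unsigned long padding[2];	/* the 16-byte debug padding */
	long memory_events[9];
};

int main(void)
{
	printf("no pad:   offset %zu, %zu bytes into its cache line\n",
	       offsetof(struct no_pad, memory_events),
	       offsetof(struct no_pad, memory_events) % 64);
	printf("with pad: offset %zu, %zu bytes into its cache line\n",
	       offsetof(struct with_pad, memory_events),
	       offsetof(struct with_pad, memory_events) % 64);
	return 0;
}

With the padding, fields that used to start at a cache-line boundary
land 16 bytes into a line instead, so data that sat on one line can end
up split across two (or stop being split), which would be consistent
with the whole effect disappearing once the prefetchers are turned off.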
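
And for reference, here is a minimal sketch of how the prefetchers can
be toggled from userspace through the msr driver (modprobe msr, run as
root); it is illustrative, not the exact tool we used. Per Intel's
documentation for MSR 0x1a4, bit 0 disables the L2 hardware prefetcher,
bit 1 the L2 adjacent cache line prefetcher, and bits 2/3 the two DCU
prefetchers, so 0xf disables all four:

/*
 * Illustrative only: write a prefetcher-disable mask to MSR 0x1a4 on
 * CPU 0 via the msr driver. 0xf disables all four prefetchers, 0x1
 * only the L2 hardware prefetcher. For a real test this has to be
 * done on every CPU (e.g. 'wrmsr -a 0x1a4 0xf' from msr-tools).
 */
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int main(int argc, char *argv[])
{
	uint64_t val = argc > 1 ? strtoull(argv[1], NULL, 0) : 0xf;
	int fd = open("/dev/cpu/0/msr", O_WRONLY);

	if (fd < 0) {
		perror("open /dev/cpu/0/msr");
		return 1;
	}

	/* the msr driver uses the file offset as the MSR address */
	if (pwrite(fd, &val, sizeof(val), 0x1a4) != (ssize_t)sizeof(val)) {
		perror("write MSR 0x1a4");
		close(fd);
		return 1;
	}

	close(fd);
	return 0;
}
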
Thanks,
Feng

> Also, in our experience, patches changing vm statistics can easily
> trigger strange performance bumps for micro-benchmarks like
> will-it-scale, stress-ng, etc.
>
> Thanks,
> Feng
>
>
> > Linus