Date:   Fri, 20 May 2022 14:44:26 +0800
From:   Ying Huang <ying.huang@...el.com>
To:     Mel Gorman <mgorman@...hsingularity.net>,
        kernel test robot <oliver.sang@...el.com>
Cc:     0day robot <lkp@...el.com>, LKML <linux-kernel@...r.kernel.org>,
        lkp@...ts.01.org, fengwei.yin@...el.com,
        Peter Zijlstra <peterz@...radead.org>,
        Ingo Molnar <mingo@...nel.org>,
        Valentin Schneider <valentin.schneider@....com>,
        Aubrey Li <aubrey.li@...ux.intel.com>, yu.c.chen@...el.com
Subject: Re: [LKP] Re: [sched/numa]  bb2dee337b:  unixbench.score -11.2%
 regression

On Thu, 2022-05-19 at 15:54 +0800, ying.huang@...el.com wrote:
> Hi, Mel,
> 
> On Wed, 2022-05-18 at 16:22 +0100, Mel Gorman wrote:
> > On Wed, May 18, 2022 at 05:24:14PM +0800, kernel test robot wrote:
> > > 
> > > 
> > > Greeting,
> > > 
> > > FYI, we noticed a -11.2% regression of unixbench.score due to commit:
> > > 
> > > 
> > > commit: bb2dee337bd7d314eb7c7627e1afd754f86566bc ("[PATCH 3/4] sched/numa: Apply imbalance limitations consistently")
> > > url: https://github.com/intel-lab-lkp/linux/commits/Mel-Gorman/Mitigate-inconsistent-NUMA-imbalance-behaviour/20220511-223233
> > > base: https://git.kernel.org/cgit/linux/kernel/git/tip/tip.git d70522fc541224b8351ac26f4765f2c6268f8d72
> > > patch link: https://lore.kernel.org/lkml/20220511143038.4620-4-mgorman@techsingularity.net
> > > 
> > > in testcase: unixbench
> > > on test machine: 128 threads 2 sockets Intel(R) Xeon(R) Gold 6338 CPU @ 2.00GHz with 256G memory
> > > with following parameters:
> > > 
> > > 	runtime: 300s
> > > 	nr_task: 1
> > > 	test: shell8
> > > 	cpufreq_governor: performance
> > > 	ucode: 0xd000331
> > > 
> > > test-description: UnixBench is the original BYTE UNIX benchmark suite, which aims to test the performance of Unix-like systems.
> > > test-url: https://github.com/kdlucas/byte-unixbench
> > 
> > I think what is happening for unixbench is that it prefers to run all
> > instances on a local node if possible. shell8 is creating 8 scripts,
> > each of which spawn more processes. The total number of tasks may exceed
> > the allowed imbalance at fork time of 16 tasks. Some spill over to a
> > remote node and as they are using files, some accesses are remote and it
> > suffers. It's not memory bandwidth bound but is sensitive to locality.
> > The stats somewhat support this idea
> > 
> > >      83590 ± 13%     -73.7%      21988 ± 32%  numa-meminfo.node0.AnonHugePages
> > >     225657 ± 18%     -58.0%      94847 ± 18%  numa-meminfo.node0.AnonPages
> > >     231652 ± 17%     -55.3%     103657 ± 16%  numa-meminfo.node0.AnonPages.max
> > >     234525 ± 17%     -55.5%     104341 ± 18%  numa-meminfo.node0.Inactive
> > >     234397 ± 17%     -55.5%     104267 ± 18%  numa-meminfo.node0.Inactive(anon)
> > >      11724 ±  7%     +17.5%      13781 ±  5%  numa-meminfo.node0.KernelStack
> > >       4472 ± 34%    +117.1%       9708 ± 31%  numa-meminfo.node0.PageTables
> > >      15239 ± 75%    +401.2%      76387 ± 10%  numa-meminfo.node1.AnonHugePages
> > >      67256 ± 63%    +206.3%     205994 ±  6%  numa-meminfo.node1.AnonPages
> > >      73568 ± 58%    +193.1%     215644 ±  6%  numa-meminfo.node1.AnonPages.max
> > >      75737 ± 53%    +183.9%     215053 ±  6%  numa-meminfo.node1.Inactive
> > >      75709 ± 53%    +183.9%     214971 ±  6%  numa-meminfo.node1.Inactive(anon)
> > >       3559 ± 42%    +187.1%      10216 ±  8%  numa-meminfo.node1.PageTables
> > 
> > There is less memory used on one node and more on the other so it's
> > getting split.
> 
> This makes sense.  I will also check CPU utilization per node to verify
> this directly.
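(As an aside, the fork-time spill-over described above can be sketched roughly as follows. This is an illustrative model only; the names and the threshold handling are assumptions, not the kernel's actual sched/numa code.)

```python
# Illustrative model of fork-time NUMA placement with an imbalance cap.
# ALLOWED_IMBALANCE and pick_node() are made-up names for illustration.
ALLOWED_IMBALANCE = 16  # tasks tolerated on the local node before spilling

def pick_node(nr_local, nr_remote, allowed=ALLOWED_IMBALANCE):
    """Prefer the local node until it leads the remote node by 'allowed' tasks."""
    if nr_local - nr_remote < allowed:
        return "local"
    return "remote"

# shell8 spawns 8 scripts, each forking further tasks; once the local node
# is ahead by 16 tasks, new children start spilling to the remote node.
placement = []
local = remote = 0
for _ in range(24):
    node = pick_node(local, remote)
    placement.append(node)
    if node == "local":
        local += 1
    else:
        remote += 1
# The first 16 tasks land locally; later forks alternate across nodes.
```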

I ran this workload 3 times each for the commit and its parent, collecting
per-node mpstat statistics.
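(For reference, per-node figures like the ones below can be derived by averaging mpstat's per-CPU usr%/sys% samples over each NUMA node. A rough sketch of that aggregation; the cpu-to-node map and sample format here are assumptions for illustration, not the 0day tooling.)

```python
# Rough sketch: average per-CPU usr%/sys% samples into per-node figures.
from collections import defaultdict

def node_averages(samples, cpu_to_node):
    """samples: list of (cpu, usr_pct, sys_pct) tuples from one mpstat interval."""
    usr = defaultdict(list)
    sys_ = defaultdict(list)
    for cpu, u, s in samples:
        node = cpu_to_node[cpu]
        usr[node].append(u)
        sys_[node].append(s)
    return {n: (sum(usr[n]) / len(usr[n]), sum(sys_[n]) / len(sys_[n]))
            for n in usr}

cpu_to_node = {0: 0, 1: 0, 2: 1, 3: 1}  # toy 4-CPU, 2-node machine
samples = [(0, 4.0, 8.0), (1, 2.0, 4.0), (2, 0.1, 0.2), (3, 0.1, 0.2)]
per_node = node_averages(samples, cpu_to_node)
# node 0 averages to (usr 3.0, sys 6.0); node 1 to (usr 0.1, sys 0.2)
```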

For the parent commit,

  "mpstat.node.0.usr%": [
    0.1396875,
    3.0806153846153848,
    0.05303030303030303
  ],
  "mpstat.node.0.sys%": [
    0.10515625,
    5.597692307692308,
    0.1340909090909091
  ],

  "mpstat.node.1.usr%": [
    3.1015625,
    0.1306153846153846,
    3.0275757575757574
  ],
  "mpstat.node.1.sys%": [
    5.66703125,
    0.11676923076923076,
    5.498181818181818
  ],

The difference between the two nodes is quite large.

For the commit,

  "mpstat.node.0.usr%": [
    1.42109375,
    1.4725,
    1.5140625
  ],
  "mpstat.node.0.sys%": [
    3.00125,
    3.16390625,
    3.1284375
  ],

  "mpstat.node.1.usr%": [
    1.4909375,
    1.41609375,
    1.3740625
  ],
  "mpstat.node.1.sys%": [
    3.1671875,
    3.00109375,
    3.044375
  ],

The difference between the two nodes is greatly reduced.  So this directly
confirms your theory.
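(Quantifying that from the usr% numbers quoted above; a quick check, not part of the original report:)

```python
# Per-run |node0 - node1| usr% gap, from the mpstat numbers pasted above.
parent_node0 = [0.1396875, 3.0806153846153848, 0.05303030303030303]
parent_node1 = [3.1015625, 0.1306153846153846, 3.0275757575757574]
commit_node0 = [1.42109375, 1.4725, 1.5140625]
commit_node1 = [1.4909375, 1.41609375, 1.3740625]

parent_gap = [abs(a - b) for a, b in zip(parent_node0, parent_node1)]
commit_gap = [abs(a - b) for a, b in zip(commit_node0, commit_node1)]
# The parent's runs differ by roughly 3 percentage points between nodes;
# with the commit applied the gap shrinks to about 0.06-0.14.
```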

Best Regards,
Huang, Ying


[snip]
