Date:   Tue, 20 Oct 2020 11:19:50 -0700 (PDT)
From:   David Rientjes <rientjes@...gle.com>
To:     "Huang, Ying" <ying.huang@...el.com>
cc:     kernel test robot <rong.a.chen@...el.com>,
        Linus Torvalds <torvalds@...ux-foundation.org>,
        Andrew Morton <akpm@...ux-foundation.org>,
        Yang Shi <yang.shi@...ux.alibaba.com>,
        "Kirill A. Shutemov" <kirill.shutemov@...ux.intel.com>,
        Mike Rapoport <rppt@...ux.ibm.com>,
        Jeremy Cline <jcline@...hat.com>,
        Andrea Arcangeli <aarcange@...hat.com>,
        Mike Kravetz <mike.kravetz@...cle.com>,
        Michal Hocko <mhocko@...nel.org>,
        Vlastimil Babka <vbabka@...e.cz>,
        LKML <linux-kernel@...r.kernel.org>, lkp@...ts.01.org,
        lkp@...el.com, feng.tang@...el.com, zhengjun.xing@...el.com
Subject: Re: [mm, thp] 85b9f46e8e: vm-scalability.throughput -8.7%
 regression

On Tue, 20 Oct 2020, Huang, Ying wrote:

> >> =========================================================================================
> >> compiler/cpufreq_governor/kconfig/rootfs/runtime/size/tbox_group/test/testcase/ucode:
> >>   gcc-9/performance/x86_64-rhel-8.3/debian-10.4-x86_64-20200603.cgz/300s/1T/lkp-skl-fpga01/lru-shm/vm-scalability/0x2006906
> >> 
> >> commit: 
> >>   dcdf11ee14 ("mm, shmem: add vmstat for hugepage fallback")
> >>   85b9f46e8e ("mm, thp: track fallbacks due to failed memcg charges separately")
> >> 
> >> dcdf11ee14413332 85b9f46e8ea451633ccd60a7d8c 
> >> ---------------- --------------------------- 
> >>        fail:runs  %reproduction    fail:runs
> >>            |             |             |    
> >>           1:4           24%           2:4     perf-profile.calltrace.cycles-pp.sync_regs.error_entry.do_access
> >>           3:4           53%           5:4     perf-profile.calltrace.cycles-pp.error_entry.do_access
> >>           9:4          -27%           8:4     perf-profile.children.cycles-pp.error_entry
> >>           4:4          -10%           4:4     perf-profile.self.cycles-pp.error_entry
> >>          %stddev     %change         %stddev
> >>              \          |                \  
> >>     477291            -9.1%     434041        vm-scalability.median
> >>   49791027            -8.7%   45476799        vm-scalability.throughput
> >>     223.67            +1.6%     227.36        vm-scalability.time.elapsed_time
> >>     223.67            +1.6%     227.36        vm-scalability.time.elapsed_time.max
> >>      50364 ±  6%     +24.1%      62482 ± 10%  vm-scalability.time.involuntary_context_switches
> >>       2237            +7.8%       2412        vm-scalability.time.percent_of_cpu_this_job_got
> >>       3084           +18.2%       3646        vm-scalability.time.system_time
> >>       1921            -4.2%       1839        vm-scalability.time.user_time
> >>      13.68            +2.2       15.86        mpstat.cpu.all.sys%
> >>      28535 ± 30%     -47.0%      15114 ± 79%  numa-numastat.node0.other_node
> >>     142734 ± 11%     -19.4%     115000 ± 17%  numa-meminfo.node0.AnonPages
> >>      11168 ±  3%      +8.8%      12150 ±  5%  numa-meminfo.node1.PageTables
> >>      76.00            -1.6%      74.75        vmstat.cpu.id
> >>       3626            -1.9%       3555        vmstat.system.cs
> >>    2214928 ±166%     -96.6%      75321 ±  7%  cpuidle.C1.usage
> >>     200981 ±  7%     -18.0%     164861 ±  7%  cpuidle.POLL.time
> >>      52675 ±  3%     -16.7%      43866 ± 10%  cpuidle.POLL.usage
> >>      35659 ± 11%     -19.4%      28754 ± 17%  numa-vmstat.node0.nr_anon_pages
> >>    1248014 ±  3%     +10.9%    1384236        numa-vmstat.node1.nr_mapped
> >>       2722 ±  4%     +10.6%       3011 ±  5%  numa-vmstat.node1.nr_page_table_pages
> >
> > I'm not sure that I'm reading this correctly, but I suspect that this just
> > happens because of NUMA: memory affinity will obviously impact
> > vm-scalability.throughput quite substantially, but I don't think the
> > bisected commit is to blame.  Commit 85b9f46e8ea4 ("mm, thp: track
> > fallbacks due to failed memcg charges separately") simply adds new
> > count_vm_event() calls in a couple of areas to track thp fallback due to
> > memcg limits separately from fragmentation.
> >
> > It's likely a question about the testing methodology in general: for
> > memory-intensive benchmarks, I suggest they be configured in a manner
> > that lets us expect consistent memory access latency at the hardware
> > level when running on a NUMA system.
> 
> So you think it's better to bind processes to a NUMA node or CPU?  But we
> want to use this test case to capture NUMA/CPU placement/balancing issues
> too.
> 

No, because binding to a specific socket may cause other performance 
"improvements" or "degradations" depending on how fragmented local memory 
is, or whether or not it's under memory pressure.  Is the system rebooted 
before testing so that we have a consistent state of memory availability 
and fragmentation across sockets?
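
For illustration of what such binding would mean (and explicitly not as a
recommendation, given the caveats above), a minimal libnuma sketch that pins
the current process and its allocations to node 0 could look like the
following; the libnuma calls are standard, the harness around them is
hypothetical:

/*
 * Sketch only: bind the current task and its memory to NUMA node 0.
 * Build with:  gcc bind-node0.c -o bind-node0 -lnuma
 */
#include <numa.h>
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
	struct bitmask *nodes;

	if (numa_available() < 0) {
		fprintf(stderr, "NUMA is not available on this system\n");
		return EXIT_FAILURE;
	}

	nodes = numa_allocate_nodemask();
	numa_bitmask_setbit(nodes, 0);	/* restrict to node 0 */
	numa_bind(nodes);		/* bind CPUs and memory allocation */
	numa_free_nodemask(nodes);

	/* ... run or exec the benchmark workload from here ... */
	return EXIT_SUCCESS;
}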

> 0day solves the problem in another way: we run the test case multiple
> times, calculate the average and standard deviation, and then compare.
> 
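
The comparison described above boils down to something like the sketch below
(the throughput samples are made-up placeholders loosely modeled on the
numbers in the report, not measured data):

/*
 * Compute per-kernel mean and standard deviation, then the relative change.
 * Build with:  gcc compare.c -o compare -lm
 */
#include <math.h>
#include <stdio.h>

static void mean_stddev(const double *v, int n, double *mean, double *sd)
{
	double sum = 0.0, var = 0.0;
	int i;

	for (i = 0; i < n; i++)
		sum += v[i];
	*mean = sum / n;
	for (i = 0; i < n; i++)
		var += (v[i] - *mean) * (v[i] - *mean);
	*sd = sqrt(var / n);
}

int main(void)
{
	/* hypothetical throughput samples for the base and patched kernels */
	const double base[]    = { 49.7e6, 49.9e6, 49.6e6, 49.8e6 };
	const double patched[] = { 45.4e6, 45.6e6, 45.3e6, 45.5e6 };
	double m0, s0, m1, s1;

	mean_stddev(base, 4, &m0, &s0);
	mean_stddev(patched, 4, &m1, &s1);

	printf("base:    %.0f +/- %.0f\n", m0, s0);
	printf("patched: %.0f +/- %.0f  (%+.1f%% change)\n",
	       m1, s1, (m1 - m0) / m0 * 100.0);
	return 0;
}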

Depending on fragmentation or memory availability, any benchmark that 
assesses performance may be adversely affected if its results can be 
impacted by hugepage backing.
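
For reference on the earlier point that the bisected commit only adds event
accounting: the shape of the change in 85b9f46e8ea4 is roughly the following
paraphrased sketch (not the literal diff; the fault-path context is
abbreviated and the event names should be checked against the actual patch):

/*
 * When the huge page was allocated but the memcg charge fails, count the
 * fallback under its own vm_event in addition to the generic counter.
 */
if (memcg_charge_failed) {				/* stand-in for the real check */
	put_page(page);
	count_vm_event(THP_FAULT_FALLBACK);		/* pre-existing event */
	count_vm_event(THP_FAULT_FALLBACK_CHARGE);	/* added by the commit */
	return VM_FAULT_FALLBACK;
}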
