linux-kernel - Re: [mm, thp] 85b9f46e8e: vm-scalability.throughput -8.7% regression

Message-ID: <alpine.DEB.2.23.453.2010041157270.3597796@chino.kir.corp.google.com>
Date:   Sun, 4 Oct 2020 12:05:21 -0700 (PDT)
From:   David Rientjes <rientjes@...gle.com>
To:     kernel test robot <rong.a.chen@...el.com>
cc:     Linus Torvalds <torvalds@...ux-foundation.org>,
        Andrew Morton <akpm@...ux-foundation.org>,
        Yang Shi <yang.shi@...ux.alibaba.com>,
        "Kirill A. Shutemov" <kirill.shutemov@...ux.intel.com>,
        Mike Rapoport <rppt@...ux.ibm.com>,
        Jeremy Cline <jcline@...hat.com>,
        Andrea Arcangeli <aarcange@...hat.com>,
        Mike Kravetz <mike.kravetz@...cle.com>,
        Michal Hocko <mhocko@...nel.org>,
        Vlastimil Babka <vbabka@...e.cz>,
        LKML <linux-kernel@...r.kernel.org>, lkp@...ts.01.org,
        lkp@...el.com, ying.huang@...el.com, feng.tang@...el.com,
        zhengjun.xing@...el.com
Subject: Re: [mm, thp] 85b9f46e8e: vm-scalability.throughput -8.7%
 regression

On Sun, 4 Oct 2020, kernel test robot wrote:

> Greeting,
> 
> FYI, we noticed a -8.7% regression of vm-scalability.throughput due to commit:
> 
> 
> commit: 85b9f46e8ea451633ccd60a7d8cacbfff9f34047 ("mm, thp: track fallbacks due to failed memcg charges separately")
> https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master
> 
> 
> in testcase: vm-scalability
> on test machine: 104 threads Skylake with 192G memory
> with following parameters:
> 
> 	runtime: 300s
> 	size: 1T
> 	test: lru-shm
> 	cpufreq_governor: performance
> 	ucode: 0x2006906
> 
> test-description: The motivation behind this suite is to exercise functions and regions of the mm/ of the Linux kernel which are of interest to us.
> test-url: https://git.kernel.org/cgit/linux/kernel/git/wfg/vm-scalability.git/
> 
> 
> 
> If you fix the issue, kindly add following tag
> Reported-by: kernel test robot <rong.a.chen@...el.com>
> 
> 
> Details are as below:
> -------------------------------------------------------------------------------------------------->
> 
> 
> To reproduce:
> 
>         git clone https://github.com/intel/lkp-tests.git
>         cd lkp-tests
>         bin/lkp install job.yaml  # job file is attached in this email
>         bin/lkp run     job.yaml
> 
> =========================================================================================
> compiler/cpufreq_governor/kconfig/rootfs/runtime/size/tbox_group/test/testcase/ucode:
>   gcc-9/performance/x86_64-rhel-8.3/debian-10.4-x86_64-20200603.cgz/300s/1T/lkp-skl-fpga01/lru-shm/vm-scalability/0x2006906
> 
> commit: 
>   dcdf11ee14 ("mm, shmem: add vmstat for hugepage fallback")
>   85b9f46e8e ("mm, thp: track fallbacks due to failed memcg charges separately")
> 
> dcdf11ee14413332 85b9f46e8ea451633ccd60a7d8c 
> ---------------- --------------------------- 
>        fail:runs  %reproduction    fail:runs
>            |             |             |    
>           1:4           24%           2:4     perf-profile.calltrace.cycles-pp.sync_regs.error_entry.do_access
>           3:4           53%           5:4     perf-profile.calltrace.cycles-pp.error_entry.do_access
>           9:4          -27%           8:4     perf-profile.children.cycles-pp.error_entry
>           4:4          -10%           4:4     perf-profile.self.cycles-pp.error_entry
>          %stddev     %change         %stddev
>              \          |                \  
>     477291            -9.1%     434041        vm-scalability.median
>   49791027            -8.7%   45476799        vm-scalability.throughput
>     223.67            +1.6%     227.36        vm-scalability.time.elapsed_time
>     223.67            +1.6%     227.36        vm-scalability.time.elapsed_time.max
>      50364 ±  6%     +24.1%      62482 ± 10%  vm-scalability.time.involuntary_context_switches
>       2237            +7.8%       2412        vm-scalability.time.percent_of_cpu_this_job_got
>       3084           +18.2%       3646        vm-scalability.time.system_time
>       1921            -4.2%       1839        vm-scalability.time.user_time
>      13.68            +2.2       15.86        mpstat.cpu.all.sys%
>      28535 ± 30%     -47.0%      15114 ± 79%  numa-numastat.node0.other_node
>     142734 ± 11%     -19.4%     115000 ± 17%  numa-meminfo.node0.AnonPages
>      11168 ±  3%      +8.8%      12150 ±  5%  numa-meminfo.node1.PageTables
>      76.00            -1.6%      74.75        vmstat.cpu.id
>       3626            -1.9%       3555        vmstat.system.cs
>    2214928 ±166%     -96.6%      75321 ±  7%  cpuidle.C1.usage
>     200981 ±  7%     -18.0%     164861 ±  7%  cpuidle.POLL.time
>      52675 ±  3%     -16.7%      43866 ± 10%  cpuidle.POLL.usage
>      35659 ± 11%     -19.4%      28754 ± 17%  numa-vmstat.node0.nr_anon_pages
>    1248014 ±  3%     +10.9%    1384236        numa-vmstat.node1.nr_mapped
>       2722 ±  4%     +10.6%       3011 ±  5%  numa-vmstat.node1.nr_page_table_pages

I'm not sure that I'm reading this correctly, but I suspect that this is
just a NUMA effect: memory affinity will obviously impact
vm-scalability.throughput quite substantially, and I don't think the
bisected commit can be to blame.  Commit 85b9f46e8ea4 ("mm, thp: track
fallbacks due to failed memcg charges separately") simply adds new
count_vm_event() calls in a couple of areas to track thp fallback due to
memcg limits separately from fragmentation.

It's likely a question about the testing methodology in general: for
memory-intensive benchmarks, I suggest they be configured so that we can
expect consistent memory access latency at the hardware level when
running on a NUMA system.
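
One way to get that consistency is to bind the workload's CPUs and memory to the same node with numactl. The following is only a minimal sketch of the idea, not something taken from the lkp job: the helper name numa_bound_command and the ./run-benchmark command are hypothetical placeholders, and whether the lkp harness can be wrapped this way is an assumption. Only the numactl options --cpunodebind and --membind are real.

```python
import shlex


def numa_bound_command(benchmark_argv, node=0):
    """Prefix a command so its CPUs and memory both come from one NUMA node.

    Hypothetical helper for illustration; it only builds the argv list
    and does not run anything.
    """
    if node < 0:
        raise ValueError("NUMA node must be non-negative")
    # --cpunodebind restricts execution to the CPUs of the given node;
    # --membind restricts memory allocation to that node.
    return [
        "numactl",
        f"--cpunodebind={node}",
        f"--membind={node}",
        *benchmark_argv,
    ]


if __name__ == "__main__":
    # "./run-benchmark" is a placeholder, not a real lkp or
    # vm-scalability entry point.
    cmd = numa_bound_command(["./run-benchmark", "--runtime", "300"], node=0)
    print(shlex.join(cmd))
```

Binding to a single node changes what the benchmark measures and limits it to that node's memory, so for a test sized like this one an interleaved policy (numactl --interleave=all) may be the more practical way to make placement repeatable between runs.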