Date: Thu, 20 Jun 2024 14:07:45 +0800
From: Baolin Wang <baolin.wang@...ux.alibaba.com>
To: kernel test robot <oliver.sang@...el.com>
Cc: oe-lkp@...ts.linux.dev, lkp@...el.com, linux-kernel@...r.kernel.org,
 Andrew Morton <akpm@...ux-foundation.org>, "Huang, Ying"
 <ying.huang@...el.com>, David Hildenbrand <david@...hat.com>,
 John Hubbard <jhubbard@...dia.com>, Kefeng Wang
 <wangkefeng.wang@...wei.com>, Mel Gorman <mgorman@...hsingularity.net>,
 Ryan Roberts <ryan.roberts@....com>, linux-mm@...ck.org,
 feng.tang@...el.com, fengwei.yin@...el.com
Subject: Re: [linus:master] [mm] d2136d749d: vm-scalability.throughput -7.1%
 regression



On 2024/6/20 10:39, kernel test robot wrote:
> 
> 
> Hello,
> 
> kernel test robot noticed a -7.1% regression of vm-scalability.throughput on:
> 
> 
> commit: d2136d749d76af980b3accd72704eea4eab625bd ("mm: support multi-size THP numa balancing")
> https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master
> 
> [still regression on linus/master 92e5605a199efbaee59fb19e15d6cc2103a04ec2]
> 
> 
> testcase: vm-scalability
> test machine: 128 threads 2 sockets Intel(R) Xeon(R) Gold 6338 CPU @ 2.00GHz (Ice Lake) with 256G memory
> parameters:
> 
> 	runtime: 300s
> 	size: 512G
> 	test: anon-cow-rand-hugetlb
> 	cpufreq_governor: performance

Thanks for reporting. IIUC, NUMA balancing does not scan hugetlb VMAs, so I'm
not sure how this patch affects hugetlb CoW performance, but let me try to
reproduce it.


> If you fix the issue in a separate patch/commit (i.e. not just a new version of
> the same patch/commit), kindly add the following tags
> | Reported-by: kernel test robot <oliver.sang@...el.com>
> | Closes: https://lore.kernel.org/oe-lkp/202406201010.a1344783-oliver.sang@intel.com
> 
> 
> Details are as below:
> -------------------------------------------------------------------------------------------------->
> 
> 
> The kernel config and materials to reproduce are available at:
> https://download.01.org/0day-ci/archive/20240620/202406201010.a1344783-oliver.sang@intel.com
> 
> =========================================================================================
> compiler/cpufreq_governor/kconfig/rootfs/runtime/size/tbox_group/test/testcase:
>    gcc-13/performance/x86_64-rhel-8.3/debian-12-x86_64-20240206.cgz/300s/512G/lkp-icl-2sp2/anon-cow-rand-hugetlb/vm-scalability
> 
> commit:
>    6b0ed7b3c7 ("mm: factor out the numa mapping rebuilding into a new helper")
>    d2136d749d ("mm: support multi-size THP numa balancing")
> 
> 6b0ed7b3c77547d2 d2136d749d76af980b3accd7270
> ---------------- ---------------------------
>           %stddev     %change         %stddev
>               \          |                \
>       12.02            -1.3       10.72 ±  4%  mpstat.cpu.all.sys%
>     1228757            +3.0%    1265679        proc-vmstat.pgfault
>     7392513            -7.1%    6865649        vm-scalability.throughput
>       17356            +9.4%      18986        vm-scalability.time.user_time
>        0.32 ± 22%     -36.9%       0.20 ± 17%  sched_debug.cfs_rq:/.h_nr_running.stddev
>       28657 ± 86%     -90.8%       2640 ± 19%  sched_debug.cfs_rq:/.load.stddev
>        0.28 ± 35%     -52.1%       0.13 ± 29%  sched_debug.cfs_rq:/.nr_running.stddev
>      299.88 ± 27%     -39.6%     181.04 ± 23%  sched_debug.cfs_rq:/.runnable_avg.stddev
>      284.88 ± 32%     -44.0%     159.65 ± 27%  sched_debug.cfs_rq:/.util_avg.stddev
>        0.32 ± 22%     -37.2%       0.20 ± 17%  sched_debug.cpu.nr_running.stddev
>   1.584e+10 ±  2%      -6.9%  1.476e+10 ±  3%  perf-stat.i.branch-instructions
>    11673151 ±  3%      -6.3%   10935072 ±  4%  perf-stat.i.branch-misses
>        4.90            +3.5%       5.07        perf-stat.i.cpi
>      333.40            +7.5%     358.32        perf-stat.i.cycles-between-cache-misses
>   6.787e+10 ±  2%      -6.8%  6.324e+10 ±  3%  perf-stat.i.instructions
>        0.25            -6.2%       0.24        perf-stat.i.ipc
>        4.19            +7.5%       4.51        perf-stat.overall.cpi
>      323.02            +7.4%     346.94        perf-stat.overall.cycles-between-cache-misses
>        0.24            -7.0%       0.22        perf-stat.overall.ipc
>   1.549e+10 ±  2%      -6.8%  1.444e+10 ±  3%  perf-stat.ps.branch-instructions
>   6.634e+10 ±  2%      -6.7%  6.186e+10 ±  3%  perf-stat.ps.instructions
>       17.33 ± 77%     -10.6        6.72 ±169%  perf-profile.calltrace.cycles-pp.asm_exc_page_fault.do_access
>       17.30 ± 77%     -10.6        6.71 ±169%  perf-profile.calltrace.cycles-pp.exc_page_fault.asm_exc_page_fault.do_access
>       17.30 ± 77%     -10.6        6.71 ±169%  perf-profile.calltrace.cycles-pp.do_user_addr_fault.exc_page_fault.asm_exc_page_fault.do_access
>       17.28 ± 77%     -10.6        6.70 ±169%  perf-profile.calltrace.cycles-pp.handle_mm_fault.do_user_addr_fault.exc_page_fault.asm_exc_page_fault.do_access
>       17.27 ± 77%     -10.6        6.70 ±169%  perf-profile.calltrace.cycles-pp.hugetlb_fault.handle_mm_fault.do_user_addr_fault.exc_page_fault.asm_exc_page_fault
>       13.65 ± 76%      -8.4        5.29 ±168%  perf-profile.calltrace.cycles-pp.hugetlb_wp.hugetlb_fault.handle_mm_fault.do_user_addr_fault.exc_page_fault
>       13.37 ± 76%      -8.2        5.18 ±168%  perf-profile.calltrace.cycles-pp.copy_user_large_folio.hugetlb_wp.hugetlb_fault.handle_mm_fault.do_user_addr_fault
>       13.35 ± 76%      -8.2        5.18 ±168%  perf-profile.calltrace.cycles-pp.copy_subpage.copy_user_large_folio.hugetlb_wp.hugetlb_fault.handle_mm_fault
>       13.23 ± 76%      -8.1        5.13 ±168%  perf-profile.calltrace.cycles-pp.copy_mc_enhanced_fast_string.copy_subpage.copy_user_large_folio.hugetlb_wp.hugetlb_fault
>        3.59 ± 78%      -2.2        1.39 ±169%  perf-profile.calltrace.cycles-pp.__mutex_lock.hugetlb_fault.handle_mm_fault.do_user_addr_fault.exc_page_fault
>       17.35 ± 77%     -10.6        6.73 ±169%  perf-profile.children.cycles-pp.asm_exc_page_fault
>       17.32 ± 77%     -10.6        6.72 ±168%  perf-profile.children.cycles-pp.do_user_addr_fault
>       17.32 ± 77%     -10.6        6.72 ±168%  perf-profile.children.cycles-pp.exc_page_fault
>       17.30 ± 77%     -10.6        6.71 ±168%  perf-profile.children.cycles-pp.handle_mm_fault
>       17.28 ± 77%     -10.6        6.70 ±169%  perf-profile.children.cycles-pp.hugetlb_fault
>       13.65 ± 76%      -8.4        5.29 ±168%  perf-profile.children.cycles-pp.hugetlb_wp
>       13.37 ± 76%      -8.2        5.18 ±168%  perf-profile.children.cycles-pp.copy_user_large_folio
>       13.35 ± 76%      -8.2        5.18 ±168%  perf-profile.children.cycles-pp.copy_subpage
>       13.34 ± 76%      -8.2        5.17 ±168%  perf-profile.children.cycles-pp.copy_mc_enhanced_fast_string
>        3.59 ± 78%      -2.2        1.39 ±169%  perf-profile.children.cycles-pp.__mutex_lock
>       13.24 ± 76%      -8.1        5.13 ±168%  perf-profile.self.cycles-pp.copy_mc_enhanced_fast_string
> 
> 
> 
> Disclaimer:
> Results have been estimated based on internal Intel analysis and are provided
> for informational purposes only. Any difference in system hardware or software
> design or configuration may affect actual performance.
> 
> 
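The change column in the comparison table above mixes two conventions. Here is a minimal sketch that recomputes a few rows, assuming the usual lkp layout: rows whose change ends in "%" show the relative change (new - old) / old, while rows without it (mpstat.cpu.all.sys%, perf-profile.*) show the absolute difference in percentage points. The numbers are copied from the table. The helper names are hypothetical. Because the table prints rounded values, a recomputed figure can occasionally differ from the reported one in the last digit.

```python
# Sketch: recompute the lkp "change" column from the base/patched values.
# Assumption: "%" rows are relative changes; bare rows are point deltas.


def pct_change(old: float, new: float) -> float:
    """Relative change of new vs. old, in percent."""
    return (new - old) / old * 100.0


def point_delta(old: float, new: float) -> float:
    """Absolute difference for metrics that are already percentages."""
    return new - old


# (base 6b0ed7b3c7, patched d2136d749d) pairs taken from the table.
relative_metrics = {
    "vm-scalability.throughput": (7392513, 6865649),
    "proc-vmstat.pgfault": (1228757, 1265679),
    "vm-scalability.time.user_time": (17356, 18986),
}
percentage_metrics = {
    "mpstat.cpu.all.sys%": (12.02, 10.72),
    "perf-profile.children.cycles-pp.hugetlb_fault": (17.28, 6.70),
}

if __name__ == "__main__":
    for name, (old, new) in relative_metrics.items():
        print(f"{name:48s} {pct_change(old, new):+6.1f}%")
    for name, (old, new) in percentage_metrics.items():
        print(f"{name:48s} {point_delta(old, new):+6.1f}")
    # CPI and IPC are reciprocals of each other.
    print(f"IPC from overall CPI: base {1 / 4.19:.2f}, patched {1 / 4.51:.2f}")
```

Running it reproduces the -7.1% throughput, +3.0% pgfault, +9.4% user_time, -1.3 sys% and -10.6 hugetlb_fault entries. It also shows that perf-stat.overall.ipc (0.24 -> 0.22) is simply the reciprocal of perf-stat.overall.cpi (4.19 -> 4.51).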
