Message-ID: <20220312154321.GC1189@xsang-OptiPlex-9020>
Date: Sat, 12 Mar 2022 23:43:21 +0800
From: kernel test robot <oliver.sang@...el.com>
To: Mel Gorman <mgorman@...hsingularity.net>
Cc: 0day robot <lkp@...el.com>, Eric Dumazet <edumazet@...gle.com>,
Vlastimil Babka <vbabka@...e.cz>,
Michal Hocko <mhocko@...nel.org>,
Shakeel Butt <shakeelb@...gle.com>,
Wei Xu <weixugc@...gle.com>, Greg Thelen <gthelen@...gle.com>,
Hugh Dickins <hughd@...gle.com>,
David Rientjes <rientjes@...gle.com>,
LKML <linux-kernel@...r.kernel.org>, lkp@...ts.01.org,
ying.huang@...el.com, feng.tang@...el.com,
zhengjun.xing@...ux.intel.com, fengwei.yin@...el.com,
Eric Dumazet <eric.dumazet@...il.com>,
Andrew Morton <akpm@...ux-foundation.org>,
linux-mm <linux-mm@...ck.org>
Subject: [mm/page_alloc] 8212a964ee: vm-scalability.throughput 30.5%
improvement
Greetings,
FYI, we noticed a 30.5% improvement of vm-scalability.throughput due to commit:
commit: 8212a964ee020471104e34dce7029dec33c218a9 ("Re: [PATCH v2] mm/page_alloc: call check_new_pages() while zone spinlock is not held")
url: https://github.com/0day-ci/linux/commits/Mel-Gorman/Re-PATCH-v2-mm-page_alloc-call-check_new_pages-while-zone-spinlock-is-not-held/20220309-203504
patch link: https://lore.kernel.org/lkml/20220309123245.GI15701@techsingularity.net
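For context, the patch narrows the zone->lock critical section so that
check_new_pages() runs after the lock has been dropped, shrinking the hold
time that concurrent allocators contend on. Below is a rough user-space
sketch of that before/after pattern (this is NOT the kernel code; struct
item, check_item() and free_list are hypothetical stand-ins for the page,
check_new_pages() and the zone free lists):

/*
 * Illustrative sketch: move per-item validation out of the spinlock
 * critical section. Build with: cc -pthread sketch.c
 */
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

struct item { struct item *next; int poisoned; };

static pthread_spinlock_t zone_lock;
static struct item *free_list;

static bool check_item(const struct item *it)  /* stands in for check_new_pages() */
{
	return !it->poisoned;
}

/* Old pattern: validate while holding the lock. */
static struct item *alloc_checked_under_lock(void)
{
	struct item *it;

	pthread_spin_lock(&zone_lock);
	while ((it = free_list) != NULL) {
		free_list = it->next;
		if (check_item(it))		/* expensive work under the lock */
			break;
	}
	pthread_spin_unlock(&zone_lock);
	return it;
}

/* New pattern: detach under the lock, validate after dropping it. */
static struct item *alloc_checked_outside_lock(void)
{
	struct item *it;

	for (;;) {
		pthread_spin_lock(&zone_lock);
		it = free_list;
		if (it)
			free_list = it->next;
		pthread_spin_unlock(&zone_lock);

		if (!it || check_item(it))	/* check with the lock dropped */
			return it;
		/* bad item: skip it and retry with the next one */
	}
}

int main(void)
{
	static struct item a = { .next = NULL, .poisoned = 0 };

	pthread_spin_init(&zone_lock, PTHREAD_PROCESS_PRIVATE);
	free_list = &a;
	printf("got %p\n", (void *)alloc_checked_outside_lock());
	(void)alloc_checked_under_lock;		/* silence unused warning */
	return 0;
}

The throughput gain reported below is consistent with this: less time spent
in the critical section means less spinning in the lock slowpath for the
hugetlb allocation path exercised by the test.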
in testcase: vm-scalability
on test machine: 128 threads 2 sockets Intel(R) Xeon(R) Platinum 8358 CPU @ 2.60GHz with 128G memory
with following parameters:
runtime: 300s
size: 512G
test: anon-w-rand-hugetlb
cpufreq_governor: performance
ucode: 0xd000331
test-description: The motivation behind this suite is to exercise functions and regions of the mm/ of the Linux kernel which are of interest to us.
test-url: https://git.kernel.org/cgit/linux/kernel/git/wfg/vm-scalability.git/
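For reference, the anon-w-rand-hugetlb case performs random writes to
hugetlb-backed anonymous memory. A minimal sketch of that access pattern is
below (the mapping size and iteration count are illustrative assumptions;
the actual workload is the usemem harness from the vm-scalability
repository linked above):

/*
 * Sketch of random writes to an anonymous hugetlb mapping.
 * Requires reserved huge pages, e.g. via /proc/sys/vm/nr_hugepages.
 */
#include <stdio.h>
#include <stdlib.h>
#include <sys/mman.h>

int main(void)
{
	size_t len = 1UL << 30;			/* 1 GiB, an illustrative size */
	char *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
		       MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB, -1, 0);

	if (p == MAP_FAILED) {
		perror("mmap(MAP_HUGETLB)");
		return 1;
	}

	/* Write at random offsets; each touch can fault in a huge page. */
	for (long i = 0; i < 10000000; i++)
		p[(size_t)rand() % len] = 1;

	munmap(p, len);
	return 0;
}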
Details are as follows:
-------------------------------------------------------------------------------------------------->
To reproduce:
git clone https://github.com/intel/lkp-tests.git
cd lkp-tests
sudo bin/lkp install job.yaml # job file is attached in this email
bin/lkp split-job --compatible job.yaml # generate the yaml file for lkp run
sudo bin/lkp run generated-yaml-file
# if you come across any failure that blocks the test,
# please remove the ~/.lkp and /lkp directories to run from a clean state.
=========================================================================================
compiler/cpufreq_governor/kconfig/rootfs/runtime/size/tbox_group/test/testcase/ucode:
gcc-9/performance/x86_64-rhel-8.3/debian-10.4-x86_64-20200603.cgz/300s/512G/lkp-icl-2sp5/anon-w-rand-hugetlb/vm-scalability/0xd000331
commit:
v5.17-rc7
8212a964ee ("mm/page_alloc: call check_new_pages() while zone spinlock is not held")
       v5.17-rc7              8212a964ee020471104e34dce70
---------------- ---------------------------
         %stddev      %change          %stddev
             \            |                \
0.00 ± 5% -7.4% 0.00 ± 4% vm-scalability.free_time
47190 ± 2% +25.5% 59208 ± 2% vm-scalability.median
6352467 ± 2% +30.5% 8293110 ± 2% vm-scalability.throughput
218.97 ± 2% -18.7% 177.98 ± 3% vm-scalability.time.elapsed_time
218.97 ± 2% -18.7% 177.98 ± 3% vm-scalability.time.elapsed_time.max
121357 ± 7% -24.9% 91162 ± 10% vm-scalability.time.involuntary_context_switches
11226 -5.2% 10641 vm-scalability.time.percent_of_cpu_this_job_got
2311 ± 3% -35.2% 1496 ± 6% vm-scalability.time.system_time
22275 ± 2% -21.7% 17443 ± 3% vm-scalability.time.user_time
9358 ± 3% -13.1% 8130 vm-scalability.time.voluntary_context_switches
255.23 -16.1% 214.10 ± 2% uptime.boot
2593 +6.8% 2771 ± 5% vmstat.system.cs
11.51 ± 7% +4.5 16.05 ± 8% mpstat.cpu.all.idle%
8.48 ± 2% -1.6 6.84 ± 3% mpstat.cpu.all.sys%
727581 ± 12% -17.2% 602238 ± 6% numa-numastat.node1.local_node
798037 ± 8% -13.3% 691955 ± 6% numa-numastat.node1.numa_hit
5806206 ± 17% +26.7% 7356010 ± 10% turbostat.C1E
9.55 ± 26% +5.9 15.48 ± 9% turbostat.C1E%
59854751 ± 2% -17.8% 49202950 ± 3% turbostat.IRQ
42804 ± 6% -54.9% 19301 ± 21% meminfo.Active
41832 ± 7% -56.2% 18325 ± 23% meminfo.Active(anon)
63386 ± 6% -26.6% 46542 ± 3% meminfo.Mapped
137758 -25.5% 102591 ± 3% meminfo.Shmem
36980 ± 5% -62.6% 13823 ± 29% numa-meminfo.node1.Active
36495 ± 5% -63.9% 13173 ± 30% numa-meminfo.node1.Active(anon)
19454 ± 26% -57.7% 8233 ± 33% numa-meminfo.node1.Mapped
65896 ± 38% -67.8% 21189 ± 13% numa-meminfo.node1.Shmem
9185 ± 6% -64.7% 3246 ± 31% numa-vmstat.node1.nr_active_anon
4769 ± 26% -54.5% 2171 ± 32% numa-vmstat.node1.nr_mapped
16462 ± 37% -68.1% 5258 ± 14% numa-vmstat.node1.nr_shmem
9185 ± 6% -64.7% 3246 ± 31% numa-vmstat.node1.nr_zone_active_anon
10436 ± 5% -56.2% 4570 ± 23% proc-vmstat.nr_active_anon
69290 +1.3% 70203 proc-vmstat.nr_anon_pages
1717695 +4.5% 1794462 proc-vmstat.nr_dirty_background_threshold
3439592 +4.5% 3593312 proc-vmstat.nr_dirty_threshold
640952 -1.4% 632171 proc-vmstat.nr_file_pages
17356030 +4.4% 18125242 proc-vmstat.nr_free_pages
93258 -2.4% 91059 proc-vmstat.nr_inactive_anon
16187 ± 5% -26.4% 11911 ± 2% proc-vmstat.nr_mapped
34477 ± 2% -25.6% 25663 ± 4% proc-vmstat.nr_shmem
10436 ± 5% -56.2% 4570 ± 23% proc-vmstat.nr_zone_active_anon
93258 -2.4% 91059 proc-vmstat.nr_zone_inactive_anon
32151 ± 16% -61.0% 12542 ± 13% proc-vmstat.numa_hint_faults
21214 ± 22% -86.0% 2964 ± 45% proc-vmstat.numa_hint_faults_local
1598135 -10.9% 1423466 proc-vmstat.numa_hit
1481881 -11.8% 1307551 proc-vmstat.numa_local
117279 -1.2% 115916 proc-vmstat.numa_other
555445 ± 16% -53.2% 260178 ± 53% proc-vmstat.numa_pte_updates
93889 ± 4% -74.3% 24113 ± 7% proc-vmstat.pgactivate
1599893 -11.0% 1424527 proc-vmstat.pgalloc_normal
1594626 -14.2% 1368920 proc-vmstat.pgfault
1609987 -20.8% 1275284 proc-vmstat.pgfree
49893 -14.8% 42496 ± 5% proc-vmstat.pgreuse
15.23 ± 2% -7.8% 14.04 perf-stat.i.MPKI
1.348e+10 +22.0% 1.645e+10 ± 3% perf-stat.i.branch-instructions
6.957e+08 ± 2% +22.4% 8.517e+08 ± 3% perf-stat.i.cache-misses
7.117e+08 ± 2% +22.4% 8.71e+08 ± 3% perf-stat.i.cache-references
7.86 ± 2% -29.0% 5.58 ± 6% perf-stat.i.cpi
3.739e+11 -5.1% 3.549e+11 perf-stat.i.cpu-cycles
550.18 ± 3% -22.2% 427.87 ± 5% perf-stat.i.cycles-between-cache-misses
1.605e+10 +22.1% 1.959e+10 ± 3% perf-stat.i.dTLB-loads
0.02 ± 3% -0.0 0.01 ± 4% perf-stat.i.dTLB-store-miss-rate%
921125 ± 2% -4.6% 878569 perf-stat.i.dTLB-store-misses
5.803e+09 +22.0% 7.078e+09 ± 3% perf-stat.i.dTLB-stores
5.665e+10 +22.0% 6.911e+10 ± 3% perf-stat.i.instructions
0.16 ± 3% +26.1% 0.20 ± 3% perf-stat.i.ipc
2.92 -5.1% 2.77 perf-stat.i.metric.GHz
123.32 ± 16% +158.4% 318.61 ± 22% perf-stat.i.metric.K/sec
286.92 +21.8% 349.59 ± 3% perf-stat.i.metric.M/sec
6641 +4.8% 6957 ± 2% perf-stat.i.minor-faults
586608 ± 12% +36.4% 800024 ± 7% perf-stat.i.node-loads
26.79 ± 4% -10.5 16.31 ± 12% perf-stat.i.node-store-miss-rate%
1.785e+08 ± 2% -27.7% 1.291e+08 ± 7% perf-stat.i.node-store-misses
5.131e+08 ± 3% +39.8% 7.172e+08 ± 5% perf-stat.i.node-stores
6643 +4.8% 6959 ± 2% perf-stat.i.page-faults
0.02 ± 18% -0.0 0.01 ± 4% perf-stat.overall.branch-miss-rate%
6.66 ± 2% -22.5% 5.16 ± 3% perf-stat.overall.cpi
539.35 ± 2% -22.7% 416.69 ± 3% perf-stat.overall.cycles-between-cache-misses
0.02 ± 3% -0.0 0.01 ± 3% perf-stat.overall.dTLB-store-miss-rate%
0.15 ± 2% +29.1% 0.19 ± 3% perf-stat.overall.ipc
25.88 ± 4% -10.6 15.28 ± 10% perf-stat.overall.node-store-miss-rate%
1.325e+10 ± 2% +22.3% 1.622e+10 ± 3% perf-stat.ps.branch-instructions
6.88e+08 ± 2% +22.7% 8.444e+08 ± 3% perf-stat.ps.cache-misses
7.043e+08 ± 2% +22.7% 8.638e+08 ± 3% perf-stat.ps.cache-references
3.708e+11 -5.2% 3.515e+11 perf-stat.ps.cpu-cycles
1.577e+10 ± 2% +22.4% 1.931e+10 ± 3% perf-stat.ps.dTLB-loads
910623 ± 2% -4.6% 868700 perf-stat.ps.dTLB-store-misses
5.701e+09 ± 2% +22.3% 6.975e+09 ± 3% perf-stat.ps.dTLB-stores
5.569e+10 ± 2% +22.3% 6.813e+10 ± 3% perf-stat.ps.instructions
6716 +4.8% 7038 perf-stat.ps.minor-faults
595302 ± 11% +37.2% 816710 ± 8% perf-stat.ps.node-loads
1.769e+08 ± 2% -27.8% 1.277e+08 ± 7% perf-stat.ps.node-store-misses
5.071e+08 ± 3% +40.3% 7.113e+08 ± 5% perf-stat.ps.node-stores
6717 +4.8% 7039 perf-stat.ps.page-faults
0.00 +0.8 0.80 ± 8% perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock.rmqueue_bulk.get_page_from_freelist.__alloc_pages
0.00 +0.8 0.80 ± 8% perf-profile.calltrace.cycles-pp._raw_spin_lock.rmqueue_bulk.get_page_from_freelist.__alloc_pages.alloc_buddy_huge_page
0.00 +0.8 0.83 ± 8% perf-profile.calltrace.cycles-pp.rmqueue_bulk.get_page_from_freelist.__alloc_pages.alloc_buddy_huge_page.alloc_fresh_huge_page
0.00 +0.8 0.84 ± 8% perf-profile.calltrace.cycles-pp.__alloc_pages.alloc_buddy_huge_page.alloc_fresh_huge_page.alloc_surplus_huge_page.hugetlb_acct_memory
0.00 +0.8 0.84 ± 8% perf-profile.calltrace.cycles-pp.get_page_from_freelist.__alloc_pages.alloc_buddy_huge_page.alloc_fresh_huge_page.alloc_surplus_huge_page
0.00 +0.8 0.84 ± 8% perf-profile.calltrace.cycles-pp.alloc_buddy_huge_page.alloc_fresh_huge_page.alloc_surplus_huge_page.hugetlb_acct_memory.hugetlb_reserve_pages
0.00 +0.9 0.85 ± 8% perf-profile.calltrace.cycles-pp.alloc_fresh_huge_page.alloc_surplus_huge_page.hugetlb_acct_memory.hugetlb_reserve_pages.hugetlbfs_file_mmap
0.00 +0.9 0.88 ± 8% perf-profile.calltrace.cycles-pp.alloc_surplus_huge_page.hugetlb_acct_memory.hugetlb_reserve_pages.hugetlbfs_file_mmap.mmap_region
0.00 +0.9 0.88 ± 8% perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.__mmap
0.00 +0.9 0.88 ± 8% perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.__mmap
0.00 +0.9 0.88 ± 8% perf-profile.calltrace.cycles-pp.vm_mmap_pgoff.ksys_mmap_pgoff.do_syscall_64.entry_SYSCALL_64_after_hwframe.__mmap
0.00 +0.9 0.88 ± 8% perf-profile.calltrace.cycles-pp.do_mmap.vm_mmap_pgoff.ksys_mmap_pgoff.do_syscall_64.entry_SYSCALL_64_after_hwframe
0.00 +0.9 0.88 ± 8% perf-profile.calltrace.cycles-pp.mmap_region.do_mmap.vm_mmap_pgoff.ksys_mmap_pgoff.do_syscall_64
0.00 +0.9 0.88 ± 8% perf-profile.calltrace.cycles-pp.ksys_mmap_pgoff.do_syscall_64.entry_SYSCALL_64_after_hwframe.__mmap
0.00 +0.9 0.88 ± 8% perf-profile.calltrace.cycles-pp.__mmap
0.00 +0.9 0.88 ± 8% perf-profile.calltrace.cycles-pp.hugetlbfs_file_mmap.mmap_region.do_mmap.vm_mmap_pgoff.ksys_mmap_pgoff
0.00 +0.9 0.88 ± 8% perf-profile.calltrace.cycles-pp.hugetlb_reserve_pages.hugetlbfs_file_mmap.mmap_region.do_mmap.vm_mmap_pgoff
0.00 +0.9 0.88 ± 8% perf-profile.calltrace.cycles-pp.hugetlb_acct_memory.hugetlb_reserve_pages.hugetlbfs_file_mmap.mmap_region.do_mmap
60.28 ± 5% +4.7 64.98 ± 2% perf-profile.calltrace.cycles-pp.do_rw_once
0.09 ± 8% +0.0 0.11 ± 9% perf-profile.children.cycles-pp.task_tick_fair
0.14 ± 7% +0.0 0.17 ± 5% perf-profile.children.cycles-pp.scheduler_tick
0.20 ± 9% +0.0 0.24 ± 3% perf-profile.children.cycles-pp.tick_sched_timer
0.19 ± 9% +0.0 0.24 ± 4% perf-profile.children.cycles-pp.tick_sched_handle
0.19 ± 9% +0.0 0.23 ± 4% perf-profile.children.cycles-pp.update_process_times
0.24 ± 8% +0.0 0.29 ± 3% perf-profile.children.cycles-pp.__hrtimer_run_queues
0.40 ± 8% +0.1 0.45 ± 3% perf-profile.children.cycles-pp.__sysvec_apic_timer_interrupt
0.39 ± 7% +0.1 0.45 ± 3% perf-profile.children.cycles-pp.hrtimer_interrupt
0.26 ± 71% +0.6 0.86 ± 8% perf-profile.children.cycles-pp.native_queued_spin_lock_slowpath
0.28 ± 71% +0.6 0.88 ± 8% perf-profile.children.cycles-pp.__mmap
0.28 ± 71% +0.6 0.88 ± 8% perf-profile.children.cycles-pp.ksys_mmap_pgoff
0.27 ± 71% +0.6 0.88 ± 8% perf-profile.children.cycles-pp.hugetlbfs_file_mmap
0.27 ± 71% +0.6 0.88 ± 8% perf-profile.children.cycles-pp.hugetlb_reserve_pages
0.27 ± 71% +0.6 0.88 ± 8% perf-profile.children.cycles-pp.hugetlb_acct_memory
0.27 ± 71% +0.6 0.88 ± 8% perf-profile.children.cycles-pp.alloc_surplus_huge_page
0.28 ± 71% +0.6 0.88 ± 8% perf-profile.children.cycles-pp.vm_mmap_pgoff
0.28 ± 71% +0.6 0.88 ± 8% perf-profile.children.cycles-pp.do_mmap
0.28 ± 71% +0.6 0.88 ± 8% perf-profile.children.cycles-pp.mmap_region
0.55 ± 44% +0.6 1.16 ± 9% perf-profile.children.cycles-pp.entry_SYSCALL_64_after_hwframe
0.55 ± 44% +0.6 1.16 ± 9% perf-profile.children.cycles-pp.do_syscall_64
0.12 ± 71% +0.7 0.85 ± 8% perf-profile.children.cycles-pp.alloc_fresh_huge_page
0.03 ± 70% +0.8 0.84 ± 8% perf-profile.children.cycles-pp.alloc_buddy_huge_page
0.04 ± 71% +0.8 0.84 ± 8% perf-profile.children.cycles-pp.get_page_from_freelist
0.04 ± 71% +0.8 0.84 ± 8% perf-profile.children.cycles-pp.__alloc_pages
0.00 +0.8 0.82 ± 8% perf-profile.children.cycles-pp._raw_spin_lock
0.00 +0.8 0.83 ± 8% perf-profile.children.cycles-pp.rmqueue_bulk
0.26 ± 71% +0.6 0.86 ± 8% perf-profile.self.cycles-pp.native_queued_spin_lock_slowpath
Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.
---
0-DAY CI Kernel Test Service
https://lists.01.org/hyperkitty/list/lkp@lists.01.org
Thanks,
Oliver Sang
View attachment "config-5.17.0-rc7-00001-g8212a964ee02" of type "text/plain" (162152 bytes)
View attachment "job-script" of type "text/plain" (8251 bytes)
View attachment "job.yaml" of type "text/plain" (5542 bytes)
View attachment "reproduce" of type "text/plain" (2052 bytes)