Message-ID: <20181009093534.GB13396@shao2-debian>
Date: Tue, 9 Oct 2018 17:35:34 +0800
From: kernel test robot <rong.a.chen@...el.com>
To: Mel Gorman <mgorman@...hsingularity.net>
Cc: Ingo Molnar <mingo@...nel.org>, Rik van Riel <riel@...riel.com>,
Peter Zijlstra <a.p.zijlstra@...llo.nl>,
Jirka Hladky <jhladky@...hat.com>,
Linus Torvalds <torvalds@...ux-foundation.org>,
Linux-MM <linux-mm@...ck.org>,
Srikar Dronamraju <srikar@...ux.vnet.ibm.com>,
Thomas Gleixner <tglx@...utronix.de>,
LKML <linux-kernel@...r.kernel.org>, lkp@...org
Subject: [LKP] [mm, sched/numa] efaffc5e40:
perf-bench-numa-mem.GB_per_thread 38.7% improvement
Greetings,
FYI, we noticed a 38.7% improvement of perf-bench-numa-mem.GB_per_thread due to commit:
commit: efaffc5e40aeced0bcb497ed7a0a5b8c14abfcdf ("mm, sched/numa: Remove rate-limiting of automatic NUMA balancing migration")
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master
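(For reference, the headline figure matches the GB_per_thread means in the comparison table below: (1.18 - 0.85) / 0.85 ~= +38.8%; the exact 38.7% presumably comes from the unrounded per-run averages.)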
in testcase: perf-bench-numa-mem
on test machine: 48-thread Intel(R) Xeon(R) CPU E5-2697 v2 @ 2.70GHz with 64GB of memory
with the following parameters:
nr_threads: 2t
mem_proc: 300M
cpufreq_governor: performance
ucode: 0x42d
Details are as below:
-------------------------------------------------------------------------------------------------->
To reproduce:
git clone https://github.com/intel/lkp-tests.git
cd lkp-tests
bin/lkp install job.yaml # job file is attached in this email
bin/lkp run job.yaml
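The underlying workload can presumably also be launched directly with perf. The authoritative command line is in the attached "reproduce" file; assuming the job's nr_threads=2t and mem_proc=300M map to perf bench numa mem's -t (threads per process) and -P (per-process memory, in MB) options, a minimal sketch would be:
	# hypothetical direct run; see the attached "reproduce" file for the exact command
	perf bench numa mem -t 2 -P 300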
=========================================================================================
compiler/cpufreq_governor/kconfig/mem_proc/nr_threads/rootfs/tbox_group/testcase/ucode:
gcc-7/performance/x86_64-rhel-7.2/300M/2t/debian-x86_64-2018-04-03.cgz/ivb44/perf-bench-numa-mem/0x42d
commit:
6fd98e775f ("sched/numa: Avoid task migration for small NUMA improvement")
efaffc5e40 ("mm, sched/numa: Remove rate-limiting of automatic NUMA balancing migration")
6fd98e775f24fd41 efaffc5e40aeced0bcb497ed7a
---------------- --------------------------
value ±%stddev      %change      value ±%stddev   metric
0.85 ± 5% +38.7% 1.18 ± 6% perf-bench-numa-mem.GB_per_thread
0.15 +36.6% 0.20 ± 5% perf-bench-numa-mem.GB_sec_thread
14.04 +36.6% 19.18 ± 6% perf-bench-numa-mem.GB_sec_total
81.51 ± 5% +38.7% 113.07 ± 6% perf-bench-numa-mem.GB_total
6.84 -26.5% 5.02 ± 6% perf-bench-numa-mem.nsecs_byte_thread
34.74 ± 5% +13.4% 39.39 ± 5% perf-bench-numa-mem.time.system_time
1799 ± 6% +11.6% 2008 ± 5% perf-bench-numa-mem.time.voluntary_context_switches
53165 ± 4% +22.2% 64991 ± 18% interrupts.CAL:Function_call_interrupts
91155 +2.0% 92949 vmstat.system.in
410.00 ± 4% +34.1% 550.00 ± 7% slabinfo.file_lock_cache.active_objs
410.00 ± 4% +34.1% 550.00 ± 7% slabinfo.file_lock_cache.num_objs
17863 ± 5% -14.8% 15221 ± 11% numa-meminfo.node0.Mapped
457399 ± 33% +120.0% 1006181 ± 35% numa-meminfo.node1.Active
457399 ± 33% +120.0% 1006091 ± 35% numa-meminfo.node1.Active(anon)
308898 ± 35% +145.0% 756724 ± 36% numa-meminfo.node1.AnonHugePages
456984 ± 33% +120.1% 1006048 ± 35% numa-meminfo.node1.AnonPages
1101857 ± 14% +51.7% 1671401 ± 21% numa-meminfo.node1.MemUsed
4566 ± 5% -15.2% 3872 ± 11% numa-vmstat.node0.nr_mapped
367556 ± 3% -6.6% 343176 ± 4% numa-vmstat.node0.numa_local
116715 ± 33% +118.3% 254735 ± 33% numa-vmstat.node1.nr_active_anon
116570 ± 33% +118.4% 254606 ± 33% numa-vmstat.node1.nr_anon_pages
151.25 ± 35% +136.5% 357.75 ± 34% numa-vmstat.node1.nr_anon_transparent_hugepages
116715 ± 33% +118.2% 254724 ± 33% numa-vmstat.node1.nr_zone_active_anon
272539 +10.6% 301362 ± 7% numa-vmstat.node1.numa_hit
23658 +2.7% 24302 proc-vmstat.nr_slab_unreclaimable
614708 ± 6% +113.3% 1310981 ± 7% proc-vmstat.numa_pages_migrated
4541241 +17.7% 5343559 proc-vmstat.pgalloc_normal
3936210 ± 25% +35.1% 5318885 proc-vmstat.pgfree
52096 ± 15% +95.6% 101888 ± 11% proc-vmstat.pgmigrate_fail
614708 ± 6% +113.3% 1310981 ± 7% proc-vmstat.pgmigrate_success
7267 ± 26% +34.3% 9757 ± 2% proc-vmstat.thp_deferred_split_page
1.556e+10 ± 5% +33.6% 2.078e+10 ± 14% perf-stat.branch-instructions
8.441e+08 ± 3% +19.0% 1.004e+09 ± 5% perf-stat.cache-misses
1.737e+09 ± 6% +24.0% 2.154e+09 ± 5% perf-stat.cache-references
10.77 -20.3% 8.58 ± 15% perf-stat.cpi
1.486e+10 ± 5% +37.2% 2.038e+10 ± 13% perf-stat.dTLB-loads
1.729e+10 ± 9% +26.7% 2.19e+10 ± 3% perf-stat.dTLB-stores
9.068e+10 ± 5% +32.3% 1.2e+11 ± 11% perf-stat.instructions
0.09 +28.4% 0.12 ± 15% perf-stat.ipc
31.61 -5.0 26.61 ± 6% perf-stat.node-load-miss-rate%
4.24e+08 ± 8% +9.1% 4.627e+08 ± 4% perf-stat.node-load-misses
9.165e+08 ± 6% +39.6% 1.28e+09 ± 6% perf-stat.node-loads
7007 ± 4% -52.6% 3324 ± 3% sched_debug.cfs_rq:/.min_vruntime.avg
23357 ± 13% -18.8% 18961 ± 4% sched_debug.cfs_rq:/.min_vruntime.max
2988 ± 7% -76.8% 693.68 ± 23% sched_debug.cfs_rq:/.min_vruntime.min
4011 ± 8% -128.9% -1159 sched_debug.cfs_rq:/.spread0.avg
20353 ± 15% -28.9% 14466 ± 25% sched_debug.cfs_rq:/.spread0.max
115.17 ± 4% -14.4% 98.57 ± 12% sched_debug.cfs_rq:/.util_est_enqueued.stddev
15213474 ± 84% -83.1% 2564813 ± 91% sched_debug.cpu.avg_idle.max
2126334 ± 85% -80.3% 419030 ± 65% sched_debug.cpu.avg_idle.stddev
0.00 ± 3% -11.0% 0.00 ± 7% sched_debug.cpu.next_balance.stddev
5690 ± 8% -19.7% 4568 ± 4% sched_debug.cpu.nr_switches.max
1001 ± 6% -12.8% 872.62 ± 3% sched_debug.cpu.nr_switches.stddev
8.17 ± 14% -8.2 0.00 perf-profile.calltrace.cycles-pp.waitid
4.42 ±101% -4.4 0.00 perf-profile.calltrace.cycles-pp.pipe_write.__vfs_write.vfs_write.ksys_write.do_syscall_64
6.25 ± 60% -4.3 1.92 ±173% perf-profile.calltrace.cycles-pp.ksys_mmap_pgoff.do_syscall_64.entry_SYSCALL_64_after_hwframe
6.25 ± 60% -4.3 1.92 ±173% perf-profile.calltrace.cycles-pp.vm_mmap_pgoff.ksys_mmap_pgoff.do_syscall_64.entry_SYSCALL_64_after_hwframe
6.25 ± 60% -4.3 1.92 ±173% perf-profile.calltrace.cycles-pp.do_mmap.vm_mmap_pgoff.ksys_mmap_pgoff.do_syscall_64.entry_SYSCALL_64_after_hwframe
6.25 ± 60% -4.3 1.92 ±173% perf-profile.calltrace.cycles-pp.mmap_region.do_mmap.vm_mmap_pgoff.ksys_mmap_pgoff.do_syscall_64
6.25 ± 60% -4.2 2.08 ±173% perf-profile.calltrace.cycles-pp.filemap_map_pages.__handle_mm_fault.handle_mm_fault.__do_page_fault.do_page_fault
4.00 ±100% -4.0 0.00 perf-profile.calltrace.cycles-pp.arch_show_interrupts.seq_read.proc_reg_read.__vfs_read.vfs_read
4.00 ±100% -4.0 0.00 perf-profile.calltrace.cycles-pp.proc_reg_read.__vfs_read.vfs_read.ksys_read.do_syscall_64
4.00 ±100% -4.0 0.00 perf-profile.calltrace.cycles-pp.seq_read.proc_reg_read.__vfs_read.vfs_read.ksys_read
3.75 ±101% -3.8 0.00 perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.waitid
3.75 ±101% -3.8 0.00 perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.waitid
3.75 ±101% -3.8 0.00 perf-profile.calltrace.cycles-pp.__do_sys_waitid.do_syscall_64.entry_SYSCALL_64_after_hwframe.waitid
3.75 ±101% -3.8 0.00 perf-profile.calltrace.cycles-pp.kernel_waitid.__do_sys_waitid.do_syscall_64.entry_SYSCALL_64_after_hwframe.waitid
3.75 ±101% -3.8 0.00 perf-profile.calltrace.cycles-pp.do_wait.kernel_waitid.__do_sys_waitid.do_syscall_64.entry_SYSCALL_64_after_hwframe
3.75 ±101% -3.8 0.00 perf-profile.calltrace.cycles-pp.d_invalidate.proc_flush_task.release_task.wait_consider_task.do_wait
3.75 ±101% -3.8 0.00 perf-profile.calltrace.cycles-pp.proc_flush_task.release_task.wait_consider_task.do_wait.kernel_waitid
3.75 ±101% -3.8 0.00 perf-profile.calltrace.cycles-pp.shrink_dcache_parent.d_invalidate.proc_flush_task.release_task.wait_consider_task
3.75 ±101% -3.8 0.00 perf-profile.calltrace.cycles-pp.wait_consider_task.do_wait.kernel_waitid.__do_sys_waitid.do_syscall_64
3.75 ±101% -3.8 0.00 perf-profile.calltrace.cycles-pp.release_task.wait_consider_task.do_wait.kernel_waitid.__do_sys_waitid
4.42 ±101% -2.5 1.92 ±173% perf-profile.calltrace.cycles-pp.__vfs_write.vfs_write.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe
4.17 ±103% -2.2 1.92 ±173% perf-profile.calltrace.cycles-pp.path_openat.do_filp_open.do_sys_open.do_syscall_64.entry_SYSCALL_64_after_hwframe
4.00 ±100% -1.9 2.08 ±173% perf-profile.calltrace.cycles-pp.__vfs_read.vfs_read.ksys_read.do_syscall_64.entry_SYSCALL_64_after_hwframe
8.17 ± 14% -8.2 0.00 perf-profile.children.cycles-pp.waitid
4.42 ±101% -4.4 0.00 perf-profile.children.cycles-pp.pipe_write
6.25 ± 60% -4.3 1.92 ±173% perf-profile.children.cycles-pp.ksys_mmap_pgoff
6.25 ± 60% -4.3 1.92 ±173% perf-profile.children.cycles-pp.vm_mmap_pgoff
6.25 ± 60% -4.3 1.92 ±173% perf-profile.children.cycles-pp.do_mmap
6.25 ± 60% -4.3 1.92 ±173% perf-profile.children.cycles-pp.mmap_region
6.25 ± 60% -4.2 2.08 ±173% perf-profile.children.cycles-pp.filemap_map_pages
4.00 ±100% -4.0 0.00 perf-profile.children.cycles-pp.arch_show_interrupts
4.00 ±100% -4.0 0.00 perf-profile.children.cycles-pp.proc_reg_read
3.75 ±101% -3.8 0.00 perf-profile.children.cycles-pp.__do_sys_waitid
3.75 ±101% -3.8 0.00 perf-profile.children.cycles-pp.kernel_waitid
3.75 ±101% -3.8 0.00 perf-profile.children.cycles-pp.do_wait
3.75 ±101% -3.8 0.00 perf-profile.children.cycles-pp.d_invalidate
3.75 ±101% -3.8 0.00 perf-profile.children.cycles-pp.proc_flush_task
3.75 ±101% -3.8 0.00 perf-profile.children.cycles-pp.shrink_dcache_parent
3.75 ±101% -3.8 0.00 perf-profile.children.cycles-pp.wait_consider_task
3.75 ±101% -3.8 0.00 perf-profile.children.cycles-pp.release_task
4.42 ±101% -2.5 1.92 ±173% perf-profile.children.cycles-pp.ksys_write
4.42 ±101% -2.5 1.92 ±173% perf-profile.children.cycles-pp.vfs_write
4.42 ±101% -2.5 1.92 ±173% perf-profile.children.cycles-pp.__vfs_write
4.17 ±103% -2.2 1.92 ±173% perf-profile.children.cycles-pp.path_openat
4.00 ±100% -1.9 2.08 ±173% perf-profile.children.cycles-pp.__vfs_read
4.00 ±100% -1.9 2.08 ±173% perf-profile.children.cycles-pp.seq_read
4.17 ±103% -2.1 2.08 ±173% perf-profile.self.cycles-pp.filemap_map_pages
[Plot: perf-bench-numa-mem.nsecs_byte_thread over successive runs -- parent (+) samples cluster around 6.5-7.2, commit (O) samples around 4.5-5.5]
[Plot: perf-bench-numa-mem.GB_sec_thread over successive runs -- parent (+) samples cluster around 0.14-0.15, commit (O) samples around 0.18-0.22]
[Plot: perf-bench-numa-mem.GB_sec_total over successive runs -- parent (+) samples cluster around 13-14, commit (O) samples around 17-21]
[*] bisect-good sample
[O] bisect-bad sample
Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.
Thanks,
Rong Chen
View attachment "config-4.19.0-rc5-00246-gefaffc5" of type "text/plain" (167709 bytes)
View attachment "job-script" of type "text/plain" (7002 bytes)
View attachment "job.yaml" of type "text/plain" (4588 bytes)
View attachment "reproduce" of type "text/plain" (323 bytes)