Message-ID: <20210107053314.GB22733@xsang-OptiPlex-9020>
Date: Thu, 7 Jan 2021 13:33:14 +0800
From: kernel test robot <oliver.sang@...el.com>
To: Daniel Jordan <daniel.m.jordan@...cle.com>
Cc: Linus Torvalds <torvalds@...ux-foundation.org>,
Andrew Morton <akpm@...ux-foundation.org>,
Josh Triplett <josh@...htriplett.org>,
Alexander Duyck <alexander.h.duyck@...ux.intel.com>,
Alex Williamson <alex.williamson@...hat.com>,
Dan Williams <dan.j.williams@...el.com>,
Dave Hansen <dave.hansen@...ux.intel.com>,
David Hildenbrand <david@...hat.com>,
Herbert Xu <herbert@...dor.apana.org.au>,
Jason Gunthorpe <jgg@...pe.ca>,
Jonathan Corbet <corbet@....net>,
Kirill Tkhai <ktkhai@...tuozzo.com>,
Michal Hocko <mhocko@...nel.org>, Pavel Machek <pavel@....cz>,
Pavel Tatashin <pasha.tatashin@...een.com>,
Peter Zijlstra <peterz@...radead.org>,
Randy Dunlap <rdunlap@...radead.org>,
Robert Elliott <elliott@....com>,
Shile Zhang <shile.zhang@...ux.alibaba.com>,
Steffen Klassert <steffen.klassert@...unet.com>,
Steven Sistare <steven.sistare@...cle.com>,
Tejun Heo <tj@...nel.org>, Zi Yan <ziy@...dia.com>,
LKML <linux-kernel@...r.kernel.org>, lkp@...ts.01.org,
lkp@...el.com, ying.huang@...el.com, feng.tang@...el.com,
zhengjun.xing@...el.com
Subject: [mm] e44431498f: will-it-scale.per_thread_ops 26.8% improvement
Greetings,
FYI, we noticed a 26.8% improvement of will-it-scale.per_thread_ops due to commit:
commit: e44431498f5fbf427f139aa413cf381b4fa3a600 ("mm: parallelize deferred_init_memmap()")
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master
in testcase: will-it-scale
on test machine: 48 threads Intel(R) Xeon(R) CPU E5-2697 v2 @ 2.70GHz with 112G memory
with the following parameters:

	nr_task: 16
	mode: thread
	test: page_fault1
	cpufreq_governor: performance
	ucode: 0x42e
test-description: Will It Scale takes a testcase and runs it from 1 through to n parallel copies to see whether the testcase scales. It builds both a process-based and a thread-based variant of each test in order to expose any differences between the two.
test-url: https://github.com/antonblanchard/will-it-scale
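The harness idea described above can be sketched roughly as follows. This is a minimal, hypothetical Python illustration of the structure (run the same workload under 1..N parallel workers for a fixed interval, report per-worker operations), not the actual benchmark: the real Will It Scale is written in C, and the toy workload plus Python's GIL mean this shows only the measurement loop, not true parallel scaling.

```python
# Illustrative sketch (NOT the actual will-it-scale benchmark): measure
# per-worker throughput at 1..max_tasks parallelism so a scalability
# curve could be plotted from the results.
import threading
import time

DURATION = 0.1  # seconds per measurement point; the real harness runs longer

def workload(ops_out, idx):
    """Toy stand-in for page_fault1: allocate and touch a page-sized buffer."""
    deadline = time.monotonic() + DURATION
    ops = 0
    while time.monotonic() < deadline:
        buf = bytearray(4096)  # one page-sized allocation per iteration
        buf[0] = 1             # touch it
        ops += 1
    ops_out[idx] = ops

def run_scaling(max_tasks):
    """Return [(nr_task, per_task_ops), ...] for 1..max_tasks workers."""
    results = []
    for n in range(1, max_tasks + 1):
        ops = [0] * n
        workers = [threading.Thread(target=workload, args=(ops, i))
                   for i in range(n)]
        for t in workers:
            t.start()
        for t in workers:
            t.join()
        results.append((n, sum(ops) // n))
    return results

for n, per_task in run_scaling(4):
    print(f"nr_task={n}  per_task_ops={per_task}")
```

The `per_thread_ops` figure in this report is the analogous per-task metric from the real C harness at nr_task=16.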
Details are as below:
-------------------------------------------------------------------------------------------------->
To reproduce:

	git clone https://github.com/intel/lkp-tests.git
	cd lkp-tests
	bin/lkp install job.yaml  # job file is attached in this email
	bin/lkp run     job.yaml
=========================================================================================
compiler/cpufreq_governor/kconfig/mode/nr_task/rootfs/tbox_group/test/testcase/ucode:
gcc-9/performance/x86_64-rhel-8.3/thread/16/debian-10.4-x86_64-20200603.cgz/lkp-ivb-2ep1/page_fault1/will-it-scale/0x42e
commit:
89c7c4022d ("mm: don't track number of pages during deferred initialization")
e44431498f ("mm: parallelize deferred_init_memmap()")
89c7c4022dfccf0c e44431498f5fbf427f139aa413c
---------------- ---------------------------
%stddev %change %stddev
\ | \
2605430 +26.8% 3303797 will-it-scale.16.threads
69.29 +1.5% 70.34 will-it-scale.16.threads_idle
162839 +26.8% 206486 will-it-scale.per_thread_ops
2605430 +26.8% 3303797 will-it-scale.workload
6627328 ± 6% +154.3% 16854016 ± 5% meminfo.DirectMap2M
3740 ± 91% +114.4% 8017 ± 40% numa-meminfo.node1.Inactive
3740 ± 91% +114.4% 8017 ± 40% numa-meminfo.node1.Inactive(anon)
12157 ± 5% -9.5% 11005 ± 3% slabinfo.pde_opener.active_objs
12157 ± 5% -9.5% 11005 ± 3% slabinfo.pde_opener.num_objs
34869 ± 21% -44.3% 19424 softirqs.CPU23.SCHED
143557 ± 8% -12.2% 125985 ± 4% softirqs.CPU38.TIMER
69.00 +1.4% 70.00 vmstat.cpu.id
5105 +21.3% 6193 vmstat.system.cs
1750980 +13.8% 1993438 ± 2% numa-numastat.node0.local_node
1762715 +14.5% 2019056 ± 2% numa-numastat.node0.numa_hit
2017488 +28.9% 2599574 ± 2% numa-numastat.node1.local_node
2049532 +27.7% 2617670 ± 2% numa-numastat.node1.numa_hit
18100 +18.5% 21447 sched_debug.cpu.nr_switches.avg
34161 ± 9% +27.4% 43526 ± 7% sched_debug.cpu.nr_switches.max
9041 ± 5% +32.4% 11972 ± 3% sched_debug.cpu.nr_switches.stddev
-48.79 -49.1% -24.83 sched_debug.cpu.nr_uninterruptible.min
366.25 ± 6% +10.8% 405.75 ± 7% numa-vmstat.node0.nr_page_table_pages
1313119 +18.4% 1554083 ± 4% numa-vmstat.node0.numa_hit
1297715 ± 2% +12.2% 1455868 ± 8% numa-vmstat.node0.numa_local
15404 ± 67% +537.6% 98214 ± 68% numa-vmstat.node0.numa_other
944.00 ± 89% +111.4% 1995 ± 41% numa-vmstat.node1.nr_inactive_anon
944.00 ± 89% +111.4% 1995 ± 41% numa-vmstat.node1.nr_zone_inactive_anon
1534775 +10.6% 1696916 ± 4% numa-vmstat.node1.numa_hit
1346117 ± 2% +18.2% 1591200 ± 8% numa-vmstat.node1.numa_local
188658 ± 5% -44.0% 105714 ± 64% numa-vmstat.node1.numa_other
329075 +1.6% 334289 proc-vmstat.nr_active_anon
311360 +1.7% 316562 proc-vmstat.nr_anon_pages
569.00 +1.6% 578.25 proc-vmstat.nr_anon_transparent_hugepages
11307 +4.5% 11810 proc-vmstat.nr_kernel_stack
1326 +3.9% 1378 proc-vmstat.nr_page_table_pages
329076 +1.6% 334288 proc-vmstat.nr_zone_active_anon
5011 ± 60% -92.4% 379.25 ± 31% proc-vmstat.numa_hint_faults_local
3840470 +21.3% 4657420 proc-vmstat.numa_hit
3796678 +21.5% 4613690 proc-vmstat.numa_local
19719 ± 17% -50.1% 9834 ± 51% proc-vmstat.numa_pages_migrated
7.862e+08 +26.8% 9.965e+08 proc-vmstat.pgalloc_normal
2444034 +16.4% 2844032 proc-vmstat.pgfault
7.861e+08 +26.7% 9.963e+08 proc-vmstat.pgfree
19719 ± 17% -50.1% 9834 ± 51% proc-vmstat.pgmigrate_success
1530844 +26.8% 1940789 proc-vmstat.thp_fault_alloc
0.03 ± 4% +79.2% 0.05 ± 40% perf-sched.sch_delay.avg.ms.__sched_text_start.__sched_text_start.__x64_sys_pause.do_syscall_64.entry_SYSCALL_64_after_hwframe
0.02 ± 16% +55.6% 0.03 ± 44% perf-sched.sch_delay.avg.ms.__sched_text_start.__sched_text_start.do_wait.kernel_wait4.__do_sys_wait4
0.01 ± 44% +97.9% 0.02 ± 32% perf-sched.sch_delay.avg.ms.__sched_text_start.__sched_text_start.preempt_schedule_common._cond_resched.remove_vma
0.12 ± 12% -26.7% 0.09 ± 17% perf-sched.sch_delay.max.ms.__sched_text_start.__sched_text_start.devkmsg_read.vfs_read.ksys_read
0.07 ± 11% -33.2% 0.04 ± 19% perf-sched.sch_delay.max.ms.__sched_text_start.__sched_text_start.do_nanosleep.hrtimer_nanosleep.__x64_sys_nanosleep
7.29 ± 67% -98.4% 0.12 ± 19% perf-sched.sch_delay.max.ms.__sched_text_start.__sched_text_start.schedule_timeout.rcu_gp_kthread.kthread
46.54 ± 3% -16.5% 38.84 ± 6% perf-sched.total_wait_and_delay.average.ms
25143 +21.8% 30624 ± 5% perf-sched.total_wait_and_delay.count.ms
46.52 ± 3% -16.5% 38.82 ± 6% perf-sched.total_wait_time.average.ms
2.88 ± 6% -13.4% 2.49 ± 11% perf-sched.wait_and_delay.avg.ms.__sched_text_start.__sched_text_start.rwsem_down_write_slowpath.down_write_killable.vm_mmap_pgoff
299.00 ± 15% -20.1% 239.00 ± 8% perf-sched.wait_and_delay.count.__sched_text_start.__sched_text_start.preempt_schedule_common._cond_resched.clear_huge_page
17537 +26.9% 22259 ± 3% perf-sched.wait_and_delay.count.__sched_text_start.__sched_text_start.rwsem_down_read_slowpath.do_user_addr_fault.page_fault
916.25 +39.9% 1282 ± 2% perf-sched.wait_and_delay.count.__sched_text_start.__sched_text_start.rwsem_down_write_slowpath.down_write_killable.__vm_munmap
657.75 ± 3% +32.3% 870.00 ± 2% perf-sched.wait_and_delay.count.__sched_text_start.__sched_text_start.rwsem_down_write_slowpath.down_write_killable.vm_mmap_pgoff
2.85 ± 6% -13.4% 2.47 ± 11% perf-sched.wait_time.avg.ms.__sched_text_start.__sched_text_start.rwsem_down_write_slowpath.down_write_killable.vm_mmap_pgoff
0.01 ± 95% +124.5% 0.03 ± 32% perf-sched.wait_time.max.ms.__sched_text_start.__sched_text_start.io_schedule.__lock_page_killable.filemap_fault
0.70 ± 3% +0.2 0.92 ± 17% perf-profile.calltrace.cycles-pp.smp_apic_timer_interrupt.__irqentry_text_start.cpuidle_enter_state.cpuidle_enter.do_idle
0.74 ± 3% +0.2 0.98 ± 18% perf-profile.calltrace.cycles-pp.__irqentry_text_start.cpuidle_enter_state.cpuidle_enter.do_idle.cpu_startup_entry
0.07 ± 10% +0.0 0.09 ± 5% perf-profile.children.cycles-pp.__unwind_start
0.07 ± 5% +0.0 0.09 ± 9% perf-profile.children.cycles-pp.dequeue_task_fair
0.18 ± 8% +0.0 0.21 ± 6% perf-profile.children.cycles-pp.perf_callchain_kernel
0.04 ± 58% +0.0 0.07 ± 7% perf-profile.children.cycles-pp.try_to_wake_up
0.06 ± 11% +0.0 0.09 ± 17% perf-profile.children.cycles-pp.get_next_timer_interrupt
0.06 +0.0 0.09 ± 7% perf-profile.children.cycles-pp.schedule_idle
0.09 ± 9% +0.0 0.12 ± 17% perf-profile.children.cycles-pp.irq_enter
0.07 ± 10% +0.0 0.10 ± 14% perf-profile.children.cycles-pp.perf_trace_sched_switch
0.10 ± 4% +0.0 0.14 ± 18% perf-profile.children.cycles-pp.serial8250_console_putchar
0.11 +0.0 0.15 ± 11% perf-profile.children.cycles-pp.irq_work_run_list
0.10 ± 4% +0.0 0.14 ± 14% perf-profile.children.cycles-pp.uart_console_write
0.11 ± 4% +0.0 0.14 ± 10% perf-profile.children.cycles-pp.irq_work_run
0.11 ± 4% +0.0 0.15 ± 11% perf-profile.children.cycles-pp.serial8250_console_write
0.08 ± 10% +0.0 0.12 ± 19% perf-profile.children.cycles-pp.io_serial_in
0.10 ± 4% +0.0 0.14 ± 10% perf-profile.children.cycles-pp.irq_work_interrupt
0.10 ± 4% +0.0 0.14 ± 10% perf-profile.children.cycles-pp.smp_irq_work_interrupt
0.10 ± 4% +0.0 0.14 ± 10% perf-profile.children.cycles-pp.printk
0.10 +0.0 0.14 ± 15% perf-profile.children.cycles-pp.wait_for_xmitr
0.11 ± 4% +0.0 0.15 ± 14% perf-profile.children.cycles-pp.vprintk_emit
0.11 ± 4% +0.0 0.15 ± 14% perf-profile.children.cycles-pp.console_unlock
0.14 ± 10% +0.0 0.18 ± 8% perf-profile.children.cycles-pp.schedule
0.03 ±100% +0.0 0.07 ± 17% perf-profile.children.cycles-pp.tick_irq_enter
0.25 ± 10% +0.0 0.29 ± 6% perf-profile.children.cycles-pp.get_perf_callchain
0.25 ± 10% +0.0 0.30 ± 5% perf-profile.children.cycles-pp.perf_callchain
0.01 ±173% +0.1 0.06 ± 17% perf-profile.children.cycles-pp.__next_timer_interrupt
0.28 ± 7% +0.1 0.33 ± 5% perf-profile.children.cycles-pp.perf_prepare_sample
0.20 ± 8% +0.1 0.28 ± 6% perf-profile.children.cycles-pp.__sched_text_start
0.37 ± 11% +0.1 0.44 ± 5% perf-profile.children.cycles-pp.perf_swevent_overflow
0.36 ± 10% +0.1 0.44 ± 4% perf-profile.children.cycles-pp.__perf_event_overflow
0.39 ± 11% +0.1 0.48 ± 5% perf-profile.children.cycles-pp.perf_tp_event
0.26 ± 17% +0.1 0.36 ± 30% perf-profile.children.cycles-pp.menu_select
0.78 ± 7% +0.1 0.90 ± 8% perf-profile.children.cycles-pp.__hrtimer_run_queues
0.94 ± 6% +0.2 1.10 ± 11% perf-profile.children.cycles-pp.hrtimer_interrupt
1.30 ± 4% +0.3 1.56 ± 11% perf-profile.children.cycles-pp.smp_apic_timer_interrupt
0.07 ± 11% +0.1 0.13 ± 36% perf-profile.self.cycles-pp.menu_select
394290 ± 2% +19.8% 472478 ± 3% interrupts.CAL:Function_call_interrupts
9087 ± 6% +30.3% 11840 ± 16% interrupts.CPU0.CAL:Function_call_interrupts
1570 ± 30% -53.2% 735.50 ± 20% interrupts.CPU0.RES:Rescheduling_interrupts
9202 ± 8% +29.7% 11938 ± 17% interrupts.CPU0.TLB:TLB_shootdowns
6488 ± 9% -41.9% 3766 ± 28% interrupts.CPU1.NMI:Non-maskable_interrupts
6488 ± 9% -41.9% 3766 ± 28% interrupts.CPU1.PMI:Performance_monitoring_interrupts
12965 ± 6% +34.2% 17405 ± 11% interrupts.CPU10.CAL:Function_call_interrupts
13003 ± 5% +34.0% 17426 ± 12% interrupts.CPU10.TLB:TLB_shootdowns
12766 ± 6% +28.3% 16376 ± 10% interrupts.CPU11.CAL:Function_call_interrupts
12665 ± 7% +29.5% 16402 ± 11% interrupts.CPU11.TLB:TLB_shootdowns
4156 ± 22% -39.6% 2509 ± 42% interrupts.CPU12.NMI:Non-maskable_interrupts
4156 ± 22% -39.6% 2509 ± 42% interrupts.CPU12.PMI:Performance_monitoring_interrupts
551.75 ± 22% +80.5% 996.00 ± 30% interrupts.CPU13.RES:Rescheduling_interrupts
12383 ± 3% +11.9% 13851 ± 5% interrupts.CPU14.TLB:TLB_shootdowns
15287 ± 4% +15.6% 17677 ± 11% interrupts.CPU24.CAL:Function_call_interrupts
15250 ± 5% +16.3% 17736 ± 11% interrupts.CPU24.TLB:TLB_shootdowns
325.00 ± 10% +36.5% 443.75 ± 19% interrupts.CPU25.RES:Rescheduling_interrupts
10439 ± 6% +34.0% 13993 ± 12% interrupts.CPU26.CAL:Function_call_interrupts
10126 ± 6% +36.3% 13799 ± 12% interrupts.CPU26.TLB:TLB_shootdowns
330.50 ± 5% +32.5% 437.75 ± 23% interrupts.CPU27.RES:Rescheduling_interrupts
10580 ± 4% +21.2% 12825 ± 14% interrupts.CPU30.CAL:Function_call_interrupts
10471 ± 5% +21.0% 12668 ± 15% interrupts.CPU30.TLB:TLB_shootdowns
4329 ± 11% -30.0% 3030 ± 24% interrupts.CPU36.NMI:Non-maskable_interrupts
4329 ± 11% -30.0% 3030 ± 24% interrupts.CPU36.PMI:Performance_monitoring_interrupts
10463 ± 6% +25.5% 13131 ± 11% interrupts.CPU38.CAL:Function_call_interrupts
4078 ± 18% -36.1% 2604 ± 19% interrupts.CPU38.NMI:Non-maskable_interrupts
4078 ± 18% -36.1% 2604 ± 19% interrupts.CPU38.PMI:Performance_monitoring_interrupts
461.50 ± 12% +42.3% 656.75 ± 9% interrupts.CPU38.RES:Rescheduling_interrupts
11084 ± 6% +31.7% 14592 ± 12% interrupts.CPU38.TLB:TLB_shootdowns
14158 ± 4% +27.9% 18104 ± 13% interrupts.CPU4.CAL:Function_call_interrupts
14231 ± 4% +27.8% 18184 ± 14% interrupts.CPU4.TLB:TLB_shootdowns
498.25 ± 41% -45.3% 272.75 ± 31% interrupts.CPU41.NMI:Non-maskable_interrupts
498.25 ± 41% -45.3% 272.75 ± 31% interrupts.CPU41.PMI:Performance_monitoring_interrupts
185.50 ± 38% +178.2% 516.00 ± 31% interrupts.CPU42.RES:Rescheduling_interrupts
13001 ± 6% +25.6% 16329 ± 8% interrupts.CPU5.CAL:Function_call_interrupts
12881 ± 7% +26.8% 16329 ± 9% interrupts.CPU5.TLB:TLB_shootdowns
13824 ± 10% +22.2% 16893 ± 6% interrupts.CPU7.CAL:Function_call_interrupts
13794 ± 11% +23.6% 17043 ± 6% interrupts.CPU7.TLB:TLB_shootdowns
13213 ± 6% +29.8% 17146 ± 6% interrupts.CPU8.CAL:Function_call_interrupts
4926 ± 30% -42.7% 2822 ± 19% interrupts.CPU8.NMI:Non-maskable_interrupts
4926 ± 30% -42.7% 2822 ± 19% interrupts.CPU8.PMI:Performance_monitoring_interrupts
13215 ± 7% +32.2% 17471 ± 5% interrupts.CPU8.TLB:TLB_shootdowns
12624 ± 7% +30.6% 16489 ± 19% interrupts.CPU9.CAL:Function_call_interrupts
12439 ± 7% +32.3% 16462 ± 20% interrupts.CPU9.TLB:TLB_shootdowns
153249 ± 4% -14.8% 130602 ± 7% interrupts.NMI:Non-maskable_interrupts
153249 ± 4% -14.8% 130602 ± 7% interrupts.PMI:Performance_monitoring_interrupts
386515 ± 2% +21.7% 470438 ± 3% interrupts.TLB:TLB_shootdowns
188.73 +12.3% 211.95 perf-stat.i.MPKI
2.738e+08 +10.4% 3.023e+08 perf-stat.i.branch-instructions
1.707e+08 +26.5% 2.159e+08 perf-stat.i.cache-misses
1.926e+08 +25.1% 2.41e+08 perf-stat.i.cache-references
5106 +21.5% 6202 perf-stat.i.context-switches
42.38 -13.1% 36.81 perf-stat.i.cpi
4.347e+10 -3.3% 4.204e+10 perf-stat.i.cpu-cycles
256.07 -23.3% 196.33 perf-stat.i.cycles-between-cache-misses
3.19e+08 ± 2% +9.8% 3.503e+08 perf-stat.i.dTLB-loads
9.048e+08 +22.5% 1.108e+09 perf-stat.i.dTLB-stores
1.231e+09 +9.6% 1.349e+09 perf-stat.i.instructions
1255 +7.0% 1342 ± 2% perf-stat.i.instructions-per-iTLB-miss
0.03 +12.7% 0.03 ± 2% perf-stat.i.ipc
0.91 -3.3% 0.88 perf-stat.i.metric.GHz
0.44 +17.6% 0.52 perf-stat.i.metric.K/sec
38.15 +19.4% 45.56 perf-stat.i.metric.M/sec
7976 +16.8% 9318 perf-stat.i.minor-faults
5.36 ± 5% -0.7 4.62 ± 3% perf-stat.i.node-load-miss-rate%
242591 ± 5% +10.8% 268791 ± 4% perf-stat.i.node-load-misses
4250703 +30.2% 5534656 perf-stat.i.node-loads
1.90 ± 3% -0.2 1.71 perf-stat.i.node-store-miss-rate%
2488338 ± 2% +19.1% 2963150 perf-stat.i.node-store-misses
1.339e+08 +31.6% 1.762e+08 perf-stat.i.node-stores
7976 +16.8% 9318 perf-stat.i.page-faults
156.77 +14.0% 178.74 perf-stat.overall.MPKI
35.39 -11.9% 31.19 perf-stat.overall.cpi
254.66 -23.5% 194.76 perf-stat.overall.cycles-between-cache-misses
1259 +7.3% 1351 ± 2% perf-stat.overall.instructions-per-iTLB-miss
0.03 +13.5% 0.03 perf-stat.overall.ipc
5.39 ± 4% -0.8 4.63 ± 3% perf-stat.overall.node-load-miss-rate%
1.82 ± 2% -0.2 1.65 perf-stat.overall.node-store-miss-rate%
142317 -13.6% 122910 perf-stat.overall.path-length
2.725e+08 +10.5% 3.011e+08 perf-stat.ps.branch-instructions
1.701e+08 +26.4% 2.151e+08 perf-stat.ps.cache-misses
1.919e+08 +25.1% 2.401e+08 perf-stat.ps.cache-references
5089 +21.5% 6180 perf-stat.ps.context-switches
4.332e+10 -3.3% 4.19e+10 perf-stat.ps.cpu-cycles
3.175e+08 ± 2% +9.9% 3.489e+08 perf-stat.ps.dTLB-loads
9.016e+08 +22.5% 1.105e+09 perf-stat.ps.dTLB-stores
1.224e+09 +9.7% 1.344e+09 perf-stat.ps.instructions
7946 +16.8% 9284 perf-stat.ps.minor-faults
241532 ± 4% +10.9% 267854 ± 4% perf-stat.ps.node-load-misses
4235623 +30.2% 5515080 perf-stat.ps.node-loads
2480187 ± 2% +19.1% 2953222 perf-stat.ps.node-store-misses
1.334e+08 +31.6% 1.756e+08 perf-stat.ps.node-stores
7946 +16.8% 9284 perf-stat.ps.page-faults
3.708e+11 +9.5% 4.061e+11 perf-stat.total.instructions
will-it-scale.per_thread_ops
220000 +------------------------------------------------------------------+
| O O |
210000 |-O O O O O O |
| O O O OO |
| O O O |
200000 |-+ O O |
| |
190000 |-+ O O O |
| O O O O O |
180000 |-+ |
| |
| |
170000 |-+ |
|. .+.++. .+.+. +. .+.+. .+.++.+. .+.+.+. .+. .+. .++.+.+.|
160000 +------------------------------------------------------------------+
[*] bisect-good sample
[O] bisect-bad sample
Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.
Thanks,
Oliver Sang
View attachment "config-5.7.0-03884-ge44431498f5fbf4" of type "text/plain" (157793 bytes)
View attachment "job-script" of type "text/plain" (7863 bytes)
View attachment "job.yaml" of type "text/plain" (5466 bytes)
View attachment "reproduce" of type "text/plain" (342 bytes)