Message-ID: <20200715023212.GC3874@shao2-debian>
Date: Wed, 15 Jul 2020 10:32:13 +0800
From: kernel test robot <rong.a.chen@...el.com>
To: Adrian Hunter <adrian.hunter@...el.com>
Cc: Peter Zijlstra <peterz@...radead.org>,
LKML <linux-kernel@...r.kernel.org>, lkp@...ts.01.org
Subject: [perf] e17d43b93e: will-it-scale.per_process_ops 5.8% improvement
Greetings,
FYI, we noticed a 5.8% improvement in will-it-scale.per_process_ops due to commit:
commit: e17d43b93e544f5016c0251d2074c15568d5d963 ("perf: Add perf text poke event")
https://git.kernel.org/cgit/linux/kernel/git/next/linux-next.git master
in testcase: will-it-scale
on test machine: 192 threads Intel(R) Xeon(R) Platinum 9242 CPU @ 2.30GHz with 192G memory
with the following parameters:
nr_task: 50%
mode: process
test: signal1
cpufreq_governor: performance
ucode: 0x5002f01
test-description: Will It Scale takes a testcase and runs it from 1 through to n parallel copies to see if the testcase will scale. It builds both a process-based and a thread-based test in order to show any differences between the two.
test-url: https://github.com/antonblanchard/will-it-scale
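For readers unfamiliar with the signal1 case, the loop it exercises can be approximated as follows. This is a rough Python analogue and not the benchmark's source: the real test is written in C, and the shape here is only inferred from the raise, handler, tgkill and rt_sigreturn call chains in the profile further down. The choice of SIGUSR1 and the name run_iterations are assumptions made for illustration, and the sketch is Unix-only.

```python
import signal


def run_iterations(n, signum=signal.SIGUSR1):
    """Send `signum` to the current process n times and count handler runs.

    Hypothetical helper: it only mirrors the raise/handler pattern visible
    in the profile; it is not taken from will-it-scale.
    """
    handled = 0

    def handler(sig, frame):
        # Each delivered signal bumps the counter and does nothing else.
        nonlocal handled
        handled += 1

    # Must be called from the main thread, as required by signal.signal().
    previous = signal.signal(signum, handler)
    try:
        for _ in range(n):
            signal.raise_signal(signum)  # available since Python 3.8
    finally:
        # Restore whatever disposition was installed before, if known.
        if previous is not None:
            signal.signal(signum, previous)
    return handled


count = run_iterations(10_000)
print("signals handled:", count)
```

Each iteration sends one signal to the calling process and returns through the handler, which is why signal allocation, delivery and return paths dominate the profile for this testcase.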
Details are below:
-------------------------------------------------------------------------------------------------->
To reproduce:
git clone https://github.com/intel/lkp-tests.git
cd lkp-tests
bin/lkp install job.yaml # job file is attached in this email
bin/lkp run job.yaml
=========================================================================================
compiler/cpufreq_governor/kconfig/mode/nr_task/rootfs/tbox_group/test/testcase/ucode:
gcc-9/performance/x86_64-rhel-8.3/process/50%/debian-10.4-x86_64-20200603.cgz/lkp-csl-2ap3/signal1/will-it-scale/0x5002f01
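The two slash-separated lines above pair up position by position: the first names the parameters and the second gives their values. A minimal sketch of that pairing, assuming neither names nor values contain a slash themselves (true for this report); pair_parameters is a hypothetical helper and not part of lkp-tests:

```python
key_line = ("compiler/cpufreq_governor/kconfig/mode/nr_task/rootfs/"
            "tbox_group/test/testcase/ucode:")
value_line = ("gcc-9/performance/x86_64-rhel-8.3/process/50%/"
              "debian-10.4-x86_64-20200603.cgz/lkp-csl-2ap3/signal1/"
              "will-it-scale/0x5002f01")


def pair_parameters(keys, values):
    """Zip the parameter names with their values into a dict."""
    names = keys.rstrip(":").split("/")
    vals = values.split("/")
    if len(names) != len(vals):
        raise ValueError("key and value lines have different lengths")
    return dict(zip(names, vals))


params = pair_parameters(key_line, value_line)
for name, value in params.items():
    print(f"{name:18} {value}")
```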
commit:
bb85429a9b ("perf/x86/intel/uncore: Add Comet Lake support")
e17d43b93e ("perf: Add perf text poke event")
bb85429a9bf2e7d3 e17d43b93e544f5016c0251d207
---------------- ---------------------------
value ± %stddev   %change   value ± %stddev   metric
63611 +5.8% 67274 will-it-scale.per_process_ops
6106768 +5.8% 6458348 will-it-scale.workload
4491 ± 7% +14935.2% 675232 ±171% cpuidle.POLL.usage
1.26 ± 2% +0.2 1.43 mpstat.cpu.all.usr%
6376 ± 67% +195.0% 18808 numa-numastat.node1.other_node
21360 ± 2% +9.6% 23410 ± 4% slabinfo.pid.active_objs
21361 ± 2% +9.6% 23410 ± 4% slabinfo.pid.num_objs
143377 ± 2% -3.2% 138781 proc-vmstat.nr_active_anon
143377 ± 2% -3.2% 138781 proc-vmstat.nr_zone_active_anon
40253 ± 2% -9.3% 36521 ± 6% proc-vmstat.pgactivate
5306 ± 13% +34.2% 7123 ± 31% softirqs.CPU1.RCU
21257 ± 80% -42.9% 12145 ±121% softirqs.CPU153.SCHED
29988 ± 37% -59.0% 12303 ±122% softirqs.CPU16.SCHED
359438 ± 10% -18.4% 293284 ± 14% numa-meminfo.node1.FilePages
39267 ± 22% -34.9% 25548 ± 24% numa-meminfo.node1.KReclaimable
39267 ± 22% -34.9% 25548 ± 24% numa-meminfo.node1.SReclaimable
87301 ± 11% -11.7% 77054 ± 7% numa-meminfo.node3.SUnreclaim
89869 ± 10% -18.4% 73315 ± 14% numa-vmstat.node1.nr_file_pages
9817 ± 22% -34.9% 6387 ± 24% numa-vmstat.node1.nr_slab_reclaimable
90821 ± 4% +14.1% 103653 numa-vmstat.node1.numa_other
21824 ± 11% -11.7% 19264 ± 7% numa-vmstat.node3.nr_slab_unreclaimable
-11370588 +33.2% -15141792 sched_debug.cfs_rq:/.spread0.min
45386 ± 12% +30.9% 59397 ± 4% sched_debug.cpu.sched_count.max
4603 ± 6% +17.9% 5426 ± 4% sched_debug.cpu.sched_count.stddev
23230 ± 12% +63.1% 37890 ± 38% sched_debug.cpu.sched_goidle.max
2434 ± 5% +35.7% 3302 ± 28% sched_debug.cpu.sched_goidle.stddev
6.033e+09 +5.3% 6.351e+09 perf-stat.i.branch-instructions
0.92 +0.0 0.96 perf-stat.i.branch-miss-rate%
55838649 +9.6% 61217237 perf-stat.i.branch-misses
9.96 -4.9% 9.47 perf-stat.i.cpi
9.222e+09 +5.3% 9.713e+09 perf-stat.i.dTLB-loads
24424 +10.5% 26979 ± 5% perf-stat.i.dTLB-store-misses
5.837e+09 +4.9% 6.122e+09 perf-stat.i.dTLB-stores
3.025e+10 +5.3% 3.185e+10 perf-stat.i.instructions
570.56 +5.8% 603.38 perf-stat.i.instructions-per-iTLB-miss
0.10 +5.2% 0.11 perf-stat.i.ipc
0.20 ± 3% +7.8% 0.21 ± 6% perf-stat.i.metric.K/sec
110.74 +5.2% 116.49 perf-stat.i.metric.M/sec
8414841 +3.5% 8707427 perf-stat.i.node-store-misses
4885 ± 24% +39.9% 6833 ± 10% perf-stat.i.node-stores
0.93 +0.0 0.96 perf-stat.overall.branch-miss-rate%
9.96 -5.0% 9.46 perf-stat.overall.cpi
570.27 +5.8% 603.10 perf-stat.overall.instructions-per-iTLB-miss
0.10 +5.2% 0.11 perf-stat.overall.ipc
6.013e+09 +5.2% 6.329e+09 perf-stat.ps.branch-instructions
55660104 +9.6% 61012673 perf-stat.ps.branch-misses
9.191e+09 +5.3% 9.679e+09 perf-stat.ps.dTLB-loads
24366 +10.4% 26910 ± 5% perf-stat.ps.dTLB-store-misses
5.817e+09 +4.9% 6.101e+09 perf-stat.ps.dTLB-stores
3.015e+10 +5.3% 3.174e+10 perf-stat.ps.instructions
8386131 +3.5% 8677133 perf-stat.ps.node-store-misses
4897 ± 24% +39.7% 6844 ± 10% perf-stat.ps.node-stores
9.097e+12 +5.1% 9.562e+12 perf-stat.total.instructions
5349 ± 15% -30.5% 3719 ± 13% interrupts.CPU112.CAL:Function_call_interrupts
7185 ± 20% -45.0% 3951 ± 29% interrupts.CPU146.NMI:Non-maskable_interrupts
7185 ± 20% -45.0% 3951 ± 29% interrupts.CPU146.PMI:Performance_monitoring_interrupts
5836 ± 9% -36.7% 3691 ± 17% interrupts.CPU172.CAL:Function_call_interrupts
6135 ± 5% -33.4% 4088 ± 21% interrupts.CPU176.CAL:Function_call_interrupts
4234 ± 17% +25.1% 5298 ± 11% interrupts.CPU177.CAL:Function_call_interrupts
5368 ± 11% +61.6% 8674 interrupts.CPU177.NMI:Non-maskable_interrupts
5368 ± 11% +61.6% 8674 interrupts.CPU177.PMI:Performance_monitoring_interrupts
4833 ± 12% -19.1% 3911 ± 9% interrupts.CPU180.CAL:Function_call_interrupts
7187 ± 20% -45.1% 3948 ± 29% interrupts.CPU182.NMI:Non-maskable_interrupts
7187 ± 20% -45.1% 3948 ± 29% interrupts.CPU182.PMI:Performance_monitoring_interrupts
7921 ± 16% -50.2% 3948 ± 29% interrupts.CPU188.NMI:Non-maskable_interrupts
7921 ± 16% -50.2% 3948 ± 29% interrupts.CPU188.PMI:Performance_monitoring_interrupts
4174 ± 17% +48.7% 6206 ± 19% interrupts.CPU191.CAL:Function_call_interrupts
6172 ± 5% -18.3% 5040 ± 8% interrupts.CPU2.CAL:Function_call_interrupts
7182 ± 19% -45.0% 3948 ± 15% interrupts.CPU20.NMI:Non-maskable_interrupts
7182 ± 19% -45.0% 3948 ± 15% interrupts.CPU20.PMI:Performance_monitoring_interrupts
6437 ± 19% -48.5% 3314 ± 46% interrupts.CPU3.NMI:Non-maskable_interrupts
6437 ± 19% -48.5% 3314 ± 46% interrupts.CPU3.PMI:Performance_monitoring_interrupts
91.50 ±107% -98.6% 1.25 ±131% interrupts.CPU50.RES:Rescheduling_interrupts
6040 ± 7% -29.7% 4244 ± 25% interrupts.CPU53.CAL:Function_call_interrupts
5800 ± 11% -35.2% 3757 ± 17% interrupts.CPU75.CAL:Function_call_interrupts
7925 ± 16% -59.3% 3227 ± 47% interrupts.CPU81.NMI:Non-maskable_interrupts
7925 ± 16% -59.3% 3227 ± 47% interrupts.CPU81.PMI:Performance_monitoring_interrupts
2.25 ± 96% +16577.8% 375.25 ±166% interrupts.CPU81.RES:Rescheduling_interrupts
6451 ± 19% -44.3% 3591 ± 44% interrupts.CPU90.NMI:Non-maskable_interrupts
6451 ± 19% -44.3% 3591 ± 44% interrupts.CPU90.PMI:Performance_monitoring_interrupts
5714 -42.7% 3273 ± 47% interrupts.CPU98.NMI:Non-maskable_interrupts
5714 -42.7% 3273 ± 47% interrupts.CPU98.PMI:Performance_monitoring_interrupts
0.56 +0.1 0.69 ± 8% perf-profile.calltrace.cycles-pp.__fpu__restore_sig.restore_sigcontext.__x64_sys_rt_sigreturn.do_syscall_64.entry_SYSCALL_64_after_hwframe
0.58 +0.1 0.72 ± 8% perf-profile.calltrace.cycles-pp.restore_sigcontext.__x64_sys_rt_sigreturn.do_syscall_64.entry_SYSCALL_64_after_hwframe.raise
0.69 +0.2 0.86 ± 8% perf-profile.calltrace.cycles-pp.__x64_sys_rt_sigreturn.do_syscall_64.entry_SYSCALL_64_after_hwframe.raise
0.66 +0.2 0.83 ± 9% perf-profile.calltrace.cycles-pp.copy_fpstate_to_sigframe.__setup_rt_frame.do_signal.__prepare_exit_to_usermode.do_syscall_64
0.80 +0.2 1.01 ± 9% perf-profile.calltrace.cycles-pp.__setup_rt_frame.do_signal.__prepare_exit_to_usermode.do_syscall_64.entry_SYSCALL_64_after_hwframe
0.38 ± 57% +0.2 0.62 ± 8% perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.handler
0.38 ± 57% +0.2 0.62 ± 8% perf-profile.calltrace.cycles-pp.__prepare_exit_to_usermode.do_syscall_64.entry_SYSCALL_64_after_hwframe.handler
0.38 ± 57% +0.2 0.63 ± 9% perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.handler
0.39 ± 57% +0.3 0.65 ± 9% perf-profile.calltrace.cycles-pp.__x64_sys_rt_sigprocmask.do_syscall_64.entry_SYSCALL_64_after_hwframe.raise
0.13 ±173% +0.5 0.61 ± 8% perf-profile.calltrace.cycles-pp.do_signal.__prepare_exit_to_usermode.do_syscall_64.entry_SYSCALL_64_after_hwframe.handler
0.00 +0.6 0.61 ± 9% perf-profile.calltrace.cycles-pp.entry_SYSCALL_64.raise
2.08 +0.7 2.79 ± 8% perf-profile.calltrace.cycles-pp.handler
2.57 ± 2% +1.3 3.88 ± 9% perf-profile.calltrace.cycles-pp.__sigqueue_alloc.__send_signal.do_send_sig_info.do_send_specific.do_tkill
2.56 ± 3% +1.3 3.88 ± 9% perf-profile.calltrace.cycles-pp.__sigqueue_free.__dequeue_signal.dequeue_signal.get_signal.do_signal
2.62 ± 3% +1.3 3.95 ± 9% perf-profile.calltrace.cycles-pp.__dequeue_signal.dequeue_signal.get_signal.do_signal.__prepare_exit_to_usermode
2.62 ± 2% +1.3 3.96 ± 9% perf-profile.calltrace.cycles-pp.__send_signal.do_send_sig_info.do_send_specific.do_tkill.__x64_sys_tgkill
2.67 ± 3% +1.3 4.01 ± 9% perf-profile.calltrace.cycles-pp.dequeue_signal.get_signal.do_signal.__prepare_exit_to_usermode.do_syscall_64
2.78 ± 2% +1.4 4.14 ± 9% perf-profile.calltrace.cycles-pp.get_signal.do_signal.__prepare_exit_to_usermode.do_syscall_64.entry_SYSCALL_64_after_hwframe
2.76 ± 2% +1.4 4.13 ± 9% perf-profile.calltrace.cycles-pp.do_send_sig_info.do_send_specific.do_tkill.__x64_sys_tgkill.do_syscall_64
3.60 ± 2% +1.6 5.17 ± 9% perf-profile.calltrace.cycles-pp.do_signal.__prepare_exit_to_usermode.do_syscall_64.entry_SYSCALL_64_after_hwframe.raise
3.67 +1.6 5.25 ± 9% perf-profile.calltrace.cycles-pp.__prepare_exit_to_usermode.do_syscall_64.entry_SYSCALL_64_after_hwframe.raise
0.06 ± 6% +0.0 0.08 ± 8% perf-profile.children.cycles-pp.__clear_user
0.11 +0.0 0.13 ± 6% perf-profile.children.cycles-pp.restore_altstack
0.08 +0.0 0.11 ± 8% perf-profile.children.cycles-pp._copy_to_user
0.09 ± 4% +0.0 0.13 ± 8% perf-profile.children.cycles-pp.___might_sleep
0.14 ± 3% +0.0 0.18 ± 9% perf-profile.children.cycles-pp.__task_pid_nr_ns
0.22 ± 3% +0.1 0.27 ± 8% perf-profile.children.cycles-pp.copy_user_generic_unrolled
0.17 +0.1 0.22 ± 11% perf-profile.children.cycles-pp.__might_fault
0.22 +0.1 0.28 ± 7% perf-profile.children.cycles-pp.copy_user_enhanced_fast_string
0.00 +0.1 0.06 ± 9% perf-profile.children.cycles-pp.find_task_by_vpid
0.17 ± 5% +0.1 0.23 ± 14% perf-profile.children.cycles-pp.__sysvec_apic_timer_interrupt
0.17 ± 5% +0.1 0.23 ± 14% perf-profile.children.cycles-pp.hrtimer_interrupt
0.00 +0.1 0.06 ± 13% perf-profile.children.cycles-pp.__lock_text_start
0.21 ± 5% +0.1 0.29 ± 13% perf-profile.children.cycles-pp.sysvec_apic_timer_interrupt
0.26 ± 5% +0.1 0.33 ± 12% perf-profile.children.cycles-pp.asm_sysvec_apic_timer_interrupt
0.36 +0.1 0.45 ± 9% perf-profile.children.cycles-pp.__set_current_blocked
0.38 ± 2% +0.1 0.47 ± 8% perf-profile.children.cycles-pp.fpu__clear
0.40 +0.1 0.49 ± 8% perf-profile.children.cycles-pp.syscall_return_via_sysret
0.39 +0.1 0.49 ± 8% perf-profile.children.cycles-pp._copy_from_user
0.56 +0.1 0.70 ± 8% perf-profile.children.cycles-pp.__fpu__restore_sig
0.54 +0.1 0.67 ± 9% perf-profile.children.cycles-pp.entry_SYSCALL_64
0.51 ± 2% +0.1 0.66 ± 9% perf-profile.children.cycles-pp.__x64_sys_rt_sigprocmask
0.73 +0.2 0.90 ± 8% perf-profile.children.cycles-pp.restore_sigcontext
0.66 +0.2 0.83 ± 8% perf-profile.children.cycles-pp.copy_fpstate_to_sigframe
0.81 +0.2 1.01 ± 9% perf-profile.children.cycles-pp.__setup_rt_frame
1.08 +0.3 1.34 ± 8% perf-profile.children.cycles-pp.__x64_sys_rt_sigreturn
0.77 +0.3 1.07 ± 7% perf-profile.children.cycles-pp.native_irq_return_iret
1.36 +0.4 1.80 ± 8% perf-profile.children.cycles-pp.handler
2.57 ± 2% +1.3 3.88 ± 9% perf-profile.children.cycles-pp.__sigqueue_alloc
2.56 ± 3% +1.3 3.88 ± 9% perf-profile.children.cycles-pp.__sigqueue_free
2.62 ± 3% +1.3 3.95 ± 9% perf-profile.children.cycles-pp.__dequeue_signal
2.63 ± 2% +1.3 3.96 ± 9% perf-profile.children.cycles-pp.__send_signal
2.67 ± 3% +1.3 4.01 ± 9% perf-profile.children.cycles-pp.dequeue_signal
2.79 ± 2% +1.3 4.14 ± 9% perf-profile.children.cycles-pp.get_signal
2.76 ± 2% +1.4 4.13 ± 9% perf-profile.children.cycles-pp.do_send_sig_info
4.10 +1.7 5.78 ± 9% perf-profile.children.cycles-pp.do_signal
4.19 +1.7 5.89 ± 9% perf-profile.children.cycles-pp.__prepare_exit_to_usermode
0.06 +0.0 0.08 ± 6% perf-profile.self.cycles-pp.__set_current_blocked
0.06 +0.0 0.08 ± 8% perf-profile.self.cycles-pp.__clear_user
0.09 ± 4% +0.0 0.12 ± 10% perf-profile.self.cycles-pp.___might_sleep
0.14 ± 3% +0.0 0.18 ± 9% perf-profile.self.cycles-pp.entry_SYSCALL_64_after_hwframe
0.13 ± 3% +0.0 0.18 ± 9% perf-profile.self.cycles-pp.__task_pid_nr_ns
0.21 ± 3% +0.0 0.26 ± 9% perf-profile.self.cycles-pp.copy_user_generic_unrolled
0.23 +0.1 0.28 ± 8% perf-profile.self.cycles-pp.do_syscall_64
0.00 +0.1 0.05 ± 9% perf-profile.self.cycles-pp._copy_from_user
0.22 +0.1 0.27 ± 7% perf-profile.self.cycles-pp.copy_user_enhanced_fast_string
0.24 ± 2% +0.1 0.30 ± 8% perf-profile.self.cycles-pp.fpu__clear
0.00 +0.1 0.06 ± 7% perf-profile.self.cycles-pp.__syscall_return_slowpath
0.29 ± 2% +0.1 0.36 ± 9% perf-profile.self.cycles-pp.entry_SYSCALL_64
0.40 +0.1 0.49 ± 8% perf-profile.self.cycles-pp.syscall_return_via_sysret
0.37 +0.1 0.47 ± 9% perf-profile.self.cycles-pp.raise
0.43 +0.1 0.54 ± 8% perf-profile.self.cycles-pp.__fpu__restore_sig
0.50 +0.1 0.63 ± 8% perf-profile.self.cycles-pp.copy_fpstate_to_sigframe
0.77 ± 2% +0.3 1.07 ± 8% perf-profile.self.cycles-pp.native_irq_return_iret
2.54 ± 2% +1.3 3.84 ± 9% perf-profile.self.cycles-pp.__sigqueue_alloc
2.54 ± 3% +1.3 3.87 ± 9% perf-profile.self.cycles-pp.__sigqueue_free
21.43 +4.7 26.13 ± 9% perf-profile.self.cycles-pp.apparmor_task_kill
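The %change column in the table above can be recomputed from the two value columns. The sketch below parses a few rows copied from this report and recomputes the change. It assumes, based on how the rows read, that a change printed with a trailing % is relative to the first value, while a change printed without % (the rows whose values are already percentages) is a plain difference. ROW, parse_row and recompute_change are hypothetical names. Rows printed in scientific notation are already rounded, so their recomputed change may differ in the last digit.

```python
import re

# value [± sd%]  change  value [± sd%]  metric
ROW = re.compile(r"""
    ^\s*(?P<base>-?[\d.]+(?:e[+-]?\d+)?)
    (?:\s*±\s*(?P<base_sd>\d+)%)?
    \s+(?P<change>[+-][\d.]+%?)
    \s+(?P<new>-?[\d.]+(?:e[+-]?\d+)?)
    (?:\s*±\s*(?P<new_sd>\d+)%)?
    \s+(?P<metric>\S+)\s*$
""", re.VERBOSE)


def parse_row(line):
    """Return the fields of one comparison row, or None if it does not match."""
    m = ROW.match(line)
    if m is None:
        return None
    return {
        "metric": m["metric"],
        "base": float(m["base"]),
        "new": float(m["new"]),
        "reported": m["change"],
    }


def recompute_change(row):
    """Recompute the change in the same style as the reported column."""
    delta = row["new"] - row["base"]
    if row["reported"].endswith("%"):
        # Relative change with respect to the first (base) value.
        return f"{delta / row['base'] * 100:+.1f}%"
    # Values that are already percentages: plain difference.
    return f"{delta:+.1f}"


rows = [
    "63611 +5.8% 67274 will-it-scale.per_process_ops",
    "4491 ± 7% +14935.2% 675232 ±171% cpuidle.POLL.usage",
    "1.26 ± 2% +0.2 1.43 mpstat.cpu.all.usr%",
    "-11370588 +33.2% -15141792 sched_debug.cfs_rq:/.spread0.min",
    "21.43 +4.7 26.13 ± 9% perf-profile.self.cycles-pp.apparmor_task_kill",
]

for line in rows:
    row = parse_row(line)
    print(row["metric"], "reported", row["reported"],
          "recomputed", recompute_change(row))
```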
[ASCII plot: will-it-scale.per_process_ops, y-axis 58000 to 74000. The O samples lie between roughly 67000 and 73000; the other series, drawn with '+' and '.', lies between roughly 59000 and 65000.]
[*] bisect-good sample
[O] bisect-bad sample
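As a cross-check on the plotted metric, the per-process figure can be related to the aggregate will-it-scale.workload value from the table. The sketch below assumes that nr_task: 50% on the 192-thread test machine means 96 processes and that the workload value is roughly per_process_ops multiplied by the number of processes. Both are inferences from the numbers in this report and are not stated in it.

```python
machine_threads = 192      # from "on test machine: 192 threads"
nr_task_fraction = 0.50    # from "nr_task: 50%"
expected_tasks = int(machine_threads * nr_task_fraction)

# (per_process_ops, workload) for each commit, copied from the table.
samples = {
    "bb85429a9b": (63611, 6106768),
    "e17d43b93e": (67274, 6458348),
}

# Number of processes implied by workload / per_process_ops.
implied_tasks = {
    commit: workload / per_process
    for commit, (per_process, workload) in samples.items()
}

print("expected tasks:", expected_tasks)
for commit, tasks in implied_tasks.items():
    print(commit, "implied tasks:", round(tasks, 2))
```

If the assumption holds, both ratios should come out very close to the expected task count, which also explains why per_process_ops and workload show the same 5.8% change.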
Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.
Thanks,
Rong Chen
View attachment "config-5.8.0-rc1-00002-ge17d43b93e544" of type "text/plain" (158289 bytes)
View attachment "job-script" of type "text/plain" (7236 bytes)
View attachment "job.yaml" of type "text/plain" (4870 bytes)
View attachment "reproduce" of type "text/plain" (339 bytes)