Message-ID: <20210406072708.GD7352@xsang-OptiPlex-9020>
Date: Tue, 6 Apr 2021 15:27:08 +0800
From: kernel test robot <oliver.sang@...el.com>
To: Peter Zijlstra <peterz@...radead.org>
Cc: Ingo Molnar <mingo@...nel.org>, Borislav Petkov <bp@...e.de>,
Miroslav Benes <mbenes@...e.cz>,
LKML <linux-kernel@...r.kernel.org>, x86@...nel.org,
lkp@...ts.01.org, lkp@...el.com, ying.huang@...el.com,
feng.tang@...el.com, zhengjun.xing@...el.com
Subject: [objtool/x86] 9bc0bb5072: will-it-scale.per_process_ops 5.6%
improvement
Greetings,
FYI, we noticed a 5.6% improvement in will-it-scale.per_process_ops due to commit:
commit: 9bc0bb50727c8ac69fbb33fb937431cf3518ff37 ("objtool/x86: Rewrite retpoline thunk calls")
https://git.kernel.org/cgit/linux/kernel/git/tip/tip.git x86/core
in testcase: will-it-scale
on test machine: 192 threads Intel(R) Xeon(R) Platinum 9242 CPU @ 2.30GHz with 192G memory
with the following parameters:
nr_task: 16
mode: process
test: eventfd1
cpufreq_governor: performance
ucode: 0x5003006
test-description: Will It Scale takes a testcase and runs it from 1 through to n parallel copies to see if the testcase will scale. It builds both a process-based and a thread-based version of each test in order to expose any differences between the two.
test-url: https://github.com/antonblanchard/will-it-scale
Details are below:
-------------------------------------------------------------------------------------------------->
To reproduce:
git clone https://github.com/intel/lkp-tests.git
cd lkp-tests
bin/lkp install job.yaml # job file is attached in this email
bin/lkp split-job --compatible job.yaml
bin/lkp run compatible-job.yaml
=========================================================================================
compiler/cpufreq_governor/kconfig/mode/nr_task/rootfs/tbox_group/test/testcase/ucode:
gcc-9/performance/x86_64-rhel-8.3/process/16/debian-10.4-x86_64-20200603.cgz/lkp-csl-2ap2/eventfd1/will-it-scale/0x5003006
commit:
50e7b4a1a1 ("objtool: Skip magical retpoline .altinstr_replacement")
9bc0bb5072 ("objtool/x86: Rewrite retpoline thunk calls")
50e7b4a1a1b264fc 9bc0bb50727c8ac69fbb33fb937
---------------- ---------------------------
Columns: 50e7b4a1a1 value (± %stddev), %change, 9bc0bb5072 value (± %stddev), metric
46843229 +5.6% 49479323 will-it-scale.16.processes
2927701 +5.6% 3092457 will-it-scale.per_process_ops
46843229 +5.6% 49479323 will-it-scale.workload
8251 ± 8% -19.6% 6635 ± 11% numa-vmstat.node0.nr_slab_reclaimable
33007 ± 8% -19.6% 26543 ± 11% numa-meminfo.node0.KReclaimable
33007 ± 8% -19.6% 26543 ± 11% numa-meminfo.node0.SReclaimable
1172 ± 12% -68.4% 370.67 ±141% perf-sched.wait_and_delay.avg.ms.futex_wait_queue_me.futex_wait.do_futex.__x64_sys_futex
1172 ± 12% -68.4% 370.65 ±141% perf-sched.wait_time.avg.ms.futex_wait_queue_me.futex_wait.do_futex.__x64_sys_futex
112.67 ± 20% -32.7% 75.83 ± 19% interrupts.CPU115.NMI:Non-maskable_interrupts
112.67 ± 20% -32.7% 75.83 ± 19% interrupts.CPU115.PMI:Performance_monitoring_interrupts
154.00 ± 43% -41.8% 89.67 ± 40% interrupts.CPU135.NMI:Non-maskable_interrupts
154.00 ± 43% -41.8% 89.67 ± 40% interrupts.CPU135.PMI:Performance_monitoring_interrupts
128.50 ± 16% -39.7% 77.50 ± 33% interrupts.CPU151.NMI:Non-maskable_interrupts
128.50 ± 16% -39.7% 77.50 ± 33% interrupts.CPU151.PMI:Performance_monitoring_interrupts
126.50 ± 19% -39.1% 77.00 ± 34% interrupts.CPU152.NMI:Non-maskable_interrupts
126.50 ± 19% -39.1% 77.00 ± 34% interrupts.CPU152.PMI:Performance_monitoring_interrupts
150.67 ± 49% -52.7% 71.33 ± 33% interrupts.CPU153.NMI:Non-maskable_interrupts
150.67 ± 49% -52.7% 71.33 ± 33% interrupts.CPU153.PMI:Performance_monitoring_interrupts
134.67 ± 30% -45.5% 73.33 ± 33% interrupts.CPU154.NMI:Non-maskable_interrupts
134.67 ± 30% -45.5% 73.33 ± 33% interrupts.CPU154.PMI:Performance_monitoring_interrupts
229.00 ± 82% -64.9% 80.33 ± 38% interrupts.CPU57.NMI:Non-maskable_interrupts
229.00 ± 82% -64.9% 80.33 ± 38% interrupts.CPU57.PMI:Performance_monitoring_interrupts
9305 ± 16% +30.4% 12133 ± 20% softirqs.CPU116.RCU
9674 ± 8% +17.7% 11391 ± 11% softirqs.CPU121.RCU
10950 ± 8% +13.3% 12402 ± 7% softirqs.CPU160.RCU
11054 ± 8% +14.6% 12663 ± 5% softirqs.CPU161.RCU
10764 ± 6% +16.6% 12548 ± 6% softirqs.CPU163.RCU
11073 ± 8% +20.4% 13337 ± 4% softirqs.CPU164.RCU
10840 ± 7% +18.1% 12797 ± 6% softirqs.CPU165.RCU
10935 ± 9% +19.5% 13066 ± 7% softirqs.CPU166.RCU
10791 ± 8% +17.0% 12629 ± 8% softirqs.CPU168.RCU
10152 ± 6% +17.1% 11892 ± 5% softirqs.CPU171.RCU
10644 ± 6% +13.0% 12032 ± 5% softirqs.CPU172.RCU
14639 ± 11% +20.5% 17644 ± 10% softirqs.CPU3.RCU
11177 ± 8% +13.4% 12671 ± 7% softirqs.CPU64.RCU
11039 ± 6% +15.3% 12730 ± 6% softirqs.CPU67.RCU
11218 ± 9% +17.9% 13225 ± 5% softirqs.CPU68.RCU
15014 ± 11% +17.8% 17688 ± 6% softirqs.CPU7.RCU
11300 ± 9% +17.4% 13267 ± 7% softirqs.CPU70.RCU
11094 ± 6% +18.1% 13099 ± 7% softirqs.CPU71.RCU
10930 ± 8% +15.5% 12620 ± 5% softirqs.CPU72.RCU
10800 ± 7% +15.8% 12509 ± 8% softirqs.CPU75.RCU
10822 ± 8% +14.7% 12412 ± 6% softirqs.CPU76.RCU
24155 ± 13% +26.9% 30649 ± 14% softirqs.CPU99.SCHED
1.633e+10 +3.7% 1.694e+10 perf-stat.i.branch-instructions
1.18 ± 9% -0.6 0.59 ± 15% perf-stat.i.branch-miss-rate%
1.881e+08 ± 5% -47.4% 98885807 ± 15% perf-stat.i.branch-misses
4905715 ± 12% -56.2% 2149255 ± 70% perf-stat.i.cache-misses
0.72 ± 5% -11.5% 0.64 ± 2% perf-stat.i.cpi
11933 ± 13% +264.7% 43517 ± 54% perf-stat.i.cycles-between-cache-misses
2.352e+10 +5.6% 2.484e+10 perf-stat.i.dTLB-loads
1.574e+10 +5.7% 1.664e+10 perf-stat.i.dTLB-stores
1.748e+08 -47.8% 91212835 ± 6% perf-stat.i.iTLB-load-misses
8.088e+10 +5.6% 8.541e+10 perf-stat.i.instructions
464.80 +104.0% 948.14 ± 6% perf-stat.i.instructions-per-iTLB-miss
1.42 +10.6% 1.57 ± 2% perf-stat.i.ipc
289.77 +5.1% 304.52 perf-stat.i.metric.M/sec
54032 ± 36% -49.7% 27155 ± 26% perf-stat.i.node-loads
1.15 ± 5% -0.6 0.58 ± 16% perf-stat.overall.branch-miss-rate%
0.70 -9.3% 0.64 ± 2% perf-stat.overall.cpi
11810 ± 13% +229.4% 38908 ± 51% perf-stat.overall.cycles-between-cache-misses
462.63 +103.3% 940.67 ± 6% perf-stat.overall.instructions-per-iTLB-miss
1.42 +10.4% 1.57 ± 2% perf-stat.overall.ipc
1.627e+10 +3.7% 1.688e+10 perf-stat.ps.branch-instructions
1.875e+08 ± 5% -47.4% 98557001 ± 15% perf-stat.ps.branch-misses
4889627 ± 12% -56.2% 2142836 ± 70% perf-stat.ps.cache-misses
2.344e+10 +5.6% 2.476e+10 perf-stat.ps.dTLB-loads
1.569e+10 +5.7% 1.659e+10 perf-stat.ps.dTLB-stores
1.742e+08 -47.8% 90889010 ± 6% perf-stat.ps.iTLB-load-misses
8.061e+10 +5.6% 8.512e+10 perf-stat.ps.instructions
53915 ± 36% -49.6% 27175 ± 26% perf-stat.ps.node-loads
2.442e+13 +5.3% 2.571e+13 perf-stat.total.instructions
14.71 ± 7% -2.0 12.67 ± 8% perf-profile.calltrace.cycles-pp.vfs_write.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe.write
3.80 ± 26% -1.5 2.29 ± 12% perf-profile.calltrace.cycles-pp.asm_sysvec_apic_timer_interrupt.cpuidle_enter_state.cpuidle_enter.do_idle.cpu_startup_entry
8.34 ± 7% -1.2 7.13 ± 8% perf-profile.calltrace.cycles-pp.eventfd_write.vfs_write.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe
1.32 ± 31% -1.0 0.30 ±103% perf-profile.calltrace.cycles-pp.menu_select.do_idle.cpu_startup_entry.start_secondary.secondary_startup_64_no_verify
4.89 ± 7% -0.9 3.98 ± 9% perf-profile.calltrace.cycles-pp.security_file_permission.vfs_read.ksys_read.do_syscall_64.entry_SYSCALL_64_after_hwframe
3.72 ± 6% -0.8 2.94 ± 7% perf-profile.calltrace.cycles-pp.security_file_permission.vfs_write.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe
2.70 ± 7% -0.6 2.15 ± 10% perf-profile.calltrace.cycles-pp.common_file_perm.security_file_permission.vfs_read.ksys_read.do_syscall_64
2.85 ± 6% -0.5 2.36 ± 7% perf-profile.calltrace.cycles-pp.common_file_perm.security_file_permission.vfs_write.ksys_write.do_syscall_64
0.72 ± 8% -0.4 0.28 ±100% perf-profile.calltrace.cycles-pp.___might_sleep.__might_fault._copy_from_user.eventfd_write.vfs_write
1.23 ± 8% -0.3 0.97 ± 10% perf-profile.calltrace.cycles-pp.__might_fault._copy_from_user.eventfd_write.vfs_write.ksys_write
14.85 ± 7% -2.0 12.81 ± 8% perf-profile.children.cycles-pp.vfs_write
8.61 ± 6% -1.7 6.93 ± 8% perf-profile.children.cycles-pp.security_file_permission
8.45 ± 7% -1.2 7.24 ± 8% perf-profile.children.cycles-pp.eventfd_write
5.70 ± 7% -1.1 4.64 ± 9% perf-profile.children.cycles-pp.common_file_perm
1.33 ± 31% -0.8 0.48 ± 28% perf-profile.children.cycles-pp.menu_select
3.46 ± 15% -0.8 2.68 ± 10% perf-profile.children.cycles-pp.asm_sysvec_apic_timer_interrupt
0.71 ± 7% -0.4 0.27 ± 10% perf-profile.children.cycles-pp.apparmor_file_permission
2.46 ± 7% -0.3 2.11 ± 9% perf-profile.children.cycles-pp.__might_fault
1.33 ± 7% -0.2 1.13 ± 9% perf-profile.children.cycles-pp.___might_sleep
0.38 ± 8% +0.1 0.48 ± 9% perf-profile.children.cycles-pp.syscall_exit_to_user_mode_prepare
1.57 ± 55% -1.3 0.30 ± 21% perf-profile.self.cycles-pp.cpuidle_enter_state
4.39 ± 7% -1.1 3.27 ± 8% perf-profile.self.cycles-pp.common_file_perm
2.15 ± 6% -0.8 1.32 ± 11% perf-profile.self.cycles-pp.eventfd_write
0.98 ± 42% -0.8 0.20 ± 44% perf-profile.self.cycles-pp.menu_select
2.25 ± 7% -0.6 1.61 ± 8% perf-profile.self.cycles-pp.eventfd_read
0.57 ± 7% -0.3 0.27 ± 10% perf-profile.self.cycles-pp.apparmor_file_permission
1.32 ± 7% -0.2 1.12 ± 9% perf-profile.self.cycles-pp.___might_sleep
0.43 ± 7% -0.1 0.35 ± 9% perf-profile.self.cycles-pp.__might_fault
0.11 ± 12% -0.0 0.08 ± 16% perf-profile.self.cycles-pp.read_tsc
0.07 ± 5% +0.1 0.13 ± 11% perf-profile.self.cycles-pp.__x64_sys_write
0.07 ± 12% +0.1 0.14 ± 11% perf-profile.self.cycles-pp.__x64_sys_read
0.26 ± 10% +0.1 0.37 ± 8% perf-profile.self.cycles-pp.syscall_exit_to_user_mode_prepare
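Several perf-stat rows in the table above are ratios of other rows, which allows a quick consistency check. The sketch below recomputes instructions per iTLB miss, branch miss rate and IPC from the perf-stat.ps.* and perf-stat.overall.cpi values copied from the table. The variable and helper names are illustrative and not part of lkp-tests, and because the inputs are rounded as printed, the results only approximately match the reported perf-stat.overall.* rows.

```python
# (base commit 50e7b4a1a1, patched commit 9bc0bb5072), copied from perf-stat.ps.*
ps = {
    "instructions":        (8.061e10, 8.512e10),
    "iTLB-load-misses":    (1.742e08, 90889010),
    "branch-instructions": (1.627e10, 1.688e10),
    "branch-misses":       (1.875e08, 98557001),
}
overall_cpi = (0.70, 0.64)  # perf-stat.overall.cpi

def ratio(numerator, denominator):
    """Element-wise ratio of two perf-stat.ps rows (base, patched)."""
    return tuple(n / d for n, d in zip(ps[numerator], ps[denominator]))

# Compare with perf-stat.overall.instructions-per-iTLB-miss (462.63, 940.67)
instr_per_itlb_miss = ratio("instructions", "iTLB-load-misses")

# Compare with perf-stat.overall.branch-miss-rate% (1.15, 0.58)
branch_miss_rate_pct = tuple(100 * r for r in ratio("branch-misses", "branch-instructions"))

# IPC is the reciprocal of CPI; compare with perf-stat.overall.ipc (1.42, 1.57)
ipc = tuple(1 / c for c in overall_cpi)

for name, values in [
    ("instructions-per-iTLB-miss", instr_per_itlb_miss),
    ("branch-miss-rate%", branch_miss_rate_pct),
    ("ipc", ipc),
]:
    print(f"{name}: base={values[0]:.2f} patched={values[1]:.2f}")
```

Read this way, instructions retired rose by about 5.6% while iTLB load misses and branch misses each fell by roughly half, so the doubling of instructions per iTLB miss comes almost entirely from the drop in misses.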
[ASCII plot: will-it-scale.per_process_ops per sample, y-axis from 2.92e+06 to 3.12e+06]
[*] bisect-good sample
[O] bisect-bad sample
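The headline figures can be re-derived from the will-it-scale.workload totals in the comparison table. The sketch below is a cross-check, not part of the robot's tooling. Treating per_process_ops as workload divided by nr_task (16 in this job) is an inference from the reported numbers, which integer division happens to reproduce exactly. The names used are illustrative.

```python
NR_TASK = 16  # nr_task parameter of this job

# will-it-scale.workload per commit, copied from the comparison table
workload = {
    "50e7b4a1a1": 46843229,  # parent commit
    "9bc0bb5072": 49479323,  # commit under test
}

def pct_change(old, new):
    """Relative change from old to new, in percent."""
    return (new - old) / old * 100.0

# Assumption: per_process_ops is the total workload split evenly across tasks.
per_process_ops = {commit: total // NR_TASK for commit, total in workload.items()}
change = pct_change(workload["50e7b4a1a1"], workload["9bc0bb5072"])

print(per_process_ops)    # {'50e7b4a1a1': 2927701, '9bc0bb5072': 3092457}
print(f"{change:+.1f}%")  # +5.6%
```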
Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.
---
0DAY/LKP+ Test Infrastructure
https://lists.01.org/hyperkitty/list/lkp@lists.01.org
Open Source Technology Center
Intel Corporation
Thanks,
Oliver Sang
View attachment "config-5.12.0-rc5-00072-g9bc0bb50727c" of type "text/plain" (172949 bytes)
View attachment "job-script" of type "text/plain" (7765 bytes)
View attachment "job.yaml" of type "text/plain" (5105 bytes)
View attachment "reproduce" of type "text/plain" (340 bytes)