Message-ID: <202210272158.a9585179-oliver.sang@intel.com>
Date: Fri, 28 Oct 2022 15:07:36 +0800
From: kernel test robot <oliver.sang@...el.com>
To: Thomas Gleixner <tglx@...utronix.de>
CC: <oe-lkp@...ts.linux.dev>, <lkp@...el.com>,
Peter Zijlstra <peterz@...radead.org>,
<linux-kernel@...r.kernel.org>, <x86@...nel.org>,
<ying.huang@...el.com>, <feng.tang@...el.com>,
<zhengjun.xing@...ux.intel.com>, <fengwei.yin@...el.com>
Subject: [tip:x86/core] [x86/retbleed] 80e4c1cd42:
will-it-scale.per_thread_ops -5.4% regression
Hi Thomas,
Although the title calls this a 'regression', following the parent-vs-commit
rule used in our reports, we understand from the commit message that this is
actually a big improvement compared with the 'microcode mitigation', which
could cause a performance drop of up to 30%.
We are still reporting it as an FYI about the possible performance impact on
some micro-benchmarks.
Greetings,
FYI, we noticed a -5.4% regression of will-it-scale.per_thread_ops due to commit:
commit: 80e4c1cd42fff110bfdae8fce7ac4f22465f9664 ("x86/retbleed: Add X86_FEATURE_CALL_DEPTH")
https://git.kernel.org/cgit/linux/kernel/git/tip/tip.git x86/core
in testcase: will-it-scale
on test machine: 192 threads 4 sockets Intel(R) Xeon(R) Platinum 9242 CPU @ 2.30GHz (Cascade Lake) with 192G memory
with the following parameters:
nr_task: 100%
mode: thread
test: futex3
cpufreq_governor: performance
test-description: Will It Scale takes a testcase and runs it from 1 through to n parallel copies to see if the testcase will scale. It builds both a process-based and a thread-based test in order to see any differences between the two.
test-url: https://github.com/antonblanchard/will-it-scale
If you fix the issue, kindly add the following tags
| Reported-by: kernel test robot <oliver.sang@...el.com>
| Link: https://lore.kernel.org/oe-lkp/202210272158.a9585179-oliver.sang@intel.com
Details are as below:
-------------------------------------------------------------------------------------------------->
To reproduce:
git clone https://github.com/intel/lkp-tests.git
cd lkp-tests
sudo bin/lkp install job.yaml # job file is attached in this email
bin/lkp split-job --compatible job.yaml # generate the yaml file for lkp run
sudo bin/lkp run generated-yaml-file
# if you come across any failure that blocks the test,
# please remove the ~/.lkp and /lkp directories to run from a clean state.
=========================================================================================
compiler/cpufreq_governor/kconfig/mode/nr_task/rootfs/tbox_group/test/testcase:
gcc-11/performance/x86_64-rhel-8.3/thread/100%/debian-11.1-x86_64-20220510.cgz/lkp-csl-2ap4/futex3/will-it-scale
commit:
bea75b3389 ("x86/Kconfig: Introduce function padding")
80e4c1cd42 ("x86/retbleed: Add X86_FEATURE_CALL_DEPTH")
bea75b33895f7f87 80e4c1cd42fff110bfdae8fce7a
---------------- ---------------------------
%stddev %change %stddev
\ | \
1.335e+09 -5.4% 1.263e+09 will-it-scale.192.threads
6951370 -5.4% 6578078 will-it-scale.per_thread_ops
1.335e+09 -5.4% 1.263e+09 will-it-scale.workload
33.29 -3.0% 32.30 boot-time.dhcp
0.97 ± 2% +0.1 1.07 ± 2% mpstat.cpu.all.irq%
83145 ±146% -94.2% 4796 ± 6% turbostat.C1
878.33 ± 4% +11.7% 981.00 ± 11% proc-vmstat.direct_map_level2_splits
77333 +2.2% 79018 proc-vmstat.nr_slab_unreclaimable
47455 ± 12% -23.2% 36450 ± 17% proc-vmstat.numa_hint_faults
43003 ± 32% -37.6% 26846 ± 37% proc-vmstat.numa_pages_migrated
43003 ± 32% -37.6% 26846 ± 37% proc-vmstat.pgmigrate_success
198321 ± 12% +21.9% 241714 ± 14% numa-meminfo.node1.AnonPages
200294 ± 12% +21.0% 242442 ± 14% numa-meminfo.node1.Inactive
200294 ± 12% +21.0% 242442 ± 14% numa-meminfo.node1.Inactive(anon)
229302 ± 15% -28.5% 163948 ± 17% numa-meminfo.node2.AnonPages
231172 ± 16% -28.5% 165270 ± 17% numa-meminfo.node2.Inactive
231172 ± 16% -28.5% 165270 ± 17% numa-meminfo.node2.Inactive(anon)
49578 ± 12% +22.1% 60515 ± 14% numa-vmstat.node1.nr_anon_pages
50070 ± 12% +21.2% 60697 ± 14% numa-vmstat.node1.nr_inactive_anon
50071 ± 12% +21.2% 60697 ± 14% numa-vmstat.node1.nr_zone_inactive_anon
57327 ± 15% -28.4% 41064 ± 17% numa-vmstat.node2.nr_anon_pages
57794 ± 16% -28.4% 41393 ± 17% numa-vmstat.node2.nr_inactive_anon
57794 ± 16% -28.4% 41393 ± 17% numa-vmstat.node2.nr_zone_inactive_anon
0.01 ± 4% +7.7% 0.02 perf-stat.i.MPKI
8.662e+10 -5.4% 8.197e+10 perf-stat.i.branch-instructions
3.336e+08 -4.5% 3.187e+08 perf-stat.i.branch-misses
15.22 ± 2% +1.1 16.33 ± 2% perf-stat.i.cache-miss-rate%
1193768 ± 4% +8.9% 1300334 ± 2% perf-stat.i.cache-misses
0.99 +5.9% 1.05 perf-stat.i.cpi
1.439e+11 -5.4% 1.362e+11 perf-stat.i.dTLB-loads
0.00 +0.0 0.00 perf-stat.i.dTLB-store-miss-rate%
255388 -1.9% 250535 perf-stat.i.dTLB-store-misses
1.079e+11 -5.4% 1.021e+11 perf-stat.i.dTLB-stores
5.753e+11 -5.4% 5.444e+11 perf-stat.i.instructions
1.01 -5.5% 0.96 perf-stat.i.ipc
1762 -5.4% 1667 perf-stat.i.metric.M/sec
233635 ± 3% +6.3% 248433 perf-stat.i.node-load-misses
106279 ± 3% +13.5% 120679 perf-stat.i.node-store-misses
0.01 ± 4% +7.1% 0.02 perf-stat.overall.MPKI
15.04 +1.0 16.08 ± 2% perf-stat.overall.cache-miss-rate%
0.99 +5.9% 1.05 perf-stat.overall.cpi
0.00 +0.0 0.00 perf-stat.overall.dTLB-store-miss-rate%
1.01 -5.5% 0.96 perf-stat.overall.ipc
8.633e+10 -5.4% 8.17e+10 perf-stat.ps.branch-instructions
3.325e+08 -4.5% 3.176e+08 perf-stat.ps.branch-misses
1.434e+11 -5.4% 1.357e+11 perf-stat.ps.dTLB-loads
254865 -1.9% 249975 perf-stat.ps.dTLB-store-misses
1.075e+11 -5.4% 1.017e+11 perf-stat.ps.dTLB-stores
5.734e+11 -5.4% 5.426e+11 perf-stat.ps.instructions
232956 ± 3% +6.3% 247689 perf-stat.ps.node-load-misses
105863 ± 3% +13.6% 120215 perf-stat.ps.node-store-misses
1.739e+14 -5.4% 1.645e+14 perf-stat.total.instructions
33.36 -2.0 31.32 perf-profile.calltrace.cycles-pp.__entry_text_start.syscall
1.70 -0.3 1.36 perf-profile.calltrace.cycles-pp.syscall_exit_to_user_mode_prepare.syscall_exit_to_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe.syscall
6.47 -0.3 6.14 perf-profile.calltrace.cycles-pp.syscall_exit_to_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe.syscall
2.22 -0.1 2.13 perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_safe_stack.syscall
97.70 -0.1 97.62 perf-profile.calltrace.cycles-pp.syscall
0.92 -0.1 0.86 perf-profile.calltrace.cycles-pp.entry_SYSRETQ_unsafe_stack.syscall
3.48 +0.0 3.51 perf-profile.calltrace.cycles-pp.exit_to_user_mode_prepare.syscall_exit_to_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe.syscall
1.98 +0.0 2.02 perf-profile.calltrace.cycles-pp.syscall_enter_from_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe.syscall
8.68 +0.7 9.40 perf-profile.calltrace.cycles-pp.futex_hash.futex_wake.do_futex.__x64_sys_futex.do_syscall_64
5.94 +1.2 7.16 perf-profile.calltrace.cycles-pp.get_futex_key.futex_wake.do_futex.__x64_sys_futex.do_syscall_64
23.75 +1.5 25.23 perf-profile.calltrace.cycles-pp.futex_wake.do_futex.__x64_sys_futex.do_syscall_64.entry_SYSCALL_64_after_hwframe
51.01 +2.2 53.22 perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.syscall
43.94 +2.6 46.55 perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.syscall
32.18 +3.0 35.18 perf-profile.calltrace.cycles-pp.__x64_sys_futex.do_syscall_64.entry_SYSCALL_64_after_hwframe.syscall
26.90 +3.4 30.27 perf-profile.calltrace.cycles-pp.do_futex.__x64_sys_futex.do_syscall_64.entry_SYSCALL_64_after_hwframe.syscall
21.50 -1.3 20.19 perf-profile.children.cycles-pp.__entry_text_start
12.92 -0.8 12.09 perf-profile.children.cycles-pp.entry_SYSRETQ_unsafe_stack
7.38 -0.5 6.89 perf-profile.children.cycles-pp.syscall_exit_to_user_mode
1.90 -0.4 1.53 perf-profile.children.cycles-pp.syscall_exit_to_user_mode_prepare
2.40 -0.1 2.30 perf-profile.children.cycles-pp.entry_SYSCALL_64_safe_stack
0.19 ± 7% +0.0 0.23 ± 4% perf-profile.children.cycles-pp.perf_prepare_sample
0.22 ± 6% +0.0 0.26 ± 3% perf-profile.children.cycles-pp.perf_tp_event
0.22 ± 6% +0.0 0.26 ± 3% perf-profile.children.cycles-pp.perf_trace_sched_stat_runtime
0.01 ±223% +0.0 0.06 ± 9% perf-profile.children.cycles-pp.account_user_time
0.01 ±223% +0.1 0.06 ± 14% perf-profile.children.cycles-pp.account_system_index_time
0.36 ± 4% +0.1 0.41 ± 5% perf-profile.children.cycles-pp.scheduler_tick
0.31 ± 5% +0.1 0.37 ± 4% perf-profile.children.cycles-pp.task_tick_fair
0.24 ± 9% +0.1 0.30 ± 5% perf-profile.children.cycles-pp.update_curr
0.01 ±223% +0.1 0.08 ± 12% perf-profile.children.cycles-pp.__perf_event_header__init_id
0.01 ±223% +0.1 0.08 ± 12% perf-profile.children.cycles-pp.__task_pid_nr_ns
0.47 ± 7% +0.1 0.57 ± 5% perf-profile.children.cycles-pp.update_process_times
0.77 ± 4% +0.1 0.87 ± 4% perf-profile.children.cycles-pp.asm_sysvec_apic_timer_interrupt
0.71 ± 4% +0.1 0.82 ± 4% perf-profile.children.cycles-pp.sysvec_apic_timer_interrupt
0.48 ± 8% +0.1 0.59 ± 5% perf-profile.children.cycles-pp.tick_sched_handle
0.67 ± 4% +0.1 0.78 ± 4% perf-profile.children.cycles-pp.__sysvec_apic_timer_interrupt
0.66 ± 4% +0.1 0.78 ± 4% perf-profile.children.cycles-pp.hrtimer_interrupt
0.54 ± 7% +0.1 0.65 ± 4% perf-profile.children.cycles-pp.__hrtimer_run_queues
0.50 ± 8% +0.1 0.61 ± 5% perf-profile.children.cycles-pp.tick_sched_timer
8.78 +0.8 9.55 perf-profile.children.cycles-pp.futex_hash
6.02 +1.3 7.37 perf-profile.children.cycles-pp.get_futex_key
24.17 +1.9 26.11 perf-profile.children.cycles-pp.futex_wake
51.45 +2.2 53.67 perf-profile.children.cycles-pp.entry_SYSCALL_64_after_hwframe
44.71 +2.6 47.28 perf-profile.children.cycles-pp.do_syscall_64
32.55 +3.1 35.65 perf-profile.children.cycles-pp.__x64_sys_futex
27.26 +3.2 30.44 perf-profile.children.cycles-pp.do_futex
12.56 -0.8 11.75 perf-profile.self.cycles-pp.entry_SYSRETQ_unsafe_stack
16.26 -0.7 15.60 perf-profile.self.cycles-pp.syscall
9.62 -0.6 9.02 perf-profile.self.cycles-pp.__entry_text_start
6.82 -0.4 6.46 perf-profile.self.cycles-pp.entry_SYSCALL_64_after_hwframe
1.59 -0.2 1.37 perf-profile.self.cycles-pp.syscall_exit_to_user_mode_prepare
2.21 -0.2 1.98 perf-profile.self.cycles-pp.syscall_exit_to_user_mode
2.88 -0.1 2.75 perf-profile.self.cycles-pp.do_syscall_64
5.51 -0.1 5.40 perf-profile.self.cycles-pp.syscall_return_via_sysret
2.38 -0.1 2.27 perf-profile.self.cycles-pp.entry_SYSCALL_64_safe_stack
3.41 +0.0 3.43 perf-profile.self.cycles-pp.exit_to_user_mode_prepare
0.32 ± 2% +0.0 0.34 ± 2% perf-profile.self.cycles-pp.syscall@plt
0.01 ±223% +0.0 0.06 ± 9% perf-profile.self.cycles-pp.account_user_time
0.01 ±223% +0.1 0.06 ± 14% perf-profile.self.cycles-pp.account_system_index_time
0.01 ±223% +0.1 0.07 ± 12% perf-profile.self.cycles-pp.__task_pid_nr_ns
8.59 +0.4 9.01 perf-profile.self.cycles-pp.futex_hash
9.35 +0.5 9.83 perf-profile.self.cycles-pp.futex_wake
3.22 +1.0 4.17 perf-profile.self.cycles-pp.do_futex
5.71 +1.2 6.92 perf-profile.self.cycles-pp.get_futex_key
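As a cross-check on how to read the table above, here is a minimal sketch that recomputes a few of the reported figures from the parent and commit columns. It assumes that the %change column is the relative change from parent to commit, that rows already expressed in percent (such as the perf-profile cycles-pp rows) show an absolute difference in percentage points, and that will-it-scale.workload is per_thread_ops multiplied by the 192 threads. These readings are consistent with the numbers in the report but are not taken from the lkp-tests source, and the helper names are hypothetical.

```python
# Hypothetical helpers for reading the comparison table; not part of lkp-tests.

NR_THREADS = 192  # "192 threads" test machine, nr_task: 100%


def pct_change(parent: float, commit: float) -> float:
    """Relative change from the parent value to the commit value, in percent."""
    return (commit - parent) / parent * 100.0


def point_delta(parent: float, commit: float) -> float:
    """Absolute difference for rows that are already percentages."""
    return commit - parent


# will-it-scale.per_thread_ops, parent vs. commit (values from the table)
parent_ops, commit_ops = 6951370, 6578078

print(f"per_thread_ops change: {pct_change(parent_ops, commit_ops):+.1f}%")  # -5.4%
print(f"workload (parent):     {parent_ops * NR_THREADS:.3e}")              # 1.335e+09
print(f"workload (commit):     {commit_ops * NR_THREADS:.3e}")              # 1.263e+09

# perf-profile.self.cycles-pp.get_futex_key, parent vs. commit
print(f"get_futex_key self:    {point_delta(5.71, 6.92):+.1f} points")      # +1.2 points
```

Small mismatches on other rows (for example the cpi and ipc lines) are to be expected with this approach, since the table prints rounded values while the reported changes were presumably computed from unrounded ones.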
Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.
--
0-DAY CI Kernel Test Service
https://01.org/lkp
View attachment "config-6.1.0-rc1-00040-g80e4c1cd42ff" of type "text/plain" (166272 bytes)
View attachment "job-script" of type "text/plain" (7825 bytes)
View attachment "job.yaml" of type "text/plain" (5344 bytes)
View attachment "reproduce" of type "text/plain" (346 bytes)