Message-ID: <20181102013306.GH24195@shao2-debian>
Date: Fri, 2 Nov 2018 09:33:07 +0800
From: kernel test robot <rong.a.chen@...el.com>
To: Andy Lutomirski <luto@...nel.org>
Cc: Thomas Gleixner <tglx@...utronix.de>, Borislav Petkov <bp@...e.de>,
Borislav Petkov <bp@...en8.de>,
Dave Hansen <dave.hansen@...ux.intel.com>,
Adrian Hunter <adrian.hunter@...el.com>,
Alexander Shishkin <alexander.shishkin@...ux.intel.com>,
Arnaldo Carvalho de Melo <acme@...nel.org>,
Linus Torvalds <torvalds@...ux-foundation.org>,
Josh Poimboeuf <jpoimboe@...hat.com>,
Joerg Roedel <joro@...tes.org>, Jiri Olsa <jolsa@...hat.com>,
Andi Kleen <ak@...ux.intel.com>,
Peter Zijlstra <peterz@...radead.org>,
LKML <linux-kernel@...r.kernel.org>,
Ingo Molnar <mingo@...nel.org>,
"H. Peter Anvin" <hpa@...or.com>, tipbuild@...or.com, lkp@...org
Subject: [LKP] [x86/pti/64] 86635715ee: will-it-scale.per_thread_ops 4.1%
improvement
Greetings,
FYI, we noticed a 4.1% improvement of will-it-scale.per_thread_ops due to commit:
commit: 86635715ee4228ded59f662dab36e9732b9c978f ("x86/pti/64: Remove the SYSCALL64 entry trampoline")
https://git.kernel.org/cgit/linux/kernel/git/tip/tip.git x86/pti
in testcase: will-it-scale
on test machine: 80-thread Skylake with 64GB of memory
with the following parameters:
nr_task: 100%
mode: thread
test: sched_yield
cpufreq_governor: performance
test-description: Will It Scale takes a testcase and runs it from 1 through n parallel copies to see whether the testcase scales. It builds both a process-based and a thread-based variant of each test in order to see any differences between the two.
test-url: https://github.com/antonblanchard/will-it-scale
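For context on what per_thread_ops counts here: the sched_yield testcase is essentially a tight loop around the sched_yield() syscall, with each thread's iteration count sampled per second. Below is a minimal standalone sketch of that idea; the thread count, one-second sampling window, and counter handling are illustrative assumptions, not the actual will-it-scale harness code.

/*
 * Simplified sketch: each worker thread spins on sched_yield() and bumps
 * its own iteration counter; the per-second count of one such thread is
 * what "per_thread_ops" corresponds to.  Build with: gcc -O2 -pthread.
 */
#include <pthread.h>
#include <sched.h>
#include <stdio.h>
#include <unistd.h>

#define NR_THREADS 4	/* will-it-scale sweeps this from 1 up to nr_cpus */

static volatile int stop;	/* a real harness would use atomics */
static unsigned long long counts[NR_THREADS];

static void *worker(void *arg)
{
	unsigned long long *iterations = arg;

	while (!stop) {
		sched_yield();		/* one syscall round-trip per op */
		(*iterations)++;
	}
	return NULL;
}

int main(void)
{
	pthread_t threads[NR_THREADS];
	int i;

	for (i = 0; i < NR_THREADS; i++)
		pthread_create(&threads[i], NULL, worker, &counts[i]);

	sleep(1);			/* sample over one second */
	stop = 1;

	for (i = 0; i < NR_THREADS; i++) {
		pthread_join(threads[i], NULL);
		printf("thread %d: %llu ops/sec\n", i, counts[i]);
	}
	return 0;
}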
Details are as follows:
-------------------------------------------------------------------------------------------------->
To reproduce:
git clone https://github.com/intel/lkp-tests.git
cd lkp-tests
bin/lkp install job.yaml # job file is attached in this email
bin/lkp run job.yaml
=========================================================================================
compiler/cpufreq_governor/kconfig/mode/nr_task/rootfs/tbox_group/test/testcase:
gcc-7/performance/x86_64-rhel-7.2/thread/100%/debian-x86_64-2018-04-03.cgz/lkp-skl-2sp2/sched_yield/will-it-scale
commit:
98f05b5138 ("x86/entry/64: Use the TSS sp2 slot for SYSCALL/SYSRET scratch space")
86635715ee ("x86/pti/64: Remove the SYSCALL64 entry trampoline")
98f05b5138f0a9b5                    86635715ee4228ded59f662dab
----------------                    --------------------------
value (±%stddev)       %change      value (±%stddev)      metric
1946690 +4.1% 2027191 will-it-scale.per_thread_ops
17228 -1.5% 16976 will-it-scale.time.system_time
6495 +4.2% 6768 will-it-scale.time.user_time
1.557e+08 +4.1% 1.622e+08 will-it-scale.workload
93021 ±153% -92.9% 6582 ± 41% turbostat.C1
99833 ±144% -88.9% 11048 ± 25% cpuidle.C1.usage
2527 ±150% -94.0% 152.00 ± 44% cpuidle.POLL.usage
14344 ± 16% -90.6% 1351 ±173% numa-numastat.node0.other_node
1443 ±161% +897.9% 14402 ± 16% numa-numastat.node1.other_node
5458 ± 73% -59.7% 2199 ±160% proc-vmstat.numa_pages_migrated
5458 ± 73% -59.7% 2199 ±160% proc-vmstat.pgmigrate_success
99.51 ± 6% +17.9% 117.29 ± 6% sched_debug.cfs_rq:/.util_avg.stddev
397470 ± 21% +46.3% 581313 ± 12% sched_debug.cpu.avg_idle.min
53.33 ± 6% -10.9% 47.54 ± 6% sched_debug.cpu.ttwu_local.min
1354 ± 11% +23.6% 1674 ± 6% slabinfo.UNIX.active_objs
1354 ± 11% +23.6% 1674 ± 6% slabinfo.UNIX.num_objs
2423 ± 9% +22.0% 2958 ± 6% slabinfo.sock_inode_cache.active_objs
2423 ± 9% +22.0% 2958 ± 6% slabinfo.sock_inode_cache.num_objs
122834 ± 14% +35.5% 166488 ± 3% numa-meminfo.node0.Active
121575 ± 15% +36.7% 166251 ± 3% numa-meminfo.node0.Active(anon)
1258 ± 42% -81.1% 237.75 ±173% numa-meminfo.node0.Active(file)
1082 ± 10% -73.1% 291.00 ±145% numa-meminfo.node0.Inactive(file)
2971 ± 9% -28.9% 2113 ± 25% numa-meminfo.node0.PageTables
18685 +182.7% 52829 ± 13% numa-meminfo.node0.Shmem
185939 ± 9% -24.4% 140633 ± 4% numa-meminfo.node1.Active
185939 ± 9% -24.7% 139946 ± 4% numa-meminfo.node1.Active(anon)
6249 ± 2% +10.5% 6904 ± 2% numa-meminfo.node1.KernelStack
1571 ± 16% +55.3% 2439 ± 21% numa-meminfo.node1.PageTables
55233 ± 3% +10.6% 61093 ± 3% numa-meminfo.node1.SUnreclaim
41375 ± 5% -84.4% 6447 ±108% numa-meminfo.node1.Shmem
8.81e+12 +2.4% 9.019e+12 perf-stat.branch-instructions
2.16 -0.5 1.66 perf-stat.branch-miss-rate%
1.9e+11 -21.4% 1.494e+11 perf-stat.branch-misses
1.55 -2.9% 1.51 perf-stat.cpi
1.282e+13 +3.3% 1.324e+13 perf-stat.dTLB-loads
8.156e+12 +2.3% 8.34e+12 perf-stat.dTLB-stores
1.82 ± 6% +98.1 99.87 perf-stat.iTLB-load-miss-rate%
4.11e+08 ± 6% +12153.3% 5.036e+10 perf-stat.iTLB-load-misses
2.222e+10 ± 2% -99.7% 66461819 ± 8% perf-stat.iTLB-loads
4.272e+13 +3.3% 4.412e+13 perf-stat.instructions
104366 ± 6% -99.2% 876.17 perf-stat.instructions-per-iTLB-miss
0.64 +3.0% 0.66 perf-stat.ipc
30383 ± 15% +36.9% 41582 ± 3% numa-vmstat.node0.nr_active_anon
314.75 ± 42% -81.2% 59.25 ±173% numa-vmstat.node0.nr_active_file
742.25 ± 9% -28.9% 527.50 ± 25% numa-vmstat.node0.nr_page_table_pages
4671 +183.0% 13221 ± 13% numa-vmstat.node0.nr_shmem
30383 ± 15% +36.9% 41582 ± 3% numa-vmstat.node0.nr_zone_active_anon
314.75 ± 42% -81.2% 59.25 ±173% numa-vmstat.node0.nr_zone_active_file
14332 ± 16% -89.5% 1504 ±154% numa-vmstat.node0.numa_other
46526 ± 9% -24.8% 34987 ± 4% numa-vmstat.node1.nr_active_anon
6249 ± 2% +10.5% 6902 ± 3% numa-vmstat.node1.nr_kernel_stack
392.25 ± 16% +55.3% 609.25 ± 21% numa-vmstat.node1.nr_page_table_pages
10364 ± 5% -84.4% 1612 ±108% numa-vmstat.node1.nr_shmem
13808 ± 3% +10.6% 15273 ± 3% numa-vmstat.node1.nr_slab_unreclaimable
46526 ± 9% -24.8% 34987 ± 4% numa-vmstat.node1.nr_zone_active_anon
136694 +9.4% 149504 numa-vmstat.node1.numa_other
20.17 -20.2 0.00 perf-profile.calltrace.cycles-pp.__entry_SYSCALL_64_trampoline
54.02 -1.4 52.62 perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe
51.83 -0.9 50.89 perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe
2.66 -0.2 2.41 perf-profile.calltrace.cycles-pp.pick_next_entity.pick_next_task_fair.__sched_text_start.schedule.__x64_sys_sched_yield
0.80 ± 2% -0.1 0.73 ± 3% perf-profile.calltrace.cycles-pp.__list_del_entry_valid.pick_next_task_fair.__sched_text_start.schedule.__x64_sys_sched_yield
0.73 +0.0 0.75 perf-profile.calltrace.cycles-pp.__list_add_valid.pick_next_task_fair.__sched_text_start.schedule.__x64_sys_sched_yield
1.93 +0.2 2.08 perf-profile.calltrace.cycles-pp._raw_spin_lock.__sched_text_start.schedule.__x64_sys_sched_yield.do_syscall_64
2.48 +0.2 2.66 perf-profile.calltrace.cycles-pp.yield_task_fair.do_sched_yield.__x64_sys_sched_yield.do_syscall_64.entry_SYSCALL_64_after_hwframe
0.53 ± 3% +0.3 0.79 ± 3% perf-profile.calltrace.cycles-pp.clear_buddies.pick_next_entity.pick_next_task_fair.__sched_text_start.schedule
7.10 +0.4 7.55 perf-profile.calltrace.cycles-pp.do_sched_yield.__x64_sys_sched_yield.do_syscall_64.entry_SYSCALL_64_after_hwframe
37.40 +0.5 37.87 perf-profile.calltrace.cycles-pp.__sched_text_start.schedule.__x64_sys_sched_yield.do_syscall_64.entry_SYSCALL_64_after_hwframe
0.00 +0.5 0.53 ± 2% perf-profile.calltrace.cycles-pp.__x86_indirect_thunk_rax
38.68 +0.5 39.22 perf-profile.calltrace.cycles-pp.schedule.__x64_sys_sched_yield.do_syscall_64.entry_SYSCALL_64_after_hwframe
4.04 +0.8 4.84 perf-profile.calltrace.cycles-pp.__calc_delta.update_curr.pick_next_task_fair.__sched_text_start.schedule
19.43 +0.9 20.35 perf-profile.calltrace.cycles-pp.syscall_return_via_sysret
47.51 +1.1 48.59 perf-profile.calltrace.cycles-pp.__x64_sys_sched_yield.do_syscall_64.entry_SYSCALL_64_after_hwframe
0.00 +21.5 21.51 perf-profile.calltrace.cycles-pp.entry_SYSCALL_64
20.91 -20.9 0.00 perf-profile.children.cycles-pp.__entry_SYSCALL_64_trampoline
54.20 -1.5 52.74 perf-profile.children.cycles-pp.entry_SYSCALL_64_after_hwframe
51.99 -0.9 51.08 perf-profile.children.cycles-pp.do_syscall_64
2.92 -0.1 2.81 ± 2% perf-profile.children.cycles-pp.pick_next_entity
0.85 ± 2% -0.1 0.77 ± 3% perf-profile.children.cycles-pp.__list_del_entry_valid
0.75 +0.0 0.77 perf-profile.children.cycles-pp.__list_add_valid
0.45 +0.0 0.48 perf-profile.children.cycles-pp.rcu_note_context_switch
4.52 +0.1 4.59 perf-profile.children.cycles-pp.update_rq_clock
0.40 +0.1 0.47 ± 2% perf-profile.children.cycles-pp.check_cfs_rq_runtime
2.51 +0.2 2.69 perf-profile.children.cycles-pp.yield_task_fair
4.04 +0.2 4.25 perf-profile.children.cycles-pp._raw_spin_lock
0.58 ± 4% +0.3 0.86 ± 2% perf-profile.children.cycles-pp.clear_buddies
7.11 +0.5 7.57 perf-profile.children.cycles-pp.do_sched_yield
37.75 +0.5 38.28 perf-profile.children.cycles-pp.__sched_text_start
38.70 +0.5 39.24 perf-profile.children.cycles-pp.schedule
4.44 +0.8 5.26 perf-profile.children.cycles-pp.__calc_delta
22.68 +1.1 23.77 perf-profile.children.cycles-pp.syscall_return_via_sysret
47.77 +1.1 48.87 perf-profile.children.cycles-pp.__x64_sys_sched_yield
0.00 +1.3 1.26 perf-profile.children.cycles-pp.__x86_indirect_thunk_rax
0.00 +21.5 21.54 perf-profile.children.cycles-pp.entry_SYSCALL_64
20.83 -20.8 0.00 perf-profile.self.cycles-pp.__entry_SYSCALL_64_trampoline
3.96 -2.0 2.01 perf-profile.self.cycles-pp.do_syscall_64
7.41 -0.7 6.66 perf-profile.self.cycles-pp.update_curr
2.29 -0.6 1.65 perf-profile.self.cycles-pp.entry_SYSCALL_64_after_hwframe
2.22 -0.4 1.83 ± 2% perf-profile.self.cycles-pp.pick_next_entity
0.29 ± 2% -0.0 0.26 ± 2% perf-profile.self.cycles-pp.task_of
0.72 +0.0 0.75 perf-profile.self.cycles-pp.__list_add_valid
0.44 +0.0 0.47 perf-profile.self.cycles-pp.rcu_note_context_switch
0.23 +0.1 0.29 perf-profile.self.cycles-pp.check_cfs_rq_runtime
2.06 +0.1 2.19 perf-profile.self.cycles-pp.__x64_sys_sched_yield
2.27 +0.2 2.44 perf-profile.self.cycles-pp.yield_task_fair
3.94 +0.2 4.12 perf-profile.self.cycles-pp._raw_spin_lock
0.39 ± 3% +0.2 0.58 ± 2% perf-profile.self.cycles-pp.clear_buddies
2.32 +0.2 2.54 perf-profile.self.cycles-pp.do_sched_yield
7.06 +0.2 7.31 perf-profile.self.cycles-pp.pick_next_task_fair
5.34 +0.2 5.59 perf-profile.self.cycles-pp.__sched_text_start
4.35 +0.8 5.17 perf-profile.self.cycles-pp.__calc_delta
22.66 +1.1 23.75 perf-profile.self.cycles-pp.syscall_return_via_sysret
0.00 +1.1 1.14 perf-profile.self.cycles-pp.__x86_indirect_thunk_rax
0.00 +21.5 21.54 perf-profile.self.cycles-pp.entry_SYSCALL_64
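As a quick sanity check, the headline 4.1% follows directly from the per_thread_ops means above, and the reported cpi and ipc pairs are consistent with being reciprocals of one another. A trivial sketch of that arithmetic (all numbers are copied from the comparison table above; nothing is re-measured):

#include <stdio.h>

int main(void)
{
	/* will-it-scale.per_thread_ops means from the comparison table */
	double base = 1946690.0, patched = 2027191.0;

	/* (2027191 - 1946690) / 1946690 ~= +4.1% */
	printf("per_thread_ops change: %+.1f%%\n",
	       100.0 * (patched - base) / base);

	/* perf-stat cpi 1.55 -> 1.51 corresponds to ipc 0.64 -> 0.66 */
	printf("implied ipc: %.2f -> %.2f\n", 1.0 / 1.55, 1.0 / 1.51);
	return 0;
}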
will-it-scale.per_thread_ops
[ ASCII trend chart: y-axis is will-it-scale.per_thread_ops, 1.9e+06 to 2.1e+06.
  The [O] samples cluster around 2.02e+06 - 2.08e+06; the [*]/dotted samples
  cluster around 1.92e+06 - 1.98e+06. ]
[*] bisect-good sample
[O] bisect-bad sample
Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.
Thanks,
Rong Chen
View attachment "config-4.19.0-rc2-00179-g8663571" of type "text/plain" (167672 bytes)
View attachment "job.yaml" of type "text/plain" (4869 bytes)
View attachment "reproduce" of type "text/plain" (314 bytes)