Message-ID: <20180930065120.GM15893@shao2-debian>
Date: Sun, 30 Sep 2018 14:51:20 +0800
From: kernel test robot <rong.a.chen@...el.com>
To: Andy Lutomirski <luto@...nel.org>
Cc: Thomas Gleixner <tglx@...utronix.de>, Borislav Petkov <bp@...e.de>,
Borislav Petkov <bp@...en8.de>,
Dave Hansen <dave.hansen@...ux.intel.com>,
Adrian Hunter <adrian.hunter@...el.com>,
Alexander Shishkin <alexander.shishkin@...ux.intel.com>,
Arnaldo Carvalho de Melo <acme@...nel.org>,
Linus Torvalds <torvalds@...ux-foundation.org>,
Josh Poimboeuf <jpoimboe@...hat.com>,
Joerg Roedel <joro@...tes.org>, Jiri Olsa <jolsa@...hat.com>,
Andi Kleen <ak@...ux.intel.com>,
Peter Zijlstra <peterz@...radead.org>,
LKML <linux-kernel@...r.kernel.org>,
Stephen Rothwell <sfr@...b.auug.org.au>, lkp@...org
Subject: [LKP] [x86/pti/64] bf904d2762: will-it-scale.per_thread_ops 1.7% improvement
Greetings,
FYI, we noticed a 1.7% improvement of will-it-scale.per_thread_ops due to commit:
commit: bf904d2762ee6fc1e4acfcb0772bbfb4a27ad8a6 ("x86/pti/64: Remove the SYSCALL64 entry trampoline")
https://git.kernel.org/cgit/linux/kernel/git/next/linux-next.git master
in testcase: will-it-scale
on test machine: 88 threads Intel(R) Xeon(R) CPU E5-2699 v4 @ 2.20GHz with 64G memory
with the following parameters:
nr_task: 16
mode: thread
test: pwrite1
cpufreq_governor: performance
test-description: Will It Scale takes a testcase and runs it from 1 through to n parallel copies to see if the testcase will scale. It builds both a process-based and a thread-based version of each test in order to show any differences between the two.
test-url: https://github.com/antonblanchard/will-it-scale
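For readers unfamiliar with the benchmark, the thread-mode pwrite1 run described by the parameters above can be approximated as follows. This is only a rough Python sketch and not the benchmark's own code: the buffer size, the per-thread temporary file, the function names and the run duration are assumptions made for illustration. Python threads also contend on the interpreter lock, so the counts it prints are not comparable to the figures in this report.

```python
import os
import tempfile
import threading
import time

BUF_LEN = 4096  # assumed buffer size, chosen for illustration only


def pwrite_worker(stop, counts, idx):
    """Repeatedly pwrite() the same buffer at offset 0 of a private temp file."""
    buf = b"\0" * BUF_LEN
    with tempfile.TemporaryFile() as f:
        fd = f.fileno()
        ops = 0
        while not stop.is_set():
            os.pwrite(fd, buf, 0)  # one write syscall per iteration
            ops += 1
        counts[idx] = ops


def run(nr_task, seconds):
    """Run nr_task worker threads for roughly `seconds` and return per-thread op counts."""
    stop = threading.Event()
    counts = [0] * nr_task
    threads = [
        threading.Thread(target=pwrite_worker, args=(stop, counts, i))
        for i in range(nr_task)
    ]
    for t in threads:
        t.start()
    time.sleep(seconds)
    stop.set()
    for t in threads:
        t.join()
    return counts


if __name__ == "__main__":
    counts = run(nr_task=4, seconds=0.5)
    print("per-thread ops:", counts)
    print("total ops:", sum(counts))
```

Each iteration is dominated by syscall entry and exit, which is why a change to the SYSCALL64 entry path can show up in this kind of test.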
Details are below:
-------------------------------------------------------------------------------------------------->
To reproduce:
git clone https://github.com/intel/lkp-tests.git
cd lkp-tests
bin/lkp install job.yaml # job file is attached to this email
bin/lkp run job.yaml
=========================================================================================
compiler/cpufreq_governor/kconfig/mode/nr_task/rootfs/tbox_group/test/testcase:
gcc-7/performance/x86_64-rhel-7.2/thread/16/debian-x86_64-2018-04-03.cgz/lkp-bdw-ep3d/pwrite1/will-it-scale
commit:
98f05b5138 ("x86/entry/64: Use the TSS sp2 slot for SYSCALL/SYSRET scratch space")
bf904d2762 ("x86/pti/64: Remove the SYSCALL64 entry trampoline")
98f05b5138f0a9b5 bf904d2762ee6fc1e4acfcb077
---------------- --------------------------
fail:runs %reproduction fail:runs
| | |
1:4 -25% :4 dmesg.WARNING:at#for_ip_interrupt_entry/0x
2:4 -50% :4 dmesg.WARNING:at_ip_fsnotify/0x
%stddev %change %stddev
\ | \
1221307 +1.7% 1242132 will-it-scale.per_thread_ops
7349 ± 3% +3.6% 7616 will-it-scale.time.minor_page_faults
675.23 +1.8% 687.28 will-it-scale.time.user_time
19540927 +1.7% 19874128 will-it-scale.workload
4323 ± 16% -54.7% 1958 ±103% numa-numastat.node0.other_node
98872 ± 24% +33.4% 131877 ± 10% numa-meminfo.node0.AnonPages
2292 ± 8% -10.5% 2050 ± 7% numa-meminfo.node1.PageTables
24718 ± 24% +33.4% 32969 ± 10% numa-vmstat.node0.nr_anon_pages
7864 ± 12% +21.7% 9568 ± 15% numa-vmstat.node1
573.00 ± 8% -10.6% 512.50 ± 7% numa-vmstat.node1.nr_page_table_pages
2.25 ± 15% -50.0% 1.12 ± 60% sched_debug.cfs_rq:/.load_avg.min
418.57 ± 87% -81.2% 78.54 ±173% sched_debug.cfs_rq:/.removed.runnable_sum.avg
7842 ± 70% -76.0% 1885 ±173% sched_debug.cfs_rq:/.removed.runnable_sum.max
1734 ± 77% -78.3% 376.68 ±173% sched_debug.cfs_rq:/.removed.runnable_sum.stddev
-2477409 -0.1% -2474518 sched_debug.cfs_rq:/.spread0.min
209211 ± 19% -30.2% 146101 ± 30% sched_debug.cpu.avg_idle.min
70.04 ± 7% -18.4% 57.17 ± 7% sched_debug.cpu.cpu_load[2].max
66.92 ± 5% -11.1% 59.46 ± 6% sched_debug.cpu.cpu_load[3].max
6736 ± 23% +37.8% 9285 ± 9% sched_debug.cpu.ttwu_local.max
1672 ± 12% +32.2% 2210 ± 11% sched_debug.cpu.ttwu_local.stddev
1.81 -0.3 1.56 perf-stat.branch-miss-rate%
4.262e+10 -13.3% 3.696e+10 perf-stat.branch-misses
1.27 -1.3% 1.25 perf-stat.cpi
0.01 ± 7% -0.0 0.00 ± 2% perf-stat.dTLB-load-miss-rate%
5.163e+08 ± 7% -62.1% 1.958e+08 ± 2% perf-stat.dTLB-load-misses
4.318e+12 +1.4% 4.38e+12 perf-stat.dTLB-loads
0.01 ± 6% -0.0 0.00 ± 4% perf-stat.dTLB-store-miss-rate%
4.264e+08 ± 6% -69.6% 1.294e+08 ± 4% perf-stat.dTLB-store-misses
2.915e+12 +1.1% 2.947e+12 perf-stat.dTLB-stores
2.21 ± 3% +95.5 97.67 perf-stat.iTLB-load-miss-rate%
2.564e+08 ± 3% +2372.0% 6.338e+09 perf-stat.iTLB-load-misses
1.136e+10 -98.7% 1.509e+08 perf-stat.iTLB-loads
1.18e+13 +1.4% 1.196e+13 perf-stat.instructions
46053 ± 3% -95.9% 1887 ± 2% perf-stat.instructions-per-iTLB-miss
0.79 +1.4% 0.80 perf-stat.ipc
8.65 ± 4% -8.7 0.00 perf-profile.calltrace.cycles-pp.__entry_SYSCALL_64_trampoline
0.57 ± 4% -0.2 0.39 ± 57% perf-profile.calltrace.cycles-pp.___might_sleep.down_write.generic_file_write_iter.__vfs_write.vfs_write
0.00 +8.4 8.41 ± 2% perf-profile.calltrace.cycles-pp.entry_SYSCALL_64
9.48 ± 5% -9.5 0.00 perf-profile.children.cycles-pp.__entry_SYSCALL_64_trampoline
0.03 ±100% +0.0 0.07 ± 17% perf-profile.children.cycles-pp.clockevents_program_event
0.01 ±173% +0.1 0.07 ± 23% perf-profile.children.cycles-pp.ktime_get
0.31 ± 6% +0.1 0.37 ± 6% perf-profile.children.cycles-pp.smp_apic_timer_interrupt
0.35 ± 8% +0.1 0.42 ± 6% perf-profile.children.cycles-pp.apic_timer_interrupt
0.00 +0.2 0.17 ± 4% perf-profile.children.cycles-pp.__x86_indirect_thunk_r10
0.00 +1.0 0.96 perf-profile.children.cycles-pp.__x86_indirect_thunk_rax
0.00 +8.4 8.42 ± 2% perf-profile.children.cycles-pp.entry_SYSCALL_64
9.31 ± 4% -9.3 0.00 perf-profile.self.cycles-pp.__entry_SYSCALL_64_trampoline
1.55 ± 6% -0.6 0.97 perf-profile.self.cycles-pp.do_syscall_64
1.03 ± 5% -0.2 0.81 ± 4% perf-profile.self.cycles-pp.entry_SYSCALL_64_after_hwframe
0.00 +0.1 0.12 ± 3% perf-profile.self.cycles-pp.__x86_indirect_thunk_r10
0.00 +0.8 0.81 ± 2% perf-profile.self.cycles-pp.__x86_indirect_thunk_rax
0.00 +8.4 8.42 ± 2% perf-profile.self.cycles-pp.entry_SYSCALL_64
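As a sanity check on the table above, the headline change and the derived iTLB figures can be recomputed from the raw counters. The snippet below is a minimal sketch: the input values are copied from the table, while the function names and the formula miss rate = misses / (misses + loads) are assumptions made here, not something the report states.

```python
# Values copied from the comparison table: (base 98f05b5138, patched bf904d2762).
per_thread_ops = (1221307, 1242132)
itlb_misses = (2.564e8, 6.338e9)
itlb_loads = (1.136e10, 1.509e8)
instructions = (1.18e13, 1.196e13)


def pct_change(base, new):
    """Relative change from base to new, in percent."""
    return (new - base) / base * 100.0


def itlb_miss_rate(misses, loads):
    """Assumed definition: misses as a percentage of misses plus loads."""
    return misses / (misses + loads) * 100.0


ops_change = pct_change(*per_thread_ops)
base_rate = itlb_miss_rate(itlb_misses[0], itlb_loads[0])
new_rate = itlb_miss_rate(itlb_misses[1], itlb_loads[1])
instr_per_miss = [i / m for i, m in zip(instructions, itlb_misses)]

print(f"per_thread_ops change:       {ops_change:+.1f}%")
print(f"iTLB-load-miss-rate:         {base_rate:.2f}% -> {new_rate:.2f}%")
print(f"instructions-per-iTLB-miss:  {instr_per_miss[0]:.0f} -> {instr_per_miss[1]:.0f}")
```

Under that assumed definition the recomputed values agree with the reported rows to within the rounding of the printed counters.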
[ASCII plot: will-it-scale.per_thread_ops per sample, y-axis from 1.205e+06 to 1.255e+06]
[ASCII plot: will-it-scale.workload per sample, y-axis from 1.93e+07 to 2.01e+07]
[*] bisect-good sample
[O] bisect-bad sample
Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.
Thanks,
Rong Chen
View attachment "config-4.19.0-rc2-00179-gbf904d2" of type "text/plain" (167672 bytes)
View attachment "job-script" of type "text/plain" (6938 bytes)
View attachment "job.yaml" of type "text/plain" (4592 bytes)
View attachment "reproduce" of type "text/plain" (310 bytes)