Message-ID: <20181015031457.GC28215@shao2-debian>
Date: Mon, 15 Oct 2018 11:14:57 +0800
From: kernel test robot <rong.a.chen@...el.com>
To: Rik van Riel <riel@...riel.com>
Cc: Peter Zijlstra <peterz@...radead.org>,
Linus Torvalds <torvalds@...ux-foundation.org>,
Thomas Gleixner <tglx@...utronix.de>,
Song Liu <songliubraving@...com>,
Dave Hansen <dave.hansen@...el.com>,
Ingo Molnar <mingo@...nel.org>,
LKML <linux-kernel@...r.kernel.org>,
"H. Peter Anvin" <hpa@...or.com>, tipbuild@...or.com, lkp@...org
Subject: [LKP] [x86/mm/tlb] 5462bc3a9a: unixbench.score 7.0% improvement
Greetings,
FYI, we noticed a 7.0% improvement of unixbench.score due to commit:
commit: 5462bc3a9a3c38328bbbd276d51164c7cf21d6a8 ("x86/mm/tlb: Always use lazy TLB mode")
https://git.kernel.org/cgit/linux/kernel/git/tip/tip.git x86/mm
in testcase: unixbench
on test machine: 8-thread Ivy Bridge with 16G memory
with the following parameters:
runtime: 300s
nr_task: 1
test: context1
ucode: 0x20
cpufreq_governor: performance
test-description: UnixBench is the original BYTE UNIX benchmark suite; it aims to test the performance of Unix-like systems.
test-url: https://github.com/kdlucas/byte-unixbench
Details are as below:
-------------------------------------------------------------------------------------------------->
To reproduce:
git clone https://github.com/intel/lkp-tests.git
cd lkp-tests
bin/lkp install job.yaml # job file is attached in this email
bin/lkp run job.yaml
=========================================================================================
compiler/cpufreq_governor/kconfig/nr_task/rootfs/runtime/tbox_group/test/testcase/ucode:
gcc-7/performance/x86_64-rhel-7.2/1/debian-x86_64-2018-04-03.cgz/300s/lkp-ivb-d01/context1/unixbench/0x20
commit:
a31acd3ee8 ("x86/mm: Page size aware flush_tlb_mm_range()")
5462bc3a9a ("x86/mm/tlb: Always use lazy TLB mode")
a31acd3ee8f7dbc0  5462bc3a9a3c38328bbbd276d5
----------------  --------------------------
       fail:runs  %reproduction    fail:runs
               |              |            |
             1:4           -25%           :4  dmesg.RIP:copy_page_to_iter
              :4           100%          4:4  dmesg.RIP:cpuidle_enter_state
              :4            25%          1:4  kmsg.ba52ac8>]usb_hcd_irq
             1:4           -25%           :4  kmsg.e4afb4>]usb_hcd_irq
              :4            25%          1:4  kmsg.e5d84e9>]usb_hcd_irq
             1:4           -25%           :4  kmsg.eaf7194>]usb_hcd_irq
             1:4           -25%           :4  kmsg.f4ac>]usb_hcd_irq
              :4            25%          1:4  kmsg.usb_hcd_irq
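The fail:runs rows above can be read as "failures out of runs" for each commit, with an empty failure count meaning zero. The sketch below assumes that reading and treats %reproduction as the difference in failure rate between the two commits. The helper names are hypothetical and are not part of lkp-tests.

```python
# Hypothetical helpers for reading the fail:runs table; not part of lkp-tests.
# Assumption: a cell "F:R" means F failures in R runs, and ":R" means 0 failures.

def parse_fail_runs(cell):
    """Return (failures, runs) for a cell such as '1:4' or ':4'."""
    fails, runs = cell.split(":")
    return (int(fails) if fails else 0, int(runs))


def reproduction_change(base_cell, new_cell):
    """Failure-rate difference (new minus base), in percentage points."""
    base_fails, base_runs = parse_fail_runs(base_cell)
    new_fails, new_runs = parse_fail_runs(new_cell)
    return 100.0 * (new_fails / new_runs - base_fails / base_runs)


if __name__ == "__main__":
    # dmesg.RIP:copy_page_to_iter: seen in 1 of 4 base runs, 0 of 4 new runs
    print(reproduction_change("1:4", ":4"))
    # dmesg.RIP:cpuidle_enter_state: seen in 0 of 4 base runs, 4 of 4 new runs
    print(reproduction_change(":4", "4:4"))
```

Under this reading the two calls give -25 and 100, which matches the %reproduction column for those rows.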
%stddev %change %stddev
\ | \
386.50 +7.0% 413.58 unixbench.score
410.13 -0.9% 406.25 unixbench.time.elapsed_time
410.13 -0.9% 406.25 unixbench.time.elapsed_time.max
55.00 +3.2% 56.75 unixbench.time.percent_of_cpu_this_job_got
207.51 +2.2% 212.17 unixbench.time.system_time
46114091 +7.0% 49358271 unixbench.time.voluntary_context_switches
62958045 +6.2% 66876363 unixbench.workload
22199 +1.9% 22621 interrupts.CAL:Function_call_interrupts
0.22 ± 43% -40.2% 0.13 ± 12% turbostat.CPU%c6
451150 +7.5% 484981 vmstat.system.cs
3399624 ± 12% -17.0% 2823082 ± 5% cpuidle.POLL.time
1497754 ± 2% +18.2% 1770379 cpuidle.POLL.usage
3826 ± 7% +18.1% 4518 ± 7% slabinfo.anon_vma.active_objs
3875 ± 6% +16.6% 4520 ± 7% slabinfo.anon_vma.num_objs
1280 ± 11% -14.6% 1093 ± 9% slabinfo.skbuff_head_cache.active_objs
147541 ± 7% +17.0% 172576 ± 9% sched_debug.cfs_rq:/.load.avg
89.04 ± 11% +24.6% 110.95 ± 7% sched_debug.cfs_rq:/.runnable_load_avg.avg
155.33 ± 15% +20.5% 187.16 ± 12% sched_debug.cfs_rq:/.runnable_load_avg.stddev
136488 ± 10% +19.9% 163599 ± 7% sched_debug.cfs_rq:/.runnable_weight.avg
139887 ± 7% +18.0% 165093 ± 3% sched_debug.cpu.load.avg
3820087 ± 9% +20.8% 4613034 ± 8% sched_debug.cpu.nr_switches.stddev
3818882 ± 9% +20.8% 4611438 ± 8% sched_debug.cpu.sched_count.stddev
1909422 ± 9% +20.8% 2305711 ± 8% sched_debug.cpu.sched_goidle.stddev
1909878 ± 9% +20.8% 2306344 ± 8% sched_debug.cpu.ttwu_count.stddev
2.633e+11 ± 25% +29.7% 3.415e+11 ± 12% perf-stat.branch-instructions
1.865e+08 +7.0% 1.995e+08 perf-stat.context-switches
1.41 -5.9% 1.32 perf-stat.cpi
1.46 ± 3% -0.5 1.00 ± 3% perf-stat.dTLB-load-miss-rate%
3.225e+11 ± 25% +30.0% 4.192e+11 ± 12% perf-stat.dTLB-loads
0.15 ± 7% -0.1 0.08 ± 3% perf-stat.dTLB-store-miss-rate%
2.001e+11 ± 25% +30.1% 2.604e+11 ± 12% perf-stat.dTLB-stores
77.66 -15.9 61.81 perf-stat.iTLB-load-miss-rate%
2.038e+09 ± 25% -45.2% 1.118e+09 ± 14% perf-stat.iTLB-load-misses
1.213e+12 ± 25% +29.4% 1.569e+12 ± 12% perf-stat.instructions
595.17 +136.3% 1406 perf-stat.instructions-per-iTLB-miss
0.71 +6.3% 0.76 perf-stat.ipc
10.90 ± 10% -4.5 6.41 ± 45% perf-profile.calltrace.cycles-pp.pipe_read.__vfs_read.vfs_read.ksys_read.do_syscall_64
11.03 ± 10% -4.4 6.60 ± 45% perf-profile.calltrace.cycles-pp.__vfs_read.vfs_read.ksys_read.do_syscall_64.entry_SYSCALL_64_after_hwframe
7.76 ± 9% -3.2 4.54 ± 43% perf-profile.calltrace.cycles-pp.pipe_wait.pipe_read.__vfs_read.vfs_read.ksys_read
6.68 ± 9% -3.1 3.60 ± 44% perf-profile.calltrace.cycles-pp.schedule.pipe_wait.pipe_read.__vfs_read.vfs_read
6.49 ± 9% -3.0 3.48 ± 44% perf-profile.calltrace.cycles-pp.__sched_text_start.schedule.pipe_wait.pipe_read.__vfs_read
4.80 ± 5% -2.5 2.29 ± 48% perf-profile.calltrace.cycles-pp.schedule_idle.do_idle.cpu_startup_entry.start_secondary.secondary_startup_64
4.68 ± 4% -2.5 2.20 ± 49% perf-profile.calltrace.cycles-pp.__sched_text_start.schedule_idle.do_idle.cpu_startup_entry.start_secondary
1.01 ± 22% -0.6 0.36 ±102% perf-profile.calltrace.cycles-pp.copy_page_to_iter.pipe_read.__vfs_read.vfs_read.ksys_read
11.81 ± 11% -5.5 6.27 ± 45% perf-profile.children.cycles-pp.__sched_text_start
12.77 ± 10% -4.6 8.13 ± 44% perf-profile.children.cycles-pp.ksys_read
12.33 ± 10% -4.6 7.74 ± 44% perf-profile.children.cycles-pp.vfs_read
10.92 ± 10% -4.5 6.45 ± 44% perf-profile.children.cycles-pp.pipe_read
11.05 ± 10% -4.4 6.66 ± 44% perf-profile.children.cycles-pp.__vfs_read
7.79 ± 9% -3.1 4.68 ± 44% perf-profile.children.cycles-pp.pipe_wait
6.69 ± 9% -3.0 3.69 ± 44% perf-profile.children.cycles-pp.schedule
2.68 ± 6% -2.6 0.08 ± 66% perf-profile.children.cycles-pp.switch_mm_irqs_off
5.12 ± 14% -2.5 2.64 ± 46% perf-profile.children.cycles-pp.schedule_idle
2.25 ± 8% -0.6 1.60 ± 34% perf-profile.children.cycles-pp.tick_nohz_next_event
1.08 ± 10% -0.6 0.51 ± 49% perf-profile.children.cycles-pp.copy_page_to_iter
0.48 ± 32% -0.3 0.18 ± 71% perf-profile.children.cycles-pp.touch_atime
1.07 ± 7% -0.3 0.79 ± 35% perf-profile.children.cycles-pp.__next_timer_interrupt
0.42 ± 28% -0.2 0.25 ± 32% perf-profile.children.cycles-pp.___perf_sw_event
0.68 ± 7% -0.2 0.50 ± 33% perf-profile.children.cycles-pp.find_next_bit
0.28 ± 17% -0.2 0.12 ± 57% perf-profile.children.cycles-pp.account_entity_enqueue
0.49 ± 9% -0.1 0.35 ± 32% perf-profile.children.cycles-pp.__update_load_avg_cfs_rq
0.32 ± 12% -0.1 0.19 ± 39% perf-profile.children.cycles-pp.__update_load_avg_se
0.20 ± 19% -0.1 0.06 ± 70% perf-profile.children.cycles-pp.pm_qos_request
0.18 ± 20% -0.1 0.05 ±124% perf-profile.children.cycles-pp.anon_pipe_buf_release
0.21 ± 19% -0.1 0.11 ± 70% perf-profile.children.cycles-pp.rcu_needs_cpu
0.14 ± 20% -0.1 0.04 ±104% perf-profile.children.cycles-pp.tick_check_broadcast_expired
0.18 ± 57% -0.1 0.10 ± 17% perf-profile.children.cycles-pp.clockevents_program_event
0.12 ± 39% -0.1 0.03 ±102% perf-profile.children.cycles-pp.irq_work_needs_cpu
0.15 ± 18% -0.1 0.09 ± 40% perf-profile.children.cycles-pp.put_prev_entity
0.08 ± 40% -0.1 0.03 ±100% perf-profile.children.cycles-pp.run_timer_softirq
1.28 ± 5% -1.2 0.07 ± 62% perf-profile.self.cycles-pp.switch_mm_irqs_off
0.53 ± 17% -0.4 0.17 ± 39% perf-profile.self.cycles-pp.copy_page_to_iter
0.24 ± 42% -0.2 0.04 ±101% perf-profile.self.cycles-pp.atime_needs_update
0.47 ± 8% -0.1 0.34 ± 32% perf-profile.self.cycles-pp.__update_load_avg_cfs_rq
0.32 ± 13% -0.1 0.18 ± 38% perf-profile.self.cycles-pp.__update_load_avg_se
0.18 ± 20% -0.1 0.05 ±124% perf-profile.self.cycles-pp.anon_pipe_buf_release
0.18 ± 22% -0.1 0.06 ± 70% perf-profile.self.cycles-pp.pm_qos_request
0.24 ± 12% -0.1 0.12 ± 78% perf-profile.self.cycles-pp.__calc_delta
0.20 ± 17% -0.1 0.11 ± 68% perf-profile.self.cycles-pp.rcu_needs_cpu
0.14 ± 19% -0.1 0.04 ±104% perf-profile.self.cycles-pp.tick_check_broadcast_expired
0.11 ± 41% -0.1 0.03 ±100% perf-profile.self.cycles-pp.irq_work_needs_cpu
0.09 ± 20% -0.1 0.04 ±103% perf-profile.self.cycles-pp.current_time
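As a rough cross-check of the comparison table above, the %change column can be recomputed from the printed means. The sketch below assumes that plain metrics use a relative change, that metrics which are already percentages (the miss rates and perf-profile rows) show an absolute difference in percentage points, and that perf-stat.ipc and perf-stat.instructions-per-iTLB-miss are simple ratios of other rows. The function names are hypothetical; results computed from the rounded printed values may differ slightly from the table.

```python
# Illustrative helpers only; the names are hypothetical, not part of lkp-tests.

def pct_change(base, new):
    """Relative change from base to new, in percent."""
    return 100.0 * (new - base) / base


def point_delta(base, new):
    """Absolute difference, for metrics that are already percentages."""
    return new - base


if __name__ == "__main__":
    # Relative changes, recomputed from the printed means.
    print(f"unixbench.score     {pct_change(386.50, 413.58):+.1f}%")
    print(f"vol. ctx switches   {pct_change(46114091, 49358271):+.1f}%")
    print(f"unixbench.workload  {pct_change(62958045, 66876363):+.1f}%")

    # Rate metric: difference in percentage points.
    print(f"iTLB-load-miss-rate {point_delta(77.66, 61.81):+.2f} points")

    # Derived ratios (assumed relationships between rows).
    print(f"ipc as 1/cpi: {1 / 1.41:.2f} -> {1 / 1.32:.2f}")
    print("instructions per iTLB miss: "
          f"{1.213e12 / 2.038e9:.0f} -> {1.569e12 / 1.118e9:.0f}")
```

The score, context-switch and workload rows come out at the +7.0%, +7.0% and +6.2% shown in the table. The instructions-per-iTLB-miss ratio lands near the reported 595.17 and 1406 without matching exactly, presumably because the printed counts are rounded.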
unixbench.time.voluntary_context_switches
[ASCII plot, layout lost in extraction: y-axis from 0 to 6e+07; the "O" samples lie near 5e+07 and the "+" samples slightly below.]
[*] bisect-good sample
[O] bisect-bad sample
Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.
Thanks,
Rong Chen
View attachment "config-4.19.0-rc5-00036-g5462bc3" of type "text/plain" (167748 bytes)
View attachment "job-script" of type "text/plain" (6920 bytes)
View attachment "job.yaml" of type "text/plain" (4540 bytes)
View attachment "reproduce" of type "text/plain" (293 bytes)