Message-ID: <20200228074455.GN6548@shao2-debian>
Date: Fri, 28 Feb 2020 15:44:55 +0800
From: kernel test robot <rong.a.chen@...el.com>
To: Uladzislau Rezki <urezki@...il.com>
Cc: "Paul E. McKenney" <paulmck@...nel.org>,
Joel Fernandes <joel@...lfernandes.org>,
LKML <linux-kernel@...r.kernel.org>,
"Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>, lkp@...ts.01.org
Subject: [rcu] 76550daa4d: will-it-scale.per_thread_ops -10.5% regression
Greetings,
FYI, we noticed a -10.5% regression of will-it-scale.per_thread_ops due to commit:
commit: 76550daa4d1edaa8251460bd1d4a11b5df23c1c0 ("rcu: Support kfree_bulk() interface in kfree_rcu()")
https://git.kernel.org/cgit/linux/kernel/git/paulmck/linux-rcu.git dev.2020.02.13c
in testcase: will-it-scale
on test machine: 288 threads Intel(R) Xeon Phi(TM) CPU 7295 @ 1.50GHz with 80G memory
with the following parameters:
nr_task: 100%
mode: thread
test: open2
cpufreq_governor: performance
ucode: 0x11
test-description: Will It Scale takes a testcase and runs it from 1 through to n parallel copies to see if the testcase will scale. It builds both a process-based and a thread-based version of each test in order to see any differences between the two.
test-url: https://github.com/antonblanchard/will-it-scale
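For readers unfamiliar with the workload, the following is a rough Python sketch of what an open/close test in thread mode does: each thread repeatedly opens and closes its own private file. It is an illustration only, not the will-it-scale source (which is written in C). The function names, thread count, and iteration count are hypothetical, and Python's global interpreter lock means this sketch will not reproduce the lock contention seen on the 288-thread machine. Threads in one process share a file descriptor table, which is consistent with the profile in this report being dominated by _raw_spin_lock under __alloc_fd and __close_fd.

```python
import os
import tempfile
import threading


def open_close_loop(path, iterations, counts, index):
    """Open and close `path` repeatedly and record the completed iterations."""
    done = 0
    for _ in range(iterations):
        fd = os.open(path, os.O_RDWR)  # allocates a slot in the shared fd table
        os.close(fd)                   # releases the slot again
        done += 1
    counts[index] = done


def run_open2_sketch(nr_threads=4, iterations=1000):
    """Run the loop in `nr_threads` threads, one private temp file per thread.

    Hypothetical helper for illustration; returns per-thread iteration counts.
    """
    counts = [0] * nr_threads
    paths = []
    threads = []
    try:
        for i in range(nr_threads):
            fd, path = tempfile.mkstemp()
            os.close(fd)
            paths.append(path)
            threads.append(
                threading.Thread(
                    target=open_close_loop,
                    args=(path, iterations, counts, i),
                )
            )
        for t in threads:
            t.start()
        for t in threads:
            t.join()
    finally:
        # Remove the temporary files even if a thread failed to start.
        for path in paths:
            os.unlink(path)
    return counts


if __name__ == "__main__":
    per_thread = run_open2_sketch()
    print("per-thread iterations:", per_thread)
    print("total iterations:", sum(per_thread))
```

In the report's terms, the per-thread count corresponds loosely to per_thread_ops and the sum to workload, although the real benchmark measures throughput over a time window rather than a fixed iteration count.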
If you fix the issue, please add the following tag:
Reported-by: kernel test robot <rong.a.chen@...el.com>
Details are below:
-------------------------------------------------------------------------------------------------->
To reproduce:
git clone https://github.com/intel/lkp-tests.git
cd lkp-tests
bin/lkp install job.yaml # job file is attached in this email
bin/lkp run job.yaml
=========================================================================================
compiler/cpufreq_governor/kconfig/mode/nr_task/rootfs/tbox_group/test/testcase/ucode:
gcc-7/performance/x86_64-rhel-7.6/thread/100%/debian-x86_64-20191114.cgz/lkp-knm01/open2/will-it-scale/0x11
commit:
e8afd73e56 ("rcu: Don't flag non-starting GPs before GP kthread is running")
76550daa4d ("rcu: Support kfree_bulk() interface in kfree_rcu()")
e8afd73e56e02b4d 76550daa4d1edaa8251460bd1d4
---------------- ---------------------------
fail:runs %reproduction fail:runs
| | |
2:4 -50% :4 dmesg.WARNING:at_ip__fsnotify_parent/0x
:4 25% 1:4 dmesg.WARNING:at_ip__slab_free/0x
%stddev %change %stddev
\ | \
1021 -10.5% 914.50 will-it-scale.per_thread_ops
807081 -33.7% 534989 ± 2% will-it-scale.time.involuntary_context_switches
489.19 ± 9% -40.3% 292.19 ± 3% will-it-scale.time.user_time
294257 -10.5% 263455 will-it-scale.workload
1.68 ± 2% +30.8% 2.20 turbostat.RAMWatt
6340 -28.8% 4512 ± 2% vmstat.system.cs
0.09 ± 12% -0.0 0.06 ± 3% mpstat.cpu.all.soft%
0.71 ± 7% -0.2 0.49 ± 3% mpstat.cpu.all.usr%
999.25 ± 6% -17.3% 826.50 ± 13% slabinfo.skbuff_fclone_cache.active_objs
999.25 ± 6% -17.3% 826.50 ± 13% slabinfo.skbuff_fclone_cache.num_objs
19880 ± 5% -9.1% 18079 softirqs.CPU100.RCU
21159 ± 2% -6.7% 19750 ± 2% softirqs.CPU13.RCU
122401 ± 10% +23.7% 151373 ± 8% softirqs.CPU144.TIMER
21180 -8.5% 19379 softirqs.CPU15.RCU
21456 ± 7% -11.7% 18945 softirqs.CPU24.RCU
124140 ± 7% +12.3% 139469 ± 7% softirqs.CPU250.TIMER
20964 ± 11% -17.5% 17289 softirqs.CPU261.RCU
20945 -8.8% 19102 softirqs.CPU30.RCU
21274 -9.8% 19194 softirqs.CPU31.RCU
21024 ± 2% -10.3% 18853 softirqs.CPU33.RCU
20571 -8.9% 18746 softirqs.CPU35.RCU
21289 ± 2% -8.0% 19588 softirqs.CPU5.RCU
1199 ± 3% +11.2% 1332 ± 2% sched_debug.cfs_rq:/.exec_clock.stddev
969782 ± 8% -26.1% 716472 ± 22% sched_debug.cfs_rq:/.load.max
226.86 ± 6% -70.5% 66.95 ± 18% sched_debug.cfs_rq:/.nr_spread_over.avg
321.05 ± 5% -47.1% 169.95 ± 6% sched_debug.cfs_rq:/.nr_spread_over.max
107.45 ± 6% -68.8% 33.55 ± 26% sched_debug.cfs_rq:/.nr_spread_over.min
39.24 ± 17% -61.2% 15.24 ± 9% sched_debug.cfs_rq:/.nr_spread_over.stddev
969780 ± 8% -26.1% 716278 ± 22% sched_debug.cfs_rq:/.runnable_weight.max
1145 ± 21% -25.2% 856.65 ± 2% sched_debug.cfs_rq:/.util_est_enqueued.max
169.65 ± 46% -80.8% 32.65 ± 79% sched_debug.cfs_rq:/.util_est_enqueued.min
871.94 ± 5% +23.4% 1075 ± 3% sched_debug.cpu.clock.stddev
871.94 ± 5% +23.4% 1075 ± 3% sched_debug.cpu.clock_task.stddev
5190 ± 11% +22.1% 6338 ± 6% sched_debug.cpu.curr->pid.max
271.88 ± 6% +30.5% 354.73 ± 9% sched_debug.cpu.curr->pid.stddev
0.00 ± 5% +23.8% 0.00 ± 2% sched_debug.cpu.next_balance.stddev
5624 -13.1% 4890 sched_debug.cpu.nr_switches.avg
2344 ± 3% -31.0% 1617 ± 3% sched_debug.cpu.nr_switches.min
2803 -25.7% 2082 ± 2% sched_debug.cpu.sched_count.avg
1906 -30.9% 1316 sched_debug.cpu.sched_count.min
1322 -27.8% 954.60 ± 2% sched_debug.cpu.ttwu_count.avg
919.25 -32.6% 619.90 sched_debug.cpu.ttwu_count.min
1255 -29.3% 888.47 ± 2% sched_debug.cpu.ttwu_local.avg
887.45 -32.9% 595.50 sched_debug.cpu.ttwu_local.min
8.476e+09 +1.3% 8.589e+09 perf-stat.i.branch-instructions
48929091 -3.8% 47071099 perf-stat.i.cache-misses
6396 -30.2% 4467 ± 2% perf-stat.i.context-switches
12.78 -1.4% 12.61 perf-stat.i.cpi
9065 +3.7% 9404 perf-stat.i.cycles-between-cache-misses
71922406 ± 2% -7.7% 66350699 ± 5% perf-stat.i.iTLB-load-misses
3.475e+10 +1.3% 3.519e+10 perf-stat.i.iTLB-loads
3.47e+10 +1.2% 3.512e+10 perf-stat.i.instructions
482.82 ± 2% +10.0% 531.13 ± 6% perf-stat.i.instructions-per-iTLB-miss
0.08 +1.4% 0.08 perf-stat.i.ipc
12.81 -1.3% 12.64 perf-stat.overall.cpi
9077 +3.8% 9423 perf-stat.overall.cycles-between-cache-misses
0.21 ± 2% -0.0 0.19 ± 5% perf-stat.overall.iTLB-load-miss-rate%
483.60 ± 2% +9.9% 531.72 ± 6% perf-stat.overall.instructions-per-iTLB-miss
0.08 +1.3% 0.08 perf-stat.overall.ipc
36101370 +13.0% 40812598 perf-stat.overall.path-length
8.46e+09 +1.2% 8.565e+09 perf-stat.ps.branch-instructions
48864485 -3.9% 46979669 perf-stat.ps.cache-misses
6244 -29.3% 4417 ± 2% perf-stat.ps.context-switches
71662456 ± 2% -7.8% 66078101 ± 5% perf-stat.ps.iTLB-load-misses
3.466e+10 +1.2% 3.507e+10 perf-stat.ps.iTLB-loads
3.463e+10 +1.1% 3.502e+10 perf-stat.ps.instructions
1.062e+13 +1.2% 1.075e+13 perf-stat.total.instructions
1.31 ± 2% -0.2 1.16 ± 4% perf-profile.calltrace.cycles-pp.do_filp_open.do_sys_openat2.do_sys_open.do_syscall_64.entry_SYSCALL_64_after_hwframe
1.26 ± 2% -0.1 1.13 ± 3% perf-profile.calltrace.cycles-pp.path_openat.do_filp_open.do_sys_openat2.do_sys_open.do_syscall_64
48.05 +0.1 48.16 perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock.__alloc_fd.do_sys_openat2.do_sys_open
48.09 +0.1 48.21 perf-profile.calltrace.cycles-pp._raw_spin_lock.__alloc_fd.do_sys_openat2.do_sys_open.do_syscall_64
48.16 +0.2 48.31 perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock.__close_fd.__x64_sys_close.do_syscall_64
48.20 +0.2 48.37 perf-profile.calltrace.cycles-pp._raw_spin_lock.__close_fd.__x64_sys_close.do_syscall_64.entry_SYSCALL_64_after_hwframe
48.32 +0.2 48.49 perf-profile.calltrace.cycles-pp.__close_fd.__x64_sys_close.do_syscall_64.entry_SYSCALL_64_after_hwframe.__GI___libc_close
48.24 +0.2 48.41 perf-profile.calltrace.cycles-pp.__alloc_fd.do_sys_openat2.do_sys_open.do_syscall_64.entry_SYSCALL_64_after_hwframe
48.46 +0.2 48.64 perf-profile.calltrace.cycles-pp.__x64_sys_close.do_syscall_64.entry_SYSCALL_64_after_hwframe.__GI___libc_close
1.32 ± 2% -0.2 1.16 ± 3% perf-profile.children.cycles-pp.do_filp_open
0.35 ± 25% -0.1 0.21 ± 14% perf-profile.children.cycles-pp.update_curr
1.27 ± 2% -0.1 1.14 ± 4% perf-profile.children.cycles-pp.path_openat
0.44 ± 2% -0.1 0.37 ± 2% perf-profile.children.cycles-pp.alloc_empty_file
0.19 ± 5% -0.1 0.12 ± 4% perf-profile.children.cycles-pp.syscall_return_via_sysret
0.33 ± 3% -0.1 0.27 ± 5% perf-profile.children.cycles-pp.__fput
0.50 ± 3% -0.1 0.44 ± 5% perf-profile.children.cycles-pp.exit_to_usermode_loop
0.15 ± 4% -0.1 0.09 ± 7% perf-profile.children.cycles-pp.security_file_alloc
0.45 ± 3% -0.1 0.40 ± 4% perf-profile.children.cycles-pp.task_work_run
0.39 ± 2% -0.0 0.35 ± 2% perf-profile.children.cycles-pp.__alloc_file
0.32 -0.0 0.29 ± 3% perf-profile.children.cycles-pp.link_path_walk
0.30 ± 5% -0.0 0.27 ± 4% perf-profile.children.cycles-pp.rcu_do_batch
0.30 ± 4% -0.0 0.28 ± 4% perf-profile.children.cycles-pp.rcu_core
0.22 -0.0 0.19 ± 2% perf-profile.children.cycles-pp.kmem_cache_free
0.12 ± 5% -0.0 0.10 ± 7% perf-profile.children.cycles-pp.inode_permission
0.06 ± 7% +0.0 0.07 ± 5% perf-profile.children.cycles-pp.___might_sleep
0.06 ± 11% +0.0 0.09 ± 5% perf-profile.children.cycles-pp.run_timer_softirq
0.26 ± 2% +0.0 0.29 ± 3% perf-profile.children.cycles-pp.kmem_cache_alloc
0.10 ± 9% +0.0 0.14 ± 8% perf-profile.children.cycles-pp.enqueue_hrtimer
0.08 ± 10% +0.0 0.13 ± 6% perf-profile.children.cycles-pp.timerqueue_add
0.04 ± 57% +0.1 0.09 ± 11% perf-profile.children.cycles-pp.expand_files
0.05 ± 8% +0.1 0.11 ± 7% perf-profile.children.cycles-pp.perf_event_task_tick
0.00 +0.1 0.06 ± 7% perf-profile.children.cycles-pp.__might_sleep
0.00 +0.1 0.06 ± 20% perf-profile.children.cycles-pp.locks_remove_posix
0.11 ± 7% +0.1 0.17 ± 17% perf-profile.children.cycles-pp._raw_spin_lock_irq
0.08 ± 15% +0.1 0.15 ± 19% perf-profile.children.cycles-pp.rcu_irq_enter
99.41 +0.1 99.49 perf-profile.children.cycles-pp.entry_SYSCALL_64_after_hwframe
99.37 +0.1 99.46 perf-profile.children.cycles-pp.do_syscall_64
48.32 +0.2 48.49 perf-profile.children.cycles-pp.__close_fd
48.27 +0.2 48.44 perf-profile.children.cycles-pp.__alloc_fd
48.46 +0.2 48.64 perf-profile.children.cycles-pp.__x64_sys_close
96.38 +0.3 96.64 perf-profile.children.cycles-pp._raw_spin_lock
96.31 +0.3 96.59 perf-profile.children.cycles-pp.native_queued_spin_lock_slowpath
0.19 ± 5% -0.1 0.12 ± 4% perf-profile.self.cycles-pp.syscall_return_via_sysret
0.11 ± 4% -0.0 0.07 ± 7% perf-profile.self.cycles-pp.__alloc_file
0.21 ± 6% -0.0 0.18 ± 4% perf-profile.self.cycles-pp.file_free_rcu
0.08 ± 5% -0.0 0.07 ± 6% perf-profile.self.cycles-pp.inode_permission
0.15 ± 3% -0.0 0.14 ± 3% perf-profile.self.cycles-pp.kmem_cache_free
0.07 +0.0 0.09 perf-profile.self.cycles-pp.do_syscall_64
0.03 ±100% +0.0 0.06 ± 6% perf-profile.self.cycles-pp.run_timer_softirq
0.06 ± 6% +0.0 0.11 ± 8% perf-profile.self.cycles-pp.timerqueue_add
0.00 +0.1 0.05 ± 8% perf-profile.self.cycles-pp.__might_sleep
0.05 ± 8% +0.1 0.11 ± 7% perf-profile.self.cycles-pp.perf_event_task_tick
0.00 +0.1 0.06 ± 20% perf-profile.self.cycles-pp.locks_remove_posix
0.11 ± 4% +0.1 0.17 ± 17% perf-profile.self.cycles-pp._raw_spin_lock_irq
0.00 +0.1 0.06 ± 17% perf-profile.self.cycles-pp.expand_files
0.08 ± 15% +0.1 0.15 ± 19% perf-profile.self.cycles-pp.rcu_irq_enter
396.75 ± 31% +53.1% 607.25 ± 12% interrupts.32:IR-PCI-MSI.2097155-edge.eth0-TxRx-2
6719 ± 24% -42.5% 3865 interrupts.CPU0.NMI:Non-maskable_interrupts
6719 ± 24% -42.5% 3865 interrupts.CPU0.PMI:Performance_monitoring_interrupts
1541 ± 14% -38.1% 954.75 ± 32% interrupts.CPU0.RES:Rescheduling_interrupts
85.00 ± 34% -74.4% 21.75 ± 54% interrupts.CPU104.RES:Rescheduling_interrupts
67.75 ±109% -80.8% 13.00 ± 14% interrupts.CPU105.RES:Rescheduling_interrupts
5717 ± 32% -33.3% 3814 interrupts.CPU106.NMI:Non-maskable_interrupts
5717 ± 32% -33.3% 3814 interrupts.CPU106.PMI:Performance_monitoring_interrupts
3828 +73.2% 6630 ± 24% interrupts.CPU109.NMI:Non-maskable_interrupts
3828 +73.2% 6630 ± 24% interrupts.CPU109.PMI:Performance_monitoring_interrupts
396.75 ± 31% +53.1% 607.25 ± 12% interrupts.CPU12.32:IR-PCI-MSI.2097155-edge.eth0-TxRx-2
5694 ± 32% -33.7% 3777 interrupts.CPU131.NMI:Non-maskable_interrupts
5694 ± 32% -33.7% 3777 interrupts.CPU131.PMI:Performance_monitoring_interrupts
4725 ± 34% +58.8% 7506 interrupts.CPU173.NMI:Non-maskable_interrupts
4725 ± 34% +58.8% 7506 interrupts.CPU173.PMI:Performance_monitoring_interrupts
6578 ± 24% -42.9% 3756 interrupts.CPU179.NMI:Non-maskable_interrupts
6578 ± 24% -42.9% 3756 interrupts.CPU179.PMI:Performance_monitoring_interrupts
5716 ± 33% -17.2% 4735 ± 35% interrupts.CPU183.NMI:Non-maskable_interrupts
5716 ± 33% -17.2% 4735 ± 35% interrupts.CPU183.PMI:Performance_monitoring_interrupts
4686 ± 34% +41.7% 6639 ± 24% interrupts.CPU187.NMI:Non-maskable_interrupts
4686 ± 34% +41.7% 6639 ± 24% interrupts.CPU187.PMI:Performance_monitoring_interrupts
63.00 ± 81% -66.3% 21.25 ± 91% interrupts.CPU187.RES:Rescheduling_interrupts
5799 ± 32% -17.4% 4787 ± 33% interrupts.CPU20.NMI:Non-maskable_interrupts
5799 ± 32% -17.4% 4787 ± 33% interrupts.CPU20.PMI:Performance_monitoring_interrupts
3802 +48.6% 5649 ± 31% interrupts.CPU207.NMI:Non-maskable_interrupts
3802 +48.6% 5649 ± 31% interrupts.CPU207.PMI:Performance_monitoring_interrupts
51.25 ± 96% -82.4% 9.00 ± 7% interrupts.CPU211.RES:Rescheduling_interrupts
5692 ± 30% -17.6% 4690 ± 32% interrupts.CPU215.NMI:Non-maskable_interrupts
5692 ± 30% -17.6% 4690 ± 32% interrupts.CPU215.PMI:Performance_monitoring_interrupts
70.00 ±116% -80.7% 13.50 ± 49% interrupts.CPU217.RES:Rescheduling_interrupts
7.25 ± 35% +903.4% 72.75 ± 85% interrupts.CPU222.RES:Rescheduling_interrupts
3786 +49.4% 5655 ± 32% interrupts.CPU224.NMI:Non-maskable_interrupts
3786 +49.4% 5655 ± 32% interrupts.CPU224.PMI:Performance_monitoring_interrupts
3778 +48.2% 5599 ± 31% interrupts.CPU227.NMI:Non-maskable_interrupts
3778 +48.2% 5599 ± 31% interrupts.CPU227.PMI:Performance_monitoring_interrupts
66.75 ± 97% -86.1% 9.25 ± 20% interrupts.CPU228.RES:Rescheduling_interrupts
4710 ± 33% +60.1% 7539 interrupts.CPU240.NMI:Non-maskable_interrupts
4710 ± 33% +60.1% 7539 interrupts.CPU240.PMI:Performance_monitoring_interrupts
18.75 ± 40% +618.7% 134.75 ± 69% interrupts.CPU240.RES:Rescheduling_interrupts
12.50 ± 30% +582.0% 85.25 ± 81% interrupts.CPU25.RES:Rescheduling_interrupts
4672 ± 33% +39.4% 6515 ± 23% interrupts.CPU284.NMI:Non-maskable_interrupts
4672 ± 33% +39.4% 6515 ± 23% interrupts.CPU284.PMI:Performance_monitoring_interrupts
48.75 ±123% -78.5% 10.50 ± 24% interrupts.CPU284.RES:Rescheduling_interrupts
6705 ± 24% -28.7% 4781 ± 34% interrupts.CPU5.NMI:Non-maskable_interrupts
6705 ± 24% -28.7% 4781 ± 34% interrupts.CPU5.PMI:Performance_monitoring_interrupts
3801 +73.9% 6612 ± 24% interrupts.CPU52.NMI:Non-maskable_interrupts
3801 +73.9% 6612 ± 24% interrupts.CPU52.PMI:Performance_monitoring_interrupts
4731 ± 34% +40.3% 6638 ± 24% interrupts.CPU53.NMI:Non-maskable_interrupts
4731 ± 34% +40.3% 6638 ± 24% interrupts.CPU53.PMI:Performance_monitoring_interrupts
3808 +49.4% 5689 ± 32% interrupts.CPU54.NMI:Non-maskable_interrupts
3808 +49.4% 5689 ± 32% interrupts.CPU54.PMI:Performance_monitoring_interrupts
5676 ± 32% -16.8% 4720 ± 34% interrupts.CPU59.NMI:Non-maskable_interrupts
5676 ± 32% -16.8% 4720 ± 34% interrupts.CPU59.PMI:Performance_monitoring_interrupts
3778 +50.0% 5666 ± 32% interrupts.CPU63.NMI:Non-maskable_interrupts
3778 +50.0% 5666 ± 32% interrupts.CPU63.PMI:Performance_monitoring_interrupts
5739 ± 32% -18.0% 4706 ± 33% interrupts.CPU72.NMI:Non-maskable_interrupts
5739 ± 32% -18.0% 4706 ± 33% interrupts.CPU72.PMI:Performance_monitoring_interrupts
4750 ± 33% +59.4% 7571 interrupts.CPU87.NMI:Non-maskable_interrupts
4750 ± 33% +59.4% 7571 interrupts.CPU87.PMI:Performance_monitoring_interrupts
5720 ± 33% -34.0% 3778 interrupts.CPU98.NMI:Non-maskable_interrupts
5720 ± 33% -34.0% 3778 interrupts.CPU98.PMI:Performance_monitoring_interrupts
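One row in the table above is worth a second look: perf-stat.overall.path-length rises by 13.0% while total instructions rise by only 1.2%. The numbers are consistent with path-length being total instructions divided by the workload count, that is, instructions per completed operation. This definition is inferred from the figures in the table, not stated in the report. A minimal sketch under that assumption, using the rounded values printed in the table:

```python
def path_length(total_instructions, workload):
    """Instructions per completed operation (assumed definition, see above)."""
    return total_instructions / workload


# Rounded values copied from the table rows
# perf-stat.total.instructions and will-it-scale.workload.
base = path_length(1.062e13, 294257)      # parent commit e8afd73e56
patched = path_length(1.075e13, 263455)   # commit 76550daa4d

change_pct = (patched - base) / base * 100.0

print(f"base path-length:    {base:,.0f}")
print(f"patched path-length: {patched:,.0f}")
print(f"change:              {change_pct:+.1f}%")
```

Because the inputs are rounded to four significant digits, the results land close to, but not exactly on, the reported 36101370, 40812598, and +13.0%. Read this way, the row suggests that each operation costs more instructions after the commit, while the machine executes roughly the same number of instructions overall.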
will-it-scale.per_thread_ops
[ASCII plot omitted: per-sample values, y-axis from 860 to 1040. Bisect-good samples fall roughly between 980 and 1030; bisect-bad samples fall roughly between 880 and 920.]
will-it-scale.workload
[ASCII plot omitted: per-sample values, y-axis from 250000 to 300000. Bisect-good samples fall roughly between 282000 and 295000; bisect-bad samples fall roughly between 255000 and 268000.]
will-it-scale.time.involuntary_context_switches
[ASCII plot omitted: per-sample values, y-axis from 500000 to 850000. Bisect-good samples fall roughly between 725000 and 830000; bisect-bad samples fall roughly between 525000 and 550000.]
[*] bisect-good sample
[O] bisect-bad sample
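The workload and involuntary_context_switches series plotted above also appear as rows in the comparison table, and the %change column for them can be reproduced from the two mean columns as a plain relative change. The helper name below is hypothetical. The sketch covers only rows whose %change carries a percent sign. Rows whose metric is itself a percentage (mpstat, perf-profile) show a change without a percent sign, which appears to be an absolute difference.

```python
def pct_change(base, patched):
    """Relative change from the parent-commit mean to the patched mean, in percent."""
    return (patched - base) / base * 100.0


# (parent mean, patched mean) copied from the comparison table.
rows = {
    "will-it-scale.workload": (294257, 263455),
    "will-it-scale.time.involuntary_context_switches": (807081, 534989),
}

changes = {name: round(pct_change(b, p), 1) for name, (b, p) in rows.items()}

for name, change in changes.items():
    print(f"{change:+.1f}%  {name}")
# prints:
# -10.5%  will-it-scale.workload
# -33.7%  will-it-scale.time.involuntary_context_switches
```

Applying the same formula to the per_thread_ops means as printed (1021 and 914.50) gives about -10.4% rather than -10.5%, presumably because the table rounds the means before display.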
Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.
Thanks,
Rong Chen
View attachment "config-5.6.0-rc1-00126-g76550daa4d1eda" of type "text/plain" (203661 bytes)
View attachment "job-script" of type "text/plain" (7641 bytes)
View attachment "job.yaml" of type "text/plain" (5202 bytes)
View attachment "reproduce" of type "text/plain" (309 bytes)