Message-ID: <20201210081859.GA19784@xsang-OptiPlex-9020>
Date: Thu, 10 Dec 2020 16:18:59 +0800
From: kernel test robot <oliver.sang@...el.com>
To: Peter Zijlstra <peterz@...radead.org>
Cc: Valentin Schneider <valentin.schneider@....com>,
Daniel Bristot de Oliveira <bristot@...hat.com>,
LKML <linux-kernel@...r.kernel.org>, x86@...nel.org,
lkp@...ts.01.org, lkp@...el.com, ying.huang@...el.com,
feng.tang@...el.com, zhengjun.xing@...el.com,
aubrey.li@...ux.intel.com, yu.c.chen@...el.com
Subject: [sched/hotplug] 2558aacff8: will-it-scale.per_thread_ops -1.6%
regression
Greetings,
FYI, we noticed a -1.6% regression of will-it-scale.per_thread_ops due to commit:
commit: 2558aacff8586699bcd248b406febb28b0a25de2 ("sched/hotplug: Ensure only per-cpu kthreads run during hotplug")
https://git.kernel.org/cgit/linux/kernel/git/tip/tip.git sched/migrate-disable
in testcase: will-it-scale
on test machine: 144 threads Intel(R) Xeon(R) Gold 5318H CPU @ 2.50GHz with 128G memory
with following parameters:
nr_task: 100%
mode: thread
test: sched_yield
cpufreq_governor: performance
ucode: 0x700001e
test-description: Will It Scale takes a testcase and runs it from 1 through to n parallel copies to see if the testcase will scale. It builds both a process and threads based test in order to see any differences between the two.
test-url: https://github.com/antonblanchard/will-it-scale
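For orientation, the will-it-scale.per_thread_ops metric reported above can be approximated with a minimal sketch like the following. This is not the actual will-it-scale harness (which is written in C and pins threads to CPUs); thread count and duration here are illustrative, and the real sched_yield testcase simply loops on the syscall per thread and reports the mean operations completed per thread.

```python
import os
import threading
import time

def worker(results, idx, stop):
    # Each thread spins on sched_yield(), mirroring the will-it-scale
    # sched_yield testcase in thread mode (mode: thread above).
    ops = 0
    while not stop.is_set():
        os.sched_yield()
        ops += 1
    results[idx] = ops

def run(nr_threads=4, seconds=0.2):
    # nr_threads and seconds are illustrative; the report used
    # nr_task: 100% (one thread per CPU, 144 threads on this machine).
    stop = threading.Event()
    results = [0] * nr_threads
    threads = [threading.Thread(target=worker, args=(results, i, stop))
               for i in range(nr_threads)]
    for t in threads:
        t.start()
    time.sleep(seconds)
    stop.set()
    for t in threads:
        t.join()
    # per_thread_ops is the mean number of operations completed per thread.
    return sum(results) / nr_threads

if __name__ == "__main__":
    print("per_thread_ops:", run())
```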
If you fix the issue, kindly add the following tag:
Reported-by: kernel test robot <oliver.sang@...el.com>
Details are as below:
-------------------------------------------------------------------------------------------------->
To reproduce:
git clone https://github.com/intel/lkp-tests.git
cd lkp-tests
bin/lkp install job.yaml # job file is attached in this email
bin/lkp run job.yaml
=========================================================================================
compiler/cpufreq_governor/kconfig/mode/nr_task/rootfs/tbox_group/test/testcase/ucode:
gcc-9/performance/x86_64-rhel-8.3/thread/100%/debian-10.4-x86_64-20200603.cgz/lkp-cpl-4sp1/sched_yield/will-it-scale/0x700001e
commit:
565790d28b ("sched: Fix balance_callback()")
2558aacff8 ("sched/hotplug: Ensure only per-cpu kthreads run during hotplug")
565790d28b1e33ee 2558aacff8586699bcd248b406f
---------------- ---------------------------
fail:runs %reproduction fail:runs
| | |
1:4 0% 1:4 perf-profile.children.cycles-pp.error_entry
0:4 0% 0:4 perf-profile.self.cycles-pp.error_entry
%stddev %change %stddev
\ | \
4.011e+08 -1.6% 3.945e+08 will-it-scale.144.threads
2785455 -1.6% 2739520 will-it-scale.per_thread_ops
4.011e+08 -1.6% 3.945e+08 will-it-scale.workload
12.05 +2.1 14.18 mpstat.cpu.all.usr%
1087711 ± 75% -79.0% 228885 ± 7% numa-numastat.node1.local_node
1126029 ± 74% -74.5% 286894 ± 6% numa-numastat.node1.numa_hit
33836 -2.3% 33042 proc-vmstat.nr_slab_reclaimable
74433 -1.5% 73345 proc-vmstat.nr_slab_unreclaimable
86.25 -2.3% 84.25 vmstat.cpu.sy
11.75 ± 3% +17.0% 13.75 ± 3% vmstat.cpu.us
333551 ± 17% -21.7% 261115 ± 5% vmstat.system.cs
329071 ± 3% -15.4% 278535 ± 4% sched_debug.cfs_rq:/.spread0.avg
472614 ± 2% -11.0% 420678 ± 2% sched_debug.cfs_rq:/.spread0.max
17597663 ± 17% -28.5% 12582107 ± 16% sched_debug.cpu.nr_switches.max
1897476 ± 17% -28.4% 1359264 ± 14% sched_debug.cpu.nr_switches.stddev
5628 ± 8% -10.9% 5012 ± 3% slabinfo.files_cache.active_objs
5628 ± 8% -10.9% 5012 ± 3% slabinfo.files_cache.num_objs
3613 ± 2% -10.9% 3219 slabinfo.kmalloc-rcl-512.active_objs
3644 ± 2% -10.9% 3248 slabinfo.kmalloc-rcl-512.num_objs
3967 ± 4% -8.3% 3638 ± 2% slabinfo.sock_inode_cache.active_objs
3967 ± 4% -8.3% 3638 ± 2% slabinfo.sock_inode_cache.num_objs
0.02 ± 9% -14.5% 0.02 ± 2% perf-sched.sch_delay.avg.ms.schedule_timeout.kcompactd.kthread.ret_from_fork
14.28 ± 38% +48.9% 21.26 ± 24% perf-sched.sch_delay.avg.ms.schedule_timeout.rcu_gp_kthread.kthread.ret_from_fork
0.02 ± 24% +38.2% 0.03 ± 14% perf-sched.sch_delay.max.ms.schedule_hrtimeout_range_clock.poll_schedule_timeout.constprop.0.do_sys_poll
0.04 ± 13% -22.8% 0.03 ± 14% perf-sched.sch_delay.max.ms.schedule_timeout.kcompactd.kthread.ret_from_fork
47.71 ± 30% +41.6% 67.54 ± 11% perf-sched.wait_and_delay.avg.ms.schedule_timeout.rcu_gp_kthread.kthread.ret_from_fork
3.20 ± 33% -81.5% 0.59 ± 91% perf-sched.wait_time.avg.ms.exit_to_user_mode_prepare.irqentry_exit_to_user_mode.asm_sysvec_apic_timer_interrupt.[unknown]
33.43 ± 27% +38.4% 46.27 ± 8% perf-sched.wait_time.avg.ms.schedule_timeout.rcu_gp_kthread.kthread.ret_from_fork
0.05 ± 43% -68.9% 0.02 ± 63% perf-sched.wait_time.avg.ms.wait_for_partner.fifo_open.do_dentry_open.path_openat
8.23 ± 10% -57.9% 3.47 ± 97% perf-sched.wait_time.max.ms.exit_to_user_mode_prepare.irqentry_exit_to_user_mode.asm_sysvec_apic_timer_interrupt.[unknown]
35211 ±169% -99.6% 129.50 ± 96% numa-vmstat.node1.nr_active_anon
8060 ± 17% -35.0% 5236 ± 31% numa-vmstat.node1.nr_slab_reclaimable
35211 ±169% -99.6% 129.50 ± 96% numa-vmstat.node1.nr_zone_active_anon
1053733 ± 53% -52.7% 498416 ± 7% numa-vmstat.node1.numa_hit
946160 ± 58% -62.7% 352475 ± 12% numa-vmstat.node1.numa_local
107572 ± 23% +35.7% 145940 ± 5% numa-vmstat.node1.numa_other
5914 ± 23% -28.5% 4226 ± 10% numa-vmstat.node2.nr_slab_reclaimable
18204 ± 3% -14.7% 15522 ± 12% numa-vmstat.node2.nr_slab_unreclaimable
629428 ± 9% -14.2% 540085 ± 7% numa-vmstat.node2.numa_hit
17302 ± 10% +26.0% 21807 ± 7% numa-vmstat.node3.nr_slab_unreclaimable
140785 ±169% -99.6% 520.75 ± 95% numa-meminfo.node1.Active
140785 ±169% -99.6% 520.75 ± 95% numa-meminfo.node1.Active(anon)
32241 ± 17% -35.0% 20948 ± 31% numa-meminfo.node1.KReclaimable
32241 ± 17% -35.0% 20948 ± 31% numa-meminfo.node1.SReclaimable
101007 ± 5% -15.7% 85162 ± 13% numa-meminfo.node1.Slab
23657 ± 23% -28.5% 16906 ± 10% numa-meminfo.node2.KReclaimable
23657 ± 23% -28.5% 16906 ± 10% numa-meminfo.node2.SReclaimable
72823 ± 3% -14.7% 62089 ± 12% numa-meminfo.node2.SUnreclaim
96481 ± 3% -18.1% 78996 ± 12% numa-meminfo.node2.Slab
69210 ± 10% +26.0% 87229 ± 7% numa-meminfo.node3.SUnreclaim
110579 ± 9% +26.7% 140158 ± 10% numa-meminfo.node3.Slab
388.75 ± 74% +1147.8% 4851 ±124% interrupts.33:PCI-MSI.524291-edge.eth0-TxRx-2
1540 ± 69% -78.2% 335.75 ± 51% interrupts.34:PCI-MSI.524292-edge.eth0-TxRx-3
388.75 ± 74% +1147.8% 4851 ±124% interrupts.CPU11.33:PCI-MSI.524291-edge.eth0-TxRx-2
307.50 +66.7% 512.50 ± 50% interrupts.CPU111.RES:Rescheduling_interrupts
1540 ± 69% -78.2% 335.75 ± 51% interrupts.CPU12.34:PCI-MSI.524292-edge.eth0-TxRx-3
350.50 ± 8% -10.3% 314.50 ± 2% interrupts.CPU122.RES:Rescheduling_interrupts
424.50 ± 24% -18.7% 345.00 ± 14% interrupts.CPU128.RES:Rescheduling_interrupts
8496 -50.1% 4241 interrupts.CPU29.NMI:Non-maskable_interrupts
8496 -50.1% 4241 interrupts.CPU29.PMI:Performance_monitoring_interrupts
314.25 +8.7% 341.50 ± 4% interrupts.CPU29.RES:Rescheduling_interrupts
8496 -50.1% 4242 interrupts.CPU30.NMI:Non-maskable_interrupts
8496 -50.1% 4242 interrupts.CPU30.PMI:Performance_monitoring_interrupts
311.50 +13.2% 352.75 ± 8% interrupts.CPU7.RES:Rescheduling_interrupts
21144 ± 15% -25.0% 15858 ± 24% interrupts.CPU72.CAL:Function_call_interrupts
317.75 +39.2% 442.25 ± 32% interrupts.CPU82.RES:Rescheduling_interrupts
8.557e+10 -1.8% 8.399e+10 perf-stat.i.branch-instructions
0.43 +0.4 0.87 perf-stat.i.branch-miss-rate%
3.479e+08 +106.5% 7.186e+08 perf-stat.i.branch-misses
333383 ± 17% -22.1% 259849 ± 6% perf-stat.i.context-switches
1.02 +2.4% 1.05 perf-stat.i.cpi
1.268e+11 -1.9% 1.243e+11 perf-stat.i.dTLB-loads
7.506e+10 -1.9% 7.363e+10 perf-stat.i.dTLB-stores
4.26e+08 ± 2% -31.3% 2.925e+08 perf-stat.i.iTLB-load-misses
538538 ± 36% -79.5% 110207 ± 16% perf-stat.i.iTLB-loads
3.983e+11 -1.9% 3.908e+11 perf-stat.i.instructions
946.16 ± 3% +43.4% 1356 perf-stat.i.instructions-per-iTLB-miss
0.99 -2.4% 0.97 perf-stat.i.ipc
1.22 ± 3% +30.8% 1.60 ± 3% perf-stat.i.metric.K/sec
1996 -1.9% 1958 perf-stat.i.metric.M/sec
0.41 +0.4 0.86 perf-stat.overall.branch-miss-rate%
1.01 +2.5% 1.03 perf-stat.overall.cpi
935.95 ± 2% +42.8% 1336 perf-stat.overall.instructions-per-iTLB-miss
0.99 -2.4% 0.97 perf-stat.overall.ipc
8.527e+10 -1.8% 8.37e+10 perf-stat.ps.branch-instructions
3.467e+08 +106.5% 7.161e+08 perf-stat.ps.branch-misses
334637 ± 17% -21.5% 262794 ± 5% perf-stat.ps.context-switches
1.264e+11 -1.9% 1.239e+11 perf-stat.ps.dTLB-loads
7.48e+10 -1.9% 7.338e+10 perf-stat.ps.dTLB-stores
4.244e+08 ± 2% -31.3% 2.915e+08 perf-stat.ps.iTLB-load-misses
539519 ± 36% -79.5% 110644 ± 16% perf-stat.ps.iTLB-loads
3.969e+11 -1.9% 3.895e+11 perf-stat.ps.instructions
1.2e+14 -2.0% 1.176e+14 perf-stat.total.instructions
0.68 ± 2% -0.1 0.59 ± 3% perf-profile.calltrace.cycles-pp.orc_find.unwind_next_frame.perf_callchain_kernel.get_perf_callchain.perf_callchain
0.93 -0.1 0.88 ± 3% perf-profile.calltrace.cycles-pp.__perf_event_header__init_id.perf_prepare_sample.perf_event_output_forward.__perf_event_overflow.perf_swevent_overflow
1.22 +0.1 1.30 ± 2% perf-profile.calltrace.cycles-pp.__orc_find.unwind_next_frame.perf_callchain_kernel.get_perf_callchain.perf_callchain
1.11 +0.1 1.21 ± 2% perf-profile.calltrace.cycles-pp.orc_find.unwind_next_frame.__unwind_start.perf_callchain_kernel.get_perf_callchain
1.51 -0.1 1.44 ± 3% perf-profile.children.cycles-pp.stack_access_ok
0.37 ± 3% -0.1 0.30 perf-profile.children.cycles-pp.__task_pid_nr_ns
0.47 ± 2% -0.1 0.41 ± 2% perf-profile.children.cycles-pp.perf_event_pid_type
0.30 ± 5% -0.0 0.25 ± 12% perf-profile.children.cycles-pp.__list_del_entry_valid
0.95 -0.0 0.90 ± 3% perf-profile.children.cycles-pp.__perf_event_header__init_id
0.10 ± 14% -0.0 0.07 ± 31% perf-profile.children.cycles-pp.sched_yield@plt
0.42 -0.0 0.38 ± 5% perf-profile.children.cycles-pp.ftrace_graph_ret_addr
0.10 ± 4% -0.0 0.06 ± 6% perf-profile.children.cycles-pp.is_module_text_address
0.11 ± 4% -0.0 0.09 ± 5% perf-profile.children.cycles-pp.is_ftrace_trampoline
0.10 ± 4% -0.0 0.08 ± 6% perf-profile.children.cycles-pp.ftrace_ops_trampoline
0.06 ± 15% +0.0 0.09 ± 7% perf-profile.children.cycles-pp.rcu_qs
0.23 ± 8% +0.1 0.31 ± 7% perf-profile.children.cycles-pp.rcu_note_context_switch
0.35 ± 4% -0.1 0.29 perf-profile.self.cycles-pp.__task_pid_nr_ns
1.24 -0.1 1.18 ± 2% perf-profile.self.cycles-pp.stack_access_ok
0.23 ± 5% -0.0 0.18 ± 12% perf-profile.self.cycles-pp.__list_del_entry_valid
0.34 ± 4% -0.0 0.30 ± 2% perf-profile.self.cycles-pp.perf_tp_event
0.12 ± 4% -0.0 0.10 ± 10% perf-profile.self.cycles-pp.sched_clock_cpu
0.32 -0.0 0.30 ± 5% perf-profile.self.cycles-pp.ftrace_graph_ret_addr
0.08 -0.0 0.06 ± 6% perf-profile.self.cycles-pp.ftrace_ops_trampoline
0.34 -0.0 0.33 ± 2% perf-profile.self.cycles-pp.unwind_get_return_address
0.08 +0.0 0.10 ± 7% perf-profile.self.cycles-pp.rcu_is_watching
0.05 ± 8% +0.0 0.09 ± 7% perf-profile.self.cycles-pp.rcu_qs
0.51 +0.0 0.55 ± 3% perf-profile.self.cycles-pp.bsearch
0.16 ± 13% +0.0 0.20 ± 7% perf-profile.self.cycles-pp.rcu_note_context_switch
1.44 ± 4% +0.3 1.76 ± 12% perf-profile.self.cycles-pp.__sched_yield
will-it-scale.per_thread_ops
3e+06 +-----------------------------------------------------------------+
| : O +.+.+.+.+.++.+.+.+.+.+ O O O O O O O O O O OO O O O O O O O |
2.5e+06 |-: : |
| : : |
| : : |
2e+06 |-+: : |
| : : |
1.5e+06 |-+: : |
| : : |
1e+06 |-+: : |
| : : |
| : |
500000 |-+ : |
| : |
0 +-----------------------------------------------------------------+
[*] bisect-good sample
[O] bisect-bad sample
Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.
Thanks,
Oliver Sang
View attachment "config-5.10.0-rc1-00033-g2558aacff858" of type "text/plain" (171488 bytes)
View attachment "job-script" of type "text/plain" (7947 bytes)
View attachment "job.yaml" of type "text/plain" (5522 bytes)
View attachment "reproduce" of type "text/plain" (343 bytes)