Message-ID: <202409231416.9403c2e9-oliver.sang@intel.com>
Date: Mon, 23 Sep 2024 15:01:58 +0800
From: kernel test robot <oliver.sang@...el.com>
To: Peter Zijlstra <peterz@...radead.org>
CC: <oe-lkp@...ts.linux.dev>, <lkp@...el.com>, <linux-kernel@...r.kernel.org>,
Chunxin Zang <zangchunxin@...iang.com>, Valentin Schneider
<vschneid@...hat.com>, Mike Galbraith <umgwanakikbuti@...il.com>,
<ying.huang@...el.com>, <feng.tang@...el.com>, <fengwei.yin@...el.com>,
<aubrey.li@...ux.intel.com>, <yu.c.chen@...el.com>, <oliver.sang@...el.com>
Subject: [linus:master] [sched/eevdf] 85e511df3c: hackbench.throughput -13.1% regression
Hello,
FYI, Chenyu (Cc'ed) will post a trial patch soon for the report below.
kernel test robot noticed a -13.1% regression of hackbench.throughput on:
commit: 85e511df3cec46021024176672a748008ed135bf ("sched/eevdf: Allow shorter slices to wakeup-preempt")
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master
testcase: hackbench
test machine: 224 threads 2 sockets Intel(R) Xeon(R) Platinum 8480CTDX (Sapphire Rapids) with 256G memory
parameters:
nr_threads: 50%
iterations: 4
mode: process
ipc: socket
cpufreq_governor: performance
If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add the following tags
| Reported-by: kernel test robot <oliver.sang@...el.com>
| Closes: https://lore.kernel.org/oe-lkp/202409231416.9403c2e9-oliver.sang@intel.com
Details are as below:
-------------------------------------------------------------------------------------------------->
The kernel config and materials to reproduce are available at:
https://download.01.org/0day-ci/archive/20240923/202409231416.9403c2e9-oliver.sang@intel.com
=========================================================================================
compiler/cpufreq_governor/ipc/iterations/kconfig/mode/nr_threads/rootfs/tbox_group/testcase:
gcc-12/performance/socket/4/x86_64-rhel-8.3/process/50%/debian-12-x86_64-20240206.cgz/lkp-spr-r02/hackbench
commit:
82e9d0456e ("sched/fair: Avoid re-setting virtual deadline on 'migrations'")
85e511df3c ("sched/eevdf: Allow shorter slices to wakeup-preempt")
82e9d0456e06cebe 85e511df3cec46021024176672a
---------------- ---------------------------
%stddev %change %stddev
\ | \
217.40 +13.5% 246.74 uptime.boot
5391461 ± 19% +16.5% 6281524 ± 6% numa-meminfo.node0.MemUsed
352581 ± 13% +24.6% 439472 ± 16% numa-meminfo.node0.SUnreclaim
4679401 -15.8% 3938145 vmstat.system.cs
854648 -15.2% 724774 vmstat.system.in
0.46 ± 2% -0.1 0.40 mpstat.cpu.all.irq%
0.03 ± 3% -0.0 0.03 mpstat.cpu.all.soft%
3.35 -0.6 2.75 mpstat.cpu.all.usr%
44542 +2.7% 45755 proc-vmstat.nr_slab_reclaimable
642130 ± 68% -71.2% 184909 ± 11% proc-vmstat.pgactivate
2170433 ± 2% +6.8% 2318318 ± 2% proc-vmstat.pgfault
138302 ± 4% +6.7% 147631 ± 3% proc-vmstat.pgreuse
623219 -13.1% 541887 hackbench.throughput
606251 -14.1% 520789 hackbench.throughput_avg
623219 -13.1% 541887 hackbench.throughput_best
580034 -14.8% 494354 hackbench.throughput_worst
174.58 +16.3% 203.09 hackbench.time.elapsed_time
174.58 +16.3% 203.09 hackbench.time.elapsed_time.max
1.654e+08 +2.2% 1.69e+08 hackbench.time.involuntary_context_switches
36869 +17.6% 43340 hackbench.time.system_time
1172 -5.5% 1107 hackbench.time.user_time
6.478e+08 -3.5% 6.255e+08 hackbench.time.voluntary_context_switches
6.354e+10 -11.4% 5.63e+10 perf-stat.i.branch-instructions
3.226e+08 -12.5% 2.822e+08 perf-stat.i.branch-misses
94557935 ± 3% -15.2% 80197744 ± 2% perf-stat.i.cache-misses
2.563e+09 -13.7% 2.212e+09 perf-stat.i.cache-references
4710895 -15.9% 3959720 perf-stat.i.context-switches
1.86 +14.3% 2.13 perf-stat.i.cpi
601598 -15.0% 511540 perf-stat.i.cpu-migrations
7390 ± 5% +28.4% 9492 ± 2% perf-stat.i.cycles-between-cache-misses
3.408e+11 -12.1% 2.997e+11 perf-stat.i.instructions
0.54 -12.2% 0.47 perf-stat.i.ipc
23.73 -15.9% 19.95 perf-stat.i.metric.K/sec
1.66 ± 35% +28.5% 2.13 perf-stat.overall.cpi
6006 ± 35% +33.5% 8020 ± 2% perf-stat.overall.cycles-between-cache-misses
5.287e+13 ± 35% +15.1% 6.083e+13 perf-stat.total.instructions
13829361 +51.8% 20989754 sched_debug.cfs_rq:/.avg_vruntime.avg
18756074 ± 5% +44.2% 27055241 ± 3% sched_debug.cfs_rq:/.avg_vruntime.max
12499623 ± 2% +52.4% 19043277 ± 2% sched_debug.cfs_rq:/.avg_vruntime.min
8.93 ± 2% +14.1% 10.19 sched_debug.cfs_rq:/.h_nr_running.avg
4.68 ± 3% +10.0% 5.15 ± 2% sched_debug.cfs_rq:/.h_nr_running.stddev
0.44 ± 35% +75.8% 0.78 ± 19% sched_debug.cfs_rq:/.load_avg.min
13829361 +51.8% 20989754 sched_debug.cfs_rq:/.min_vruntime.avg
18756074 ± 5% +44.2% 27055241 ± 3% sched_debug.cfs_rq:/.min_vruntime.max
12499623 ± 2% +52.4% 19043277 ± 2% sched_debug.cfs_rq:/.min_vruntime.min
0.68 +11.7% 0.76 sched_debug.cfs_rq:/.nr_running.avg
176.30 ± 3% -22.8% 136.16 ± 4% sched_debug.cfs_rq:/.removed.runnable_avg.max
176.30 ± 3% -22.8% 136.16 ± 4% sched_debug.cfs_rq:/.removed.util_avg.max
8995 +16.0% 10437 sched_debug.cfs_rq:/.runnable_avg.avg
18978 ± 6% +13.7% 21579 ± 6% sched_debug.cfs_rq:/.runnable_avg.max
2890 ± 4% +13.9% 3292 ± 3% sched_debug.cfs_rq:/.runnable_avg.stddev
415209 ± 22% -23.3% 318311 ± 3% sched_debug.cpu.avg_idle.avg
102333 ± 2% +30.5% 133496 ± 2% sched_debug.cpu.clock.avg
102519 ± 2% +30.4% 133722 ± 2% sched_debug.cpu.clock.max
102127 ± 2% +30.5% 133254 ± 2% sched_debug.cpu.clock.min
101839 ± 2% +30.5% 132880 ± 2% sched_debug.cpu.clock_task.avg
102169 ± 2% +30.4% 133268 ± 2% sched_debug.cpu.clock_task.max
87129 ± 2% +35.6% 118117 ± 2% sched_debug.cpu.clock_task.min
11573 +32.4% 15327 sched_debug.cpu.curr->pid.avg
14704 +23.9% 18214 sched_debug.cpu.curr->pid.max
1516 ± 9% +16.4% 1765 ± 10% sched_debug.cpu.curr->pid.stddev
8.92 ± 2% +14.1% 10.18 sched_debug.cpu.nr_running.avg
4.69 ± 2% +10.0% 5.16 ± 2% sched_debug.cpu.nr_running.stddev
1232815 ± 2% +27.3% 1569099 sched_debug.cpu.nr_switches.avg
1411362 ± 5% +26.8% 1789325 ± 3% sched_debug.cpu.nr_switches.max
1045767 ± 2% +27.3% 1331341 ± 3% sched_debug.cpu.nr_switches.min
102127 ± 2% +30.5% 133250 ± 2% sched_debug.cpu_clk
101071 ± 2% +30.8% 132194 ± 2% sched_debug.ktime
0.00 -25.0% 0.00 sched_debug.rt_rq:.rt_nr_running.avg
0.33 -25.0% 0.25 sched_debug.rt_rq:.rt_nr_running.max
0.02 -25.0% 0.02 sched_debug.rt_rq:.rt_nr_running.stddev
102997 ± 2% +30.2% 134142 ± 2% sched_debug.sched_clk
16347631 +100.0% 32695263 sched_debug.sysctl_sched.sysctl_sched_features
1.60 ± 2% -0.1 1.45 ± 5% perf-profile.calltrace.cycles-pp.entry_SYSCALL_64.read
1.50 ± 2% -0.1 1.36 ± 6% perf-profile.calltrace.cycles-pp.entry_SYSCALL_64.write
0.62 ± 2% -0.1 0.50 ± 38% perf-profile.calltrace.cycles-pp.enqueue_task_fair.ttwu_do_activate.try_to_wake_up.autoremove_wake_function.__wake_up_common
0.78 ± 2% -0.1 0.71 ± 7% perf-profile.calltrace.cycles-pp.ttwu_do_activate.try_to_wake_up.autoremove_wake_function.__wake_up_common.__wake_up_sync_key
39.00 +0.5 39.50 perf-profile.calltrace.cycles-pp.sock_write_iter.vfs_write.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe
37.56 +0.6 38.15 perf-profile.calltrace.cycles-pp.unix_stream_sendmsg.sock_write_iter.vfs_write.ksys_write.do_syscall_64
1.77 ± 2% -0.2 1.61 ± 5% perf-profile.children.cycles-pp.entry_SYSCALL_64
1.87 ± 2% -0.2 1.72 ± 5% perf-profile.children.cycles-pp.mod_objcg_state
0.90 ± 2% -0.1 0.83 ± 5% perf-profile.children.cycles-pp.entry_SYSRETQ_unsafe_stack
0.85 ± 2% -0.1 0.78 ± 5% perf-profile.children.cycles-pp.obj_cgroup_charge
0.10 ± 4% -0.1 0.04 ±104% perf-profile.children.cycles-pp.__handle_mm_fault
0.10 ± 4% -0.1 0.04 ±104% perf-profile.children.cycles-pp.handle_mm_fault
0.10 ± 4% -0.1 0.04 ±102% perf-profile.children.cycles-pp.do_user_addr_fault
0.10 ± 4% -0.1 0.04 ±102% perf-profile.children.cycles-pp.exc_page_fault
0.63 ± 2% -0.1 0.57 ± 5% perf-profile.children.cycles-pp.__cond_resched
0.10 ± 4% -0.0 0.05 ± 63% perf-profile.children.cycles-pp.asm_exc_page_fault
0.32 ± 2% -0.0 0.28 ± 3% perf-profile.children.cycles-pp.task_mm_cid_work
0.32 ± 2% -0.0 0.28 ± 3% perf-profile.children.cycles-pp.task_work_run
0.34 ± 2% -0.0 0.30 ± 5% perf-profile.children.cycles-pp.rcu_all_qs
0.32 ± 3% -0.0 0.29 ± 6% perf-profile.children.cycles-pp.__virt_addr_valid
0.18 ± 4% -0.0 0.16 ± 7% perf-profile.children.cycles-pp.__enqueue_entity
0.23 ± 3% -0.0 0.21 ± 6% perf-profile.children.cycles-pp.set_next_entity
0.14 ± 4% -0.0 0.12 ± 4% perf-profile.children.cycles-pp.__dequeue_entity
0.06 -0.0 0.05 perf-profile.children.cycles-pp.cpuacct_charge
0.09 ± 8% +0.0 0.12 ± 7% perf-profile.children.cycles-pp.generic_perform_write
0.07 ± 10% +0.0 0.10 ± 11% perf-profile.children.cycles-pp.sched_balance_find_src_group
0.06 ± 10% +0.0 0.09 ± 11% perf-profile.children.cycles-pp.update_sg_lb_stats
0.07 ± 10% +0.0 0.10 ± 11% perf-profile.children.cycles-pp.update_sd_lb_stats
0.13 ± 8% +0.0 0.16 ± 8% perf-profile.children.cycles-pp.writen
0.02 ±111% +0.0 0.06 ± 13% perf-profile.children.cycles-pp.set_task_cpu
0.02 ±141% +0.0 0.06 ± 11% perf-profile.children.cycles-pp.ring_buffer_read_head
0.00 +0.1 0.06 ± 8% perf-profile.children.cycles-pp.vruntime_eligible
0.22 ± 8% +0.1 0.28 ± 9% perf-profile.children.cycles-pp.perf_mmap__push
0.23 ± 8% +0.1 0.29 ± 9% perf-profile.children.cycles-pp.record__mmap_read_evlist
0.23 ± 7% +0.1 0.29 ± 9% perf-profile.children.cycles-pp.cmd_record
0.23 ± 7% +0.1 0.30 ± 9% perf-profile.children.cycles-pp.handle_internal_command
0.23 ± 7% +0.1 0.30 ± 9% perf-profile.children.cycles-pp.main
0.23 ± 7% +0.1 0.30 ± 9% perf-profile.children.cycles-pp.run_builtin
0.00 +0.1 0.07 ± 19% perf-profile.children.cycles-pp.schedule_idle
0.00 +0.1 0.12 ± 19% perf-profile.children.cycles-pp.flush_smp_call_function_queue
0.00 +0.1 0.14 ± 20% perf-profile.children.cycles-pp.sched_ttwu_pending
0.00 +0.2 0.15 ± 17% perf-profile.children.cycles-pp.intel_idle
0.00 +0.2 0.17 ± 18% perf-profile.children.cycles-pp.__flush_smp_call_function_queue
0.01 ±282% +0.2 0.18 ± 70% perf-profile.children.cycles-pp.available_idle_cpu
0.23 ± 3% +0.2 0.43 ± 16% perf-profile.children.cycles-pp.prepare_to_wait
0.00 +0.2 0.20 ± 18% perf-profile.children.cycles-pp.cpuidle_enter
0.00 +0.2 0.20 ± 18% perf-profile.children.cycles-pp.cpuidle_enter_state
0.01 ±282% +0.2 0.22 ± 16% perf-profile.children.cycles-pp.cpuidle_idle_call
0.00 +0.3 0.35 ±110% perf-profile.children.cycles-pp.select_idle_cpu
0.06 ± 6% +0.4 0.41 ± 95% perf-profile.children.cycles-pp.select_idle_sibling
0.22 ± 2% +0.4 0.57 ± 70% perf-profile.children.cycles-pp.select_task_rq_fair
0.26 ± 5% +0.4 0.62 ± 65% perf-profile.children.cycles-pp.select_task_rq
0.04 ± 77% +0.4 0.43 ± 18% perf-profile.children.cycles-pp.start_secondary
0.04 ± 77% +0.4 0.43 ± 18% perf-profile.children.cycles-pp.do_idle
0.04 ± 77% +0.4 0.43 ± 17% perf-profile.children.cycles-pp.common_startup_64
0.04 ± 77% +0.4 0.43 ± 17% perf-profile.children.cycles-pp.cpu_startup_entry
40.28 +0.4 40.71 perf-profile.children.cycles-pp.vfs_write
39.04 +0.5 39.54 perf-profile.children.cycles-pp.sock_write_iter
37.73 +0.6 38.31 perf-profile.children.cycles-pp.unix_stream_sendmsg
1.50 ± 2% -0.1 1.37 ± 5% perf-profile.self.cycles-pp.mod_objcg_state
1.41 ± 2% -0.1 1.30 ± 5% perf-profile.self.cycles-pp.kmem_cache_free
0.83 ± 3% -0.1 0.74 ± 5% perf-profile.self.cycles-pp.read
0.88 ± 2% -0.1 0.80 ± 5% perf-profile.self.cycles-pp.entry_SYSRETQ_unsafe_stack
0.74 ± 2% -0.1 0.67 ± 5% perf-profile.self.cycles-pp.write
0.70 ± 2% -0.1 0.64 ± 6% perf-profile.self.cycles-pp.vfs_read
0.67 ± 2% -0.1 0.61 ± 5% perf-profile.self.cycles-pp.vfs_write
0.67 ± 2% -0.1 0.62 ± 4% perf-profile.self.cycles-pp.__kmalloc_node_track_caller_noprof
0.52 ± 3% -0.1 0.47 ± 4% perf-profile.self.cycles-pp.obj_cgroup_charge
0.51 ± 2% -0.0 0.46 ± 5% perf-profile.self.cycles-pp.kmem_cache_alloc_node_noprof
0.29 ± 2% -0.0 0.25 ± 3% perf-profile.self.cycles-pp.task_mm_cid_work
0.43 ± 2% -0.0 0.40 ± 4% perf-profile.self.cycles-pp.do_syscall_64
0.34 ± 3% -0.0 0.31 ± 6% perf-profile.self.cycles-pp.__skb_datagram_iter
0.29 ± 2% -0.0 0.26 ± 6% perf-profile.self.cycles-pp.__virt_addr_valid
0.33 ± 2% -0.0 0.30 ± 6% perf-profile.self.cycles-pp.__cond_resched
0.28 ± 3% -0.0 0.25 ± 5% perf-profile.self.cycles-pp.syscall_return_via_sysret
0.37 ± 2% -0.0 0.34 ± 6% perf-profile.self.cycles-pp.__check_object_size
0.18 ± 5% -0.0 0.15 ± 6% perf-profile.self.cycles-pp.__enqueue_entity
0.21 ± 3% -0.0 0.18 ± 6% perf-profile.self.cycles-pp.rcu_all_qs
0.22 ± 3% -0.0 0.20 ± 6% perf-profile.self.cycles-pp.x64_sys_call
0.19 ± 2% -0.0 0.17 ± 6% perf-profile.self.cycles-pp.rw_verify_area
0.05 +0.0 0.08 ± 23% perf-profile.self.cycles-pp.update_rq_clock
0.00 +0.1 0.06 ± 11% perf-profile.self.cycles-pp.ring_buffer_read_head
0.00 +0.2 0.15 ± 17% perf-profile.self.cycles-pp.intel_idle
0.01 ±282% +0.2 0.18 ± 70% perf-profile.self.cycles-pp.available_idle_cpu
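As a reading aid for the table above: the %change column appears to be the relative difference between the value on the parent commit (left) and the value on 85e511df3c (right). The sketch below recomputes it for three rows copied from the table. The helper name pct_change is hypothetical and is not part of lkp-tests. The perf-profile rows seem to report absolute differences in percentage points instead, so this formula is not meant for them.

```python
def pct_change(base: float, new: float) -> float:
    """Relative change from base to new, in percent (hypothetical helper)."""
    return (new - base) / base * 100.0


# (parent commit 82e9d0456e, tested commit 85e511df3c), copied from the table.
rows = {
    "hackbench.throughput": (623219, 541887),
    "hackbench.time.elapsed_time": (174.58, 203.09),
    "vmstat.system.cs": (4679401, 3938145),
}

for name, (base, new) in rows.items():
    # One decimal place, with an explicit sign, as in the report's %change column.
    print(f"{name}: {pct_change(base, new):+.1f}%")
```

Rows whose displayed values are heavily rounded (for example perf-stat.i.cpi) may not reproduce exactly, since the report presumably computes the change from unrounded data.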
Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.
--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki