Message-ID: <202510281543.28d76c2-lkp@intel.com>
Date: Tue, 28 Oct 2025 15:29:26 +0800
From: kernel test robot <oliver.sang@...el.com>
To: Shubhang Kaushik via B4 Relay
<devnull+shubhang.os.amperecomputing.com@...nel.org>
CC: <oe-lkp@...ts.linux.dev>, <lkp@...el.com>, <linux-kernel@...r.kernel.org>,
<aubrey.li@...ux.intel.com>, <yu.c.chen@...el.com>, Ingo Molnar
<mingo@...hat.com>, Peter Zijlstra <peterz@...radead.org>, Juri Lelli
<juri.lelli@...hat.com>, Vincent Guittot <vincent.guittot@...aro.org>,
Dietmar Eggemann <dietmar.eggemann@....com>, Steven Rostedt
<rostedt@...dmis.org>, Ben Segall <bsegall@...gle.com>, Mel Gorman
<mgorman@...e.de>, Valentin Schneider <vschneid@...hat.com>, Shubhang Kaushik
<sh@...two.org>, Shijie Huang <Shijie.Huang@...erecomputing.com>, Frank Wang
<zwang@...erecomputing.com>, Christopher Lameter <cl@...two.org>, Adam Li
<adam.li@...erecomputing.com>, Shubhang Kaushik
<shubhang@...amperecomputing.com>, <oliver.sang@...el.com>
Subject: Re: [PATCH] sched/fair: Prefer cache-hot prev_cpu for wakeup

Hello,

kernel test robot noticed a 76.8% improvement of stress-ng.tee.ops_per_sec on:


commit: 24efd1bf8a44f0f51f42f4af4ce22f21e873073d ("[PATCH] sched/fair: Prefer cache-hot prev_cpu for wakeup")
url: https://github.com/intel-lab-lkp/linux/commits/Shubhang-Kaushik-via-B4-Relay/sched-fair-Prefer-cache-hot-prev_cpu-for-wakeup/20251018-092110
patch link: https://lore.kernel.org/all/20251017-b4-sched-cfs-refactor-propagate-v1-1-1eb0dc5b19b3@os.amperecomputing.com/
patch subject: [PATCH] sched/fair: Prefer cache-hot prev_cpu for wakeup

testcase: stress-ng
config: x86_64-rhel-9.4
compiler: gcc-14
test machine: 224 threads 2 sockets Intel(R) Xeon(R) Platinum 8480CTDX (Sapphire Rapids) with 512G memory

parameters:

	nr_threads: 100%
	testtime: 60s
	test: tee
	cpufreq_governor: performance
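
For reference, these parameters map onto a plain stress-ng invocation roughly
as below (a sketch only; the authoritative flags are generated from the job
file in the archive linked further down, and nr_threads=100% means one worker
per hardware thread):

	# approximate manual equivalent on this 224-thread machine
	stress-ng --tee 224 --timeout 60s --metrics-brief
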
Details are as below:
-------------------------------------------------------------------------------------------------->
The kernel config and materials to reproduce are available at:
https://download.01.org/0day-ci/archive/20251028/202510281543.28d76c2-lkp@intel.com
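
To reproduce with lkp-tests (the usual 0-day flow, assuming the job.yaml
shipped in the archive above):

	git clone https://github.com/intel/lkp-tests.git
	cd lkp-tests
	sudo bin/lkp install job.yaml        # install dependencies for the job
	sudo bin/lkp split-job --compatible job.yaml
	sudo bin/lkp run generated-yaml-file
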
=========================================================================================
compiler/cpufreq_governor/kconfig/nr_threads/rootfs/tbox_group/test/testcase/testtime:
gcc-14/performance/x86_64-rhel-9.4/100%/debian-13-x86_64-20250902.cgz/lkp-spr-2sp4/tee/stress-ng/60s
commit:
9b332cece9 ("Merge tag 'nfsd-6.18-1' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux")
24efd1bf8a ("sched/fair: Prefer cache-hot prev_cpu for wakeup")
9b332cece987ee17 24efd1bf8a44f0f51f42f4af4ce
---------------- ---------------------------
%stddev %change %stddev
\ | \
12097 ± 3% +10.9% 13413 ± 2% uptime.idle
3.662e+08 ± 7% +382.7% 1.768e+09 cpuidle..time
5056131 ± 56% +426.8% 26635997 ± 3% cpuidle..usage
13144587 ± 11% +21.1% 15921410 meminfo.Memused
13326158 ± 11% +20.6% 16067699 meminfo.max_used_kB
58707455 -16.5% 49043102 ± 9% numa-numastat.node1.local_node
58841583 -16.4% 49176968 ± 9% numa-numastat.node1.numa_hit
58770618 -16.3% 49175467 ± 9% numa-vmstat.node1.numa_hit
58636509 -16.4% 49041602 ± 9% numa-vmstat.node1.numa_local
2184 ± 9% +2157.3% 49310 ± 3% perf-c2c.DRAM.remote
3115 ± 11% +1689.3% 55737 ± 3% perf-c2c.HITM.local
1193 ± 13% +2628.6% 32575 ± 3% perf-c2c.HITM.remote
4308 ± 10% +1949.6% 88312 perf-c2c.HITM.total
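
The perf-c2c rows above count cache-to-cache transfers (HITM = a load that hit
a line held Modified in another core's cache), so the jump in remote HITMs
suggests the extra throughput comes with far more cross-socket cache-line
bouncing. To drill into which lines are contended, the standard perf tooling
is (generic perf usage, not part of this report's automation):

	perf c2c record -a -- sleep 10
	perf c2c report --stats
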
1.95 ± 6% +10.4 12.34 mpstat.cpu.all.idle%
0.50 ± 3% +1.0 1.53 mpstat.cpu.all.irq%
0.02 ± 6% +0.1 0.09 ± 5% mpstat.cpu.all.soft%
74.24 -7.0 67.21 mpstat.cpu.all.sys%
23.29 -4.5 18.83 mpstat.cpu.all.usr%
232818 ± 35% -18.3% 190138 proc-vmstat.nr_anon_pages
124104 -1.1% 122691 proc-vmstat.nr_slab_unreclaimable
1.167e+08 -15.0% 99106005 proc-vmstat.numa_hit
1.164e+08 -15.1% 98853060 proc-vmstat.numa_local
1.168e+08 -15.2% 99060661 proc-vmstat.pgalloc_normal
1.147e+08 -15.7% 96704739 proc-vmstat.pgfree
1.071e+08 ± 2% +76.8% 1.894e+08 ± 2% stress-ng.tee.ops
1786177 ± 2% +76.8% 3157701 ± 2% stress-ng.tee.ops_per_sec
1.044e+08 -49.4% 52882701 stress-ng.time.involuntary_context_switches
21972 -12.1% 19317 stress-ng.time.percent_of_cpu_this_job_got
10131 -9.6% 9155 stress-ng.time.system_time
3070 -20.2% 2450 stress-ng.time.user_time
1.512e+08 -37.9% 93853736 stress-ng.time.voluntary_context_switches
2816 -10.5% 2519 turbostat.Avg_MHz
97.12 -9.8 87.30 turbostat.Busy%
0.11 ± 52% +0.5 0.66 ± 5% turbostat.C1%
0.40 ± 11% +8.4 8.78 turbostat.C1E%
2.39 ± 3% +1.0 3.42 ± 2% turbostat.C6%
1.08 ± 9% +168.3% 2.90 ± 3% turbostat.CPU%c1
32638444 +167.8% 87395049 turbostat.IRQ
110.56 +14.6 125.14 ± 4% turbostat.PKG_%
23.05 +32.8% 30.62 turbostat.RAMWatt
7559994 -21.3% 5948968 sched_debug.cfs_rq:/.avg_vruntime.avg
11028968 ± 13% -38.2% 6818572 ± 4% sched_debug.cfs_rq:/.avg_vruntime.max
0.34 ± 13% +104.0% 0.69 ± 3% sched_debug.cfs_rq:/.h_nr_queued.stddev
0.38 ± 8% +75.2% 0.67 ± 3% sched_debug.cfs_rq:/.h_nr_runnable.stddev
20.67 ± 33% +3672.8% 779.66 ± 73% sched_debug.cfs_rq:/.load_avg.avg
519.67 +7141.5% 37631 ± 10% sched_debug.cfs_rq:/.load_avg.max
86.71 ± 22% +5134.0% 4538 ± 39% sched_debug.cfs_rq:/.load_avg.stddev
7559994 -21.3% 5948968 sched_debug.cfs_rq:/.min_vruntime.avg
11028968 ± 13% -38.2% 6818572 ± 4% sched_debug.cfs_rq:/.min_vruntime.max
0.12 ± 17% +117.1% 0.27 ± 3% sched_debug.cfs_rq:/.nr_queued.stddev
809.69 ± 2% +15.6% 936.26 sched_debug.cfs_rq:/.runnable_avg.avg
2093 ± 3% +18.5% 2480 ± 8% sched_debug.cfs_rq:/.runnable_avg.max
259.47 ± 18% +71.8% 445.79 ± 3% sched_debug.cfs_rq:/.runnable_avg.stddev
576.64 -10.6% 515.40 sched_debug.cfs_rq:/.util_avg.avg
137.33 ± 12% +85.3% 254.45 ± 2% sched_debug.cfs_rq:/.util_avg.stddev
609.44 +15.6% 704.34 ± 3% sched_debug.cfs_rq:/.util_est.avg
1839 ± 11% +23.7% 2274 ± 7% sched_debug.cfs_rq:/.util_est.max
245.27 ± 7% +82.4% 447.29 ± 4% sched_debug.cfs_rq:/.util_est.stddev
702831 ± 5% +19.6% 840863 ± 3% sched_debug.cpu.avg_idle.avg
378668 ± 14% +32.2% 500458 ± 6% sched_debug.cpu.avg_idle.stddev
44.33 ± 22% -62.7% 16.52 ± 13% sched_debug.cpu.clock.stddev
909.29 ± 12% +160.6% 2369 ± 4% sched_debug.cpu.curr->pid.stddev
639355 ± 5% +80.6% 1154626 sched_debug.cpu.max_idle_balance_cost.avg
500000 +57.3% 786555 ± 11% sched_debug.cpu.max_idle_balance_cost.min
0.00 ± 20% -47.2% 0.00 ± 18% sched_debug.cpu.next_balance.stddev
0.32 ± 14% +111.2% 0.68 ± 3% sched_debug.cpu.nr_running.stddev
574871 -33.2% 383811 sched_debug.cpu.nr_switches.avg
788985 ± 11% -32.4% 533309 ± 6% sched_debug.cpu.nr_switches.max
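
The sched_debug.* fields are sampled from the scheduler's debugfs dump; on
recent kernels it lives at the path below (older kernels expose the same data
as /proc/sched_debug):

	cat /sys/kernel/debug/sched/debug
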
0.04 ± 19% +1073.1% 0.50 perf-stat.i.MPKI
1.443e+11 -17.9% 1.184e+11 perf-stat.i.branch-instructions
0.08 ± 3% +0.0 0.12 perf-stat.i.branch-miss-rate%
1.049e+08 ± 2% +23.6% 1.296e+08 perf-stat.i.branch-misses
31.56 ± 11% +16.6 48.19 perf-stat.i.cache-miss-rate%
25936672 ± 22% +1080.1% 3.061e+08 perf-stat.i.cache-misses
77849475 ± 13% +714.6% 6.342e+08 perf-stat.i.cache-references
4288231 -33.1% 2868755 perf-stat.i.context-switches
0.85 +9.6% 0.94 perf-stat.i.cpi
6.387e+11 -10.2% 5.735e+11 perf-stat.i.cpu-cycles
2828 ± 24% +596.1% 19688 perf-stat.i.cpu-migrations
32456 ± 26% -94.2% 1870 perf-stat.i.cycles-between-cache-misses
7.486e+11 -18.2% 6.125e+11 perf-stat.i.instructions
1.17 -8.8% 1.07 perf-stat.i.ipc
19.17 -33.2% 12.81 perf-stat.i.metric.K/sec
0.03 ± 22% +1341.1% 0.50 perf-stat.overall.MPKI
0.07 ± 3% +0.0 0.11 perf-stat.overall.branch-miss-rate%
33.00 ± 10% +15.2 48.24 perf-stat.overall.cache-miss-rate%
0.85 +9.7% 0.94 perf-stat.overall.cpi
25848 ± 21% -92.7% 1874 perf-stat.overall.cycles-between-cache-misses
1.17 -8.9% 1.07 perf-stat.overall.ipc
1.419e+11 -18.1% 1.162e+11 perf-stat.ps.branch-instructions
1.028e+08 ± 2% +23.3% 1.268e+08 perf-stat.ps.branch-misses
25499974 ± 22% +1077.5% 3.003e+08 perf-stat.ps.cache-misses
76519245 ± 13% +713.3% 6.224e+08 perf-stat.ps.cache-references
4214394 -33.2% 2815077 perf-stat.ps.context-switches
6.278e+11 -10.4% 5.627e+11 perf-stat.ps.cpu-cycles
2763 ± 24% +598.5% 19305 perf-stat.ps.cpu-migrations
7.358e+11 -18.3% 6.009e+11 perf-stat.ps.instructions
4.489e+13 -18.3% 3.668e+13 perf-stat.total.instructions
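
The perf-stat.* rows come from system-wide counting mode; a comparable manual
sample can be taken with generic perf events (a sketch; the robot's exact
event list and sampling intervals come from its own tooling):

	perf stat -a -e instructions,cycles,branch-instructions,branch-misses,cache-references,cache-misses,context-switches,cpu-migrations -- sleep 60
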
Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.
--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki