Message-ID: <202510291148.b2988254-lkp@intel.com>
Date: Wed, 29 Oct 2025 12:17:14 +0800
From: kernel test robot <oliver.sang@...el.com>
To: Shubhang Kaushik via B4 Relay
<devnull+shubhang.os.amperecomputing.com@...nel.org>
CC: <oe-lkp@...ts.linux.dev>, <lkp@...el.com>, <linux-kernel@...r.kernel.org>,
<aubrey.li@...ux.intel.com>, <yu.c.chen@...el.com>, Ingo Molnar
<mingo@...hat.com>, Peter Zijlstra <peterz@...radead.org>, Juri Lelli
<juri.lelli@...hat.com>, Vincent Guittot <vincent.guittot@...aro.org>,
Dietmar Eggemann <dietmar.eggemann@....com>, Steven Rostedt
<rostedt@...dmis.org>, Ben Segall <bsegall@...gle.com>, Mel Gorman
<mgorman@...e.de>, Valentin Schneider <vschneid@...hat.com>, Shubhang Kaushik
<sh@...two.org>, Shijie Huang <Shijie.Huang@...erecomputing.com>, Frank Wang
<zwang@...erecomputing.com>, Christopher Lameter <cl@...two.org>, Adam Li
<adam.li@...erecomputing.com>, Shubhang Kaushik
<shubhang@...amperecomputing.com>, <oliver.sang@...el.com>
Subject: Re: [PATCH] sched/fair: Prefer cache-hot prev_cpu for wakeup
Hello,
We recently reported a "76.8% improvement of stress-ng.tee.ops_per_sec" for this same patch in
https://lore.kernel.org/all/202510281543.28d76c2-lkp@intel.com/
Now we have captured a regression as well. FYI.
kernel test robot noticed an 8.5% regression of stress-ng.io-uring.ops_per_sec on:
commit: 24efd1bf8a44f0f51f42f4af4ce22f21e873073d ("[PATCH] sched/fair: Prefer cache-hot prev_cpu for wakeup")
url: https://github.com/intel-lab-lkp/linux/commits/Shubhang-Kaushik-via-B4-Relay/sched-fair-Prefer-cache-hot-prev_cpu-for-wakeup/20251018-092110
patch link: https://lore.kernel.org/all/20251017-b4-sched-cfs-refactor-propagate-v1-1-1eb0dc5b19b3@os.amperecomputing.com/
patch subject: [PATCH] sched/fair: Prefer cache-hot prev_cpu for wakeup
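
For context, the patch subject refers to a wakeup-path heuristic that keeps a
waking task on its previous CPU while that CPU's caches are presumably still
warm, rather than migrating it to another (e.g. idle) CPU. The userspace C
model below is only a minimal sketch of that general idea, assuming a
migration-cost-style cutoff; the names, threshold, and structure are
illustrative assumptions, not the actual patch.

    /*
     * Hypothetical userspace model of a "prefer cache-hot prev_cpu" wakeup
     * decision. MIGRATION_COST_NS mimics a sysctl_sched_migration_cost-style
     * cutoff; all names here are assumptions for illustration only.
     */
    #include <stdbool.h>
    #include <stdio.h>

    static const long long MIGRATION_COST_NS = 500000; /* assumed 500us window */

    struct task {
        int prev_cpu;          /* CPU the task last ran on */
        long long last_ran_ns; /* when the task was descheduled */
    };

    /* Stay on prev_cpu while its cache is likely warm; else take the idle CPU. */
    static int select_wake_cpu(const struct task *p, long long now_ns, int idle_cpu)
    {
        bool cache_hot = (now_ns - p->last_ran_ns) < MIGRATION_COST_NS;

        return cache_hot ? p->prev_cpu : idle_cpu;
    }

    int main(void)
    {
        struct task p = { .prev_cpu = 3, .last_ran_ns = 1000000 };

        /* woken 100us after sleeping: still hot, stays on CPU 3 */
        printf("%d\n", select_wake_cpu(&p, 1100000, 7));
        /* woken 1ms after sleeping: cold, moves to idle CPU 7 */
        printf("%d\n", select_wake_cpu(&p, 2000000, 7));
        return 0;
    }

Under an io_uring stressor with very frequent sleep/wake cycles, such a
preference keeps tasks queuing on their previous CPUs (note the -99.9%
perf-stat.i.cpu-migrations below), which can trade migration cost for
run-queue contention.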
testcase: stress-ng
config: x86_64-rhel-9.4
compiler: gcc-14
test machine: 256 threads 2 sockets Intel(R) Xeon(R) 6768P CPU @ 2.4GHz (Granite Rapids) with 64G memory
parameters:
nr_threads: 100%
testtime: 60s
test: io-uring
cpufreq_governor: performance
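
With nr_threads at 100% of the 256 CPUs, the job corresponds roughly to an
invocation like "stress-ng --io-uring 256 --timeout 60 --metrics-brief"
(an approximation based on stress-ng's standard options, not the exact LKP
job file).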
If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add the following tags:
| Reported-by: kernel test robot <oliver.sang@...el.com>
| Closes: https://lore.kernel.org/oe-lkp/202510291148.b2988254-lkp@intel.com
Details are as below:
-------------------------------------------------------------------------------------------------->
The kernel config and materials to reproduce are available at:
https://download.01.org/0day-ci/archive/20251029/202510291148.b2988254-lkp@intel.com
=========================================================================================
compiler/cpufreq_governor/kconfig/nr_threads/rootfs/tbox_group/test/testcase/testtime:
gcc-14/performance/x86_64-rhel-9.4/100%/debian-13-x86_64-20250902.cgz/lkp-gnr-2sp4/io-uring/stress-ng/60s
commit:
9b332cece9 ("Merge tag 'nfsd-6.18-1' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux")
24efd1bf8a ("sched/fair: Prefer cache-hot prev_cpu for wakeup")
9b332cece987ee17 24efd1bf8a44f0f51f42f4af4ce
---------------- ---------------------------
%stddev %change %stddev
\ | \
3.58e+09 +17.6% 4.21e+09 cpuidle..time
9.276e+08 -35.8% 5.958e+08 ± 2% cpuidle..usage
48009670 -12.7% 41899608 ± 4% numa-numastat.node0.local_node
48122238 -12.8% 41981276 ± 4% numa-numastat.node0.numa_hit
0.89 ± 44% +13.2 14.07 ± 3% turbostat.C1E%
0.67 ± 44% +381.0% 3.22 turbostat.CPU%c1
1.375e+08 ± 44% +199.4% 4.116e+08 turbostat.IRQ
4.70 ± 44% +224.5% 15.25 turbostat.RAMWatt
210.17 ± 77% +1158.0% 2643 perf-c2c.DRAM.local
1725 ± 11% +10694.5% 186276 ± 3% perf-c2c.DRAM.remote
320853 -50.4% 159203 ± 4% perf-c2c.HITM.local
1320 ± 13% +9462.5% 126256 ± 3% perf-c2c.HITM.remote
322174 -11.4% 285460 perf-c2c.HITM.total
14.00 ± 4% -2.1 11.92 ± 4% mpstat.cpu.all.idle%
13.31 +5.2 18.56 mpstat.cpu.all.iowait%
1.48 +4.9 6.39 ± 3% mpstat.cpu.all.irq%
0.85 -0.2 0.68 mpstat.cpu.all.soft%
3.51 -2.1 1.40 ± 5% mpstat.cpu.all.usr%
18.17 ± 4% +12.8% 20.50 ± 4% mpstat.max_utilization.seconds
12518136 -40.6% 7432802 ± 5% meminfo.Active
12518120 -40.6% 7432786 ± 5% meminfo.Active(anon)
14791509 -34.2% 9726112 ± 4% meminfo.Cached
17016588 -29.8% 11943542 ± 3% meminfo.Committed_AS
19860760 -19.5% 15994452 ± 2% meminfo.Memused
11109813 -45.8% 6019207 ± 6% meminfo.Shmem
19916177 -19.5% 16031079 ± 2% meminfo.max_used_kB
104776 ± 14% -24.3% 79337 ± 21% numa-meminfo.node0.KReclaimable
104776 ± 14% -24.3% 79337 ± 21% numa-meminfo.node0.SReclaimable
11913809 -42.7% 6821430 ± 5% numa-meminfo.node1.Active
11913804 -42.7% 6821421 ± 5% numa-meminfo.node1.Active(anon)
11336225 ± 2% -30.4% 7891392 ± 23% numa-meminfo.node1.FilePages
19000428 +14.7% 21787417 ± 8% numa-meminfo.node1.MemFree
11104229 -45.9% 6012466 ± 6% numa-meminfo.node1.Shmem
1.125e+09 -8.4% 1.03e+09 ± 3% stress-ng.io-uring.ops
18779554 -8.5% 17185210 ± 3% stress-ng.io-uring.ops_per_sec
2.353e+08 +58.7% 3.735e+08 ± 3% stress-ng.time.involuntary_context_switches
16880 -11.1% 15008 stress-ng.time.percent_of_cpu_this_job_got
9702 -8.5% 8878 stress-ng.time.system_time
443.21 -67.8% 142.54 ± 2% stress-ng.time.user_time
1.362e+09 -11.5% 1.206e+09 ± 3% stress-ng.time.voluntary_context_switches
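
(As a sanity check, ops_per_sec is ops over the 60s testtime: 1.03e9 / 60 ≈
1.72e7, matching the 17185210 figure above; likewise 1.125e9 / 60 ≈ 1.88e7
for the parent commit.)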
26194 ± 14% -24.3% 19834 ± 21% numa-vmstat.node0.nr_slab_reclaimable
48122182 -12.8% 41981349 ± 4% numa-vmstat.node0.numa_hit
48009614 -12.7% 41899680 ± 4% numa-vmstat.node0.numa_local
2981009 -42.7% 1707865 ± 5% numa-vmstat.node1.nr_active_anon
2836469 ± 2% -30.4% 1975086 ± 23% numa-vmstat.node1.nr_file_pages
4747481 +14.7% 5444494 ± 8% numa-vmstat.node1.nr_free_pages
4714110 +14.8% 5411734 ± 8% numa-vmstat.node1.nr_free_pages_blocks
2778450 -45.8% 1505400 ± 6% numa-vmstat.node1.nr_shmem
2981003 -42.7% 1707858 ± 5% numa-vmstat.node1.nr_zone_active_anon
3131938 -40.6% 1860663 ± 5% proc-vmstat.nr_active_anon
1133648 +8.5% 1230219 proc-vmstat.nr_dirty_background_threshold
2270069 +8.5% 2463447 proc-vmstat.nr_dirty_threshold
3700155 -34.2% 2433658 ± 4% proc-vmstat.nr_file_pages
11441308 +8.5% 12408439 proc-vmstat.nr_free_pages
11335855 +8.6% 12314183 proc-vmstat.nr_free_pages_blocks
2779743 -45.8% 1507064 ± 6% proc-vmstat.nr_shmem
50620 -5.9% 47611 proc-vmstat.nr_slab_reclaimable
3131938 -40.6% 1860663 ± 5% proc-vmstat.nr_zone_active_anon
99148879 -9.8% 89432077 ± 3% proc-vmstat.numa_hit
98893495 -9.8% 89168637 ± 3% proc-vmstat.numa_local
54203 ± 24% -57.3% 23128 ± 10% proc-vmstat.numa_pages_migrated
99397243 -9.8% 89638031 ± 3% proc-vmstat.pgalloc_normal
94583491 -8.4% 86624034 ± 3% proc-vmstat.pgfree
54203 ± 24% -57.3% 23128 ± 10% proc-vmstat.pgmigrate_success
39031 +1.8% 39717 proc-vmstat.pgreuse
47196381 -31.6% 32305642 ± 2% proc-vmstat.unevictable_pgs_culled
0.08 ± 2% +3468.8% 2.97 ± 4% perf-stat.i.MPKI
7.499e+10 -21.9% 5.854e+10 ± 2% perf-stat.i.branch-instructions
0.94 -0.3 0.62 perf-stat.i.branch-miss-rate%
6.557e+08 -48.5% 3.38e+08 ± 2% perf-stat.i.branch-misses
0.70 ± 2% +36.7 37.40 ± 4% perf-stat.i.cache-miss-rate%
29724413 ± 2% +2544.3% 7.86e+08 perf-stat.i.cache-misses
5.32e+09 -60.5% 2.103e+09 ± 3% perf-stat.i.cache-references
42032996 -14.0% 36140436 ± 2% perf-stat.i.context-switches
2.29 +18.0% 2.70 ± 2% perf-stat.i.cpi
7.916e+11 -9.6% 7.154e+11 perf-stat.i.cpu-cycles
11062415 -99.9% 15481 perf-stat.i.cpu-migrations
44096 ± 5% -97.9% 910.19 perf-stat.i.cycles-between-cache-misses
3.698e+11 -22.9% 2.852e+11 ± 2% perf-stat.i.instructions
0.46 -14.9% 0.40 ± 2% perf-stat.i.ipc
0.05 ± 47% +96.1% 0.09 ± 14% perf-stat.i.major-faults
207.41 -31.9% 141.15 ± 2% perf-stat.i.metric.K/sec
0.08 ± 2% +3331.9% 2.76 ± 4% perf-stat.overall.MPKI
0.87 -0.3 0.58 perf-stat.overall.branch-miss-rate%
0.56 ± 2% +36.9 37.43 ± 4% perf-stat.overall.cache-miss-rate%
2.14 +17.3% 2.51 ± 2% perf-stat.overall.cpi
26647 ± 2% -96.6% 910.56 perf-stat.overall.cycles-between-cache-misses
0.47 -14.7% 0.40 ± 2% perf-stat.overall.ipc
7.375e+10 -21.9% 5.757e+10 ± 2% perf-stat.ps.branch-instructions
6.449e+08 -48.4% 3.325e+08 ± 2% perf-stat.ps.branch-misses
29243806 ± 2% +2543.5% 7.731e+08 perf-stat.ps.cache-misses
5.233e+09 -60.5% 2.068e+09 ± 3% perf-stat.ps.cache-references
41341425 -14.0% 35549572 ± 2% perf-stat.ps.context-switches
7.786e+11 -9.6% 7.037e+11 perf-stat.ps.cpu-cycles
10881167 -99.9% 15227 perf-stat.ps.cpu-migrations
3.637e+11 -22.9% 2.805e+11 ± 2% perf-stat.ps.instructions
0.05 ± 47% +93.6% 0.09 ± 14% perf-stat.ps.major-faults
2.217e+13 -22.7% 1.713e+13 ± 2% perf-stat.total.instructions
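
For reference, the derived metrics above follow directly from the per-second
counts, e.g. for the patched kernel:

    MPKI = cache-misses per 1000 instructions
         = 7.731e8 / (2.805e11 / 1000) ≈ 2.76   (perf-stat.overall.MPKI)

    cycles-between-cache-misses = cpu-cycles / cache-misses
                                = 7.037e11 / 7.731e8 ≈ 910

Cache misses rose ~26x while instructions fell ~23%, which together account
for the ~35x MPKI increase.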
4219859 -17.8% 3469357 sched_debug.cfs_rq:/.avg_vruntime.avg
7247589 ± 9% -38.3% 4469027 ± 7% sched_debug.cfs_rq:/.avg_vruntime.max
4013259 -29.0% 2849620 ± 17% sched_debug.cfs_rq:/.avg_vruntime.min
265810 ± 14% -54.9% 119970 ± 11% sched_debug.cfs_rq:/.avg_vruntime.stddev
3.42 ± 10% -24.4% 2.58 ± 7% sched_debug.cfs_rq:/.h_nr_queued.max
3.33 ± 11% -22.5% 2.58 ± 7% sched_debug.cfs_rq:/.h_nr_runnable.max
4401036 -17.1% 3647494 ± 4% sched_debug.cfs_rq:/.left_deadline.max
1274751 ± 5% -18.7% 1035958 ± 12% sched_debug.cfs_rq:/.left_deadline.stddev
4400687 -17.1% 3647059 ± 4% sched_debug.cfs_rq:/.left_vruntime.max
1274640 ± 5% -18.7% 1035848 ± 12% sched_debug.cfs_rq:/.left_vruntime.stddev
4219859 -17.8% 3469357 sched_debug.cfs_rq:/.min_vruntime.avg
7247589 ± 9% -38.3% 4469027 ± 7% sched_debug.cfs_rq:/.min_vruntime.max
4013259 -29.0% 2849620 ± 17% sched_debug.cfs_rq:/.min_vruntime.min
265810 ± 14% -54.9% 119970 ± 11% sched_debug.cfs_rq:/.min_vruntime.stddev
4400687 -17.1% 3647059 ± 4% sched_debug.cfs_rq:/.right_vruntime.max
1274640 ± 5% -18.7% 1035848 ± 12% sched_debug.cfs_rq:/.right_vruntime.stddev
532.33 -11.4% 471.62 ± 2% sched_debug.cfs_rq:/.runnable_avg.avg
1361 ± 3% +18.4% 1611 ± 10% sched_debug.cfs_rq:/.runnable_avg.max
203.24 ± 4% +38.0% 280.47 ± 3% sched_debug.cfs_rq:/.runnable_avg.stddev
108.79 ± 5% +68.6% 183.41 ± 4% sched_debug.cfs_rq:/.util_avg.stddev
99.93 ± 8% +144.8% 244.58 ± 4% sched_debug.cfs_rq:/.util_est.avg
154.15 ± 10% +41.9% 218.69 ± 5% sched_debug.cfs_rq:/.util_est.stddev
585777 ± 3% +55.0% 907718 ± 6% sched_debug.cpu.avg_idle.avg
257569 ± 15% +30.0% 334947 ± 11% sched_debug.cpu.avg_idle.stddev
581651 ± 2% +97.0% 1146052 ± 3% sched_debug.cpu.max_idle_balance_cost.avg
1334820 ± 4% +10.9% 1479741 sched_debug.cpu.max_idle_balance_cost.max
150290 ± 9% +34.9% 202732 ± 7% sched_debug.cpu.max_idle_balance_cost.stddev
3.42 ± 10% -24.4% 2.58 ± 13% sched_debug.cpu.nr_running.max
4900954 -14.0% 4212806 ± 2% sched_debug.cpu.nr_switches.avg
1872618 ± 12% +57.1% 2941530 ± 17% sched_debug.cpu.nr_switches.min
-24.25 -67.7% -7.83 sched_debug.cpu.nr_uninterruptible.min
8.41 ± 12% -46.3% 4.52 ± 14% sched_debug.cpu.nr_uninterruptible.stddev
Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.
--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki