Message-ID: <202509261113.a87577ce-lkp@intel.com>
Date: Fri, 26 Sep 2025 12:56:49 +0800
From: kernel test robot <oliver.sang@...el.com>
To: Fernand Sieber <sieberf@...zon.com>
CC: <oe-lkp@...ts.linux.dev>, <lkp@...el.com>, <linux-kernel@...r.kernel.org>,
<aubrey.li@...ux.intel.com>, <yu.c.chen@...el.com>, <peterz@...radead.org>,
<bsegall@...gle.com>, <dietmar.eggemann@....com>, <dwmw@...zon.co.uk>,
<graf@...zon.com>, <jschoenh@...zon.de>, <juri.lelli@...hat.com>,
<mingo@...hat.com>, <sieberf@...zon.com>, <tanghui20@...wei.com>,
<vincent.guittot@...aro.org>, <vineethr@...ux.ibm.com>,
<wangtao554@...wei.com>, <zhangqiao22@...wei.com>, <oliver.sang@...el.com>
Subject: Re: [PATCH v3] sched/fair: Forfeit vruntime on yield
Hello,

We reported "a 55.9% improvement of stress-ng.wait.ops_per_sec"
in https://lore.kernel.org/all/202509241501.f14b210a-lkp@intel.com/

Now we have noticed there is also a regression in our tests, so we are reporting again FYI.

One thing we want to mention is that "stress-ng.sockpair.MB_written_per_sec" is one of
the "miscellaneous metrics" of this stress-ng test; for the main metric,
"stress-ng.sockpair.ops_per_sec", the difference is small.

0d4eaf8caf8cd633 15bf8c7b35e31295b26241425c0
---------------- ---------------------------
         %stddev     %change         %stddev
             \          |                \
    551.38           -90.5%      52.18        stress-ng.sockpair.MB_written_per_sec
    781743            -2.3%     764106        stress-ng.sockpair.ops_per_sec
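
Roughly speaking, the two metrics count different things in the same socketpair
write loop: bogo ops count completed operations, while MB written counts the bytes
those operations actually pushed. The following is only a minimal sketch to
illustrate that distinction, not the actual stress-ng sockpair stressor
(RUN_SECONDS and MAX_MSG are arbitrary values chosen for the example):

/*
 * Minimal sketch only -- NOT the stress-ng sockpair stressor. It shows how
 * an "ops/s" count and an "MB written/s" rate are derived from the same
 * socketpair write loop. RUN_SECONDS and MAX_MSG are arbitrary.
 */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>
#include <unistd.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/wait.h>

#define RUN_SECONDS 5
#define MAX_MSG     8192

static double now_sec(void)
{
        struct timespec ts;

        clock_gettime(CLOCK_MONOTONIC, &ts);
        return ts.tv_sec + ts.tv_nsec / 1e9;
}

int main(void)
{
        int sv[2];
        char buf[MAX_MSG];
        pid_t pid;

        if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0) {
                perror("socketpair");
                return 1;
        }

        pid = fork();
        if (pid == 0) {
                /* child: drain the socket until the writer closes its end */
                close(sv[0]);
                while (read(sv[1], buf, sizeof(buf)) > 0)
                        ;
                _exit(0);
        }

        /* parent: write variable-sized messages for a fixed time */
        close(sv[1]);
        memset(buf, 0xaa, sizeof(buf));

        unsigned long long ops = 0, bytes = 0;
        double start = now_sec(), elapsed;

        while ((elapsed = now_sec() - start) < RUN_SECONDS) {
                size_t len = 1 + (rand() % sizeof(buf));
                ssize_t n = write(sv[0], buf, len);

                if (n < 0)
                        break;
                ops++;          /* one "bogo op" per completed write */
                bytes += n;     /* throughput counts bytes actually written */
        }

        close(sv[0]);
        waitpid(pid, NULL, 0);

        printf("ops/s:        %.0f\n", ops / elapsed);
        printf("MB written/s: %.2f\n", bytes / elapsed / 1e6);
        return 0;
}

Built with a plain cc, it prints both rates after the fixed run time.
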
Below is an example run for 15bf8c7b35:
2025-09-25 15:48:21 stress-ng --timeout 60 --times --verify --metrics --no-rand-seed --oom-avoid --sockpair 192
stress-ng: info: [8371] setting to a 1 min run per stressor
stress-ng: info: [8371] dispatching hogs: 192 sockpair
stress-ng: info: [8371] note: /proc/sys/kernel/sched_autogroup_enabled is 1 and this can impact scheduling throughput for processes not attached to a tty. Setting this to 0 may improve performance metrics
stress-ng: metrc: [8371] stressor   bogo ops  real time  usr time  sys time  bogo ops/s      bogo ops/s      CPU used per  RSS Max
stress-ng: metrc: [8371]                         (secs)    (secs)    (secs)  (real time)     (usr+sys time)  instance (%)     (KB)
stress-ng: metrc: [8371] sockpair   49874197      65.44     72.08  12219.54   762108.28         4057.58             97.82     3132
stress-ng: metrc: [8371] miscellaneous metrics:
stress-ng: metrc: [8371] sockpair 27717.04 socketpair calls sec (harmonic mean of 192 instances)
stress-ng: metrc: [8371] sockpair 53.01 MB written per sec (harmonic mean of 192 instances)
stress-ng: info: [8371] for a 66.13s run time:
stress-ng: info: [8371] 12696.46s available CPU time
stress-ng: info: [8371] 72.07s user time ( 0.57%)
stress-ng: info: [8371] 12219.63s system time ( 96.24%)
stress-ng: info: [8371] 12291.70s total time ( 96.81%)
stress-ng: info: [8371] load average: 190.99 57.46 19.94
stress-ng: info: [8371] skipped: 0
stress-ng: info: [8371] passed: 192: sockpair (192)
stress-ng: info: [8371] failed: 0
stress-ng: info: [8371] metrics untrustworthy: 0
stress-ng: info: [8371] successful run completed in 1 min, 6.13 secs
Below is an example run from 0d4eaf8caf:
2025-09-25 18:04:37 stress-ng --timeout 60 --times --verify --metrics --no-rand-seed --oom-avoid --sockpair 192
stress-ng: info: [8360] setting to a 1 min run per stressor
stress-ng: info: [8360] dispatching hogs: 192 sockpair
stress-ng: info: [8360] note: /proc/sys/kernel/sched_autogroup_enabled is 1 and this can impact scheduling throughput for processes not attached to a tty. Setting this to 0 may improve performance metrics
stress-ng: metrc: [8360] stressor   bogo ops  real time  usr time  sys time  bogo ops/s      bogo ops/s      CPU used per  RSS Max
stress-ng: metrc: [8360]                         (secs)    (secs)    (secs)  (real time)     (usr+sys time)  instance (%)     (KB)
stress-ng: metrc: [8360] sockpair   51705787      65.08     56.75  12254.39   794448.25         4199.92             98.52     5160
stress-ng: metrc: [8360] miscellaneous metrics:
stress-ng: metrc: [8360] sockpair 28156.62 socketpair calls sec (harmonic mean of 192 instances)
stress-ng: metrc: [8360] sockpair 562.18 MB written per sec (harmonic mean of 192 instances)
stress-ng: info: [8360] for a 65.40s run time:
stress-ng: info: [8360] 12556.08s available CPU time
stress-ng: info: [8360] 56.75s user time ( 0.45%)
stress-ng: info: [8360] 12254.48s system time ( 97.60%)
stress-ng: info: [8360] 12311.23s total time ( 98.05%)
stress-ng: info: [8360] load average: 239.81 72.31 25.10
stress-ng: info: [8360] skipped: 0
stress-ng: info: [8360] passed: 192: sockpair (192)
stress-ng: info: [8360] failed: 0
stress-ng: info: [8360] metrics untrustworthy: 0
stress-ng: info: [8360] successful run completed in 1 min, 5.40 secs
Below is the full report.
kernel test robot noticed a 90.5% regression of stress-ng.sockpair.MB_written_per_sec on:
commit: 15bf8c7b35e31295b26241425c0a61102e92109f ("[PATCH v3] sched/fair: Forfeit vruntime on yield")
url: https://github.com/intel-lab-lkp/linux/commits/Fernand-Sieber/sched-fair-Forfeit-vruntime-on-yield/20250918-231320
base: https://git.kernel.org/cgit/linux/kernel/git/tip/tip.git 0d4eaf8caf8cd633b23e949e2996b420052c2d45
patch link: https://lore.kernel.org/all/20250918150528.292620-1-sieberf@amazon.com/
patch subject: [PATCH v3] sched/fair: Forfeit vruntime on yield
testcase: stress-ng
config: x86_64-rhel-9.4
compiler: gcc-14
test machine: 192 threads 2 sockets Intel(R) Xeon(R) 6740E CPU @ 2.4GHz (Sierra Forest) with 256G memory
parameters:
nr_threads: 100%
testtime: 60s
test: sockpair
cpufreq_governor: performance
If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add the following tags:
| Reported-by: kernel test robot <oliver.sang@...el.com>
| Closes: https://lore.kernel.org/oe-lkp/202509261113.a87577ce-lkp@intel.com
Details are as below:
-------------------------------------------------------------------------------------------------->
The kernel config and materials to reproduce are available at:
https://download.01.org/0day-ci/archive/20250926/202509261113.a87577ce-lkp@intel.com
=========================================================================================
compiler/cpufreq_governor/kconfig/nr_threads/rootfs/tbox_group/test/testcase/testtime:
gcc-14/performance/x86_64-rhel-9.4/100%/debian-13-x86_64-20250902.cgz/lkp-srf-2sp3/sockpair/stress-ng/60s
commit:
0d4eaf8caf ("sched/fair: Do not balance task to a throttled cfs_rq")
15bf8c7b35 ("sched/fair: Forfeit vruntime on yield")
0d4eaf8caf8cd633 15bf8c7b35e31295b26241425c0
---------------- ---------------------------
%stddev %change %stddev
\ | \
0.78 ± 2% +0.2 1.02 mpstat.cpu.all.usr%
19.57 -36.8% 12.36 ± 70% turbostat.RAMWatt
4.073e+08 ± 6% +23.1% 5.013e+08 ± 5% cpuidle..time
266261 ± 9% +46.4% 389733 ± 9% cpuidle..usage
451887 ± 77% +160.9% 1178929 ± 33% numa-vmstat.node0.nr_file_pages
192819 ± 30% +101.3% 388191 ± 43% numa-vmstat.node1.nr_shmem
1807416 ± 77% +161.0% 4716665 ± 33% numa-meminfo.node0.FilePages
8980121 -9.0% 8174177 numa-meminfo.node0.SUnreclaim
25356157 ± 8% -22.0% 19772595 ± 9% numa-meminfo.node1.MemUsed
771480 ± 30% +101.4% 1553932 ± 43% numa-meminfo.node1.Shmem
551.38 -90.5% 52.18 stress-ng.sockpair.MB_written_per_sec
51092272 -2.2% 49968621 stress-ng.sockpair.ops
781743 -2.3% 764106 stress-ng.sockpair.ops_per_sec
21418332 ± 4% +69.2% 36232510 stress-ng.time.involuntary_context_switches
56.36 +27.4% 71.81 stress-ng.time.user_time
150809 ± 21% +17217.1% 26115838 ± 3% stress-ng.time.voluntary_context_switches
2165914 ± 7% +92.3% 4165197 ± 4% meminfo.Active
2165898 ± 7% +92.3% 4165181 ± 4% meminfo.Active(anon)
4926568 +39.6% 6875228 meminfo.Cached
6826363 +28.1% 8744371 meminfo.Committed_AS
513281 ± 8% +98.7% 1019681 ± 6% meminfo.Mapped
48472806 ± 2% -14.8% 41314088 meminfo.Memused
1276164 +152.7% 3224818 ± 3% meminfo.Shmem
53022761 ± 2% -15.7% 44672632 meminfo.max_used_kB
0.53 -81.0% 0.10 ± 4% perf-sched.sch_delay.avg.ms.[unknown].[unknown].[unknown].[unknown].[unknown]
0.53 -81.0% 0.10 ± 4% perf-sched.total_sch_delay.average.ms
2.03 -68.4% 0.64 ± 4% perf-sched.total_wait_and_delay.average.ms
1811449 +200.9% 5449776 ± 4% perf-sched.total_wait_and_delay.count.ms
1.50 -64.0% 0.54 ± 4% perf-sched.total_wait_time.average.ms
2.03 -68.4% 0.64 ± 4% perf-sched.wait_and_delay.avg.ms.[unknown].[unknown].[unknown].[unknown].[unknown]
1811449 +200.9% 5449776 ± 4% perf-sched.wait_and_delay.count.[unknown].[unknown].[unknown].[unknown].[unknown]
1.50 -64.0% 0.54 ± 4% perf-sched.wait_time.avg.ms.[unknown].[unknown].[unknown].[unknown].[unknown]
541937 ± 7% +92.5% 1043389 ± 4% proc-vmstat.nr_active_anon
5242293 +3.5% 5423918 proc-vmstat.nr_dirty_background_threshold
10497404 +3.5% 10861099 proc-vmstat.nr_dirty_threshold
1232280 +39.7% 1721251 proc-vmstat.nr_file_pages
52782357 +3.4% 54601330 proc-vmstat.nr_free_pages
52117733 +3.8% 54073313 proc-vmstat.nr_free_pages_blocks
128259 ± 8% +100.8% 257594 ± 6% proc-vmstat.nr_mapped
319681 +153.0% 808650 ± 3% proc-vmstat.nr_shmem
4489133 -8.9% 4089704 proc-vmstat.nr_slab_unreclaimable
541937 ± 7% +92.5% 1043389 ± 4% proc-vmstat.nr_zone_active_anon
77303955 +2.5% 79201972 proc-vmstat.pgalloc_normal
519724 +5.2% 546556 proc-vmstat.pgfault
76456707 +1.7% 77739095 proc-vmstat.pgfree
12794131 ± 6% -27.4% 9288185 sched_debug.cfs_rq:/.avg_vruntime.max
4610143 ± 8% -14.9% 3923890 ± 5% sched_debug.cfs_rq:/.avg_vruntime.min
1.03 -20.1% 0.83 ± 2% sched_debug.cfs_rq:/.h_nr_queued.avg
1.03 -20.8% 0.82 ± 2% sched_debug.cfs_rq:/.h_nr_runnable.avg
895.00 ± 70% +89.0% 1691 ± 2% sched_debug.cfs_rq:/.load.min
0.67 ± 55% +125.0% 1.50 sched_debug.cfs_rq:/.load_avg.min
12794131 ± 6% -27.4% 9288185 sched_debug.cfs_rq:/.min_vruntime.max
4610143 ± 8% -14.9% 3923896 ± 5% sched_debug.cfs_rq:/.min_vruntime.min
1103 -20.2% 880.86 sched_debug.cfs_rq:/.runnable_avg.avg
428.26 ± 6% -63.4% 156.94 ± 22% sched_debug.cfs_rq:/.util_est.avg
1775 ± 6% -39.3% 1077 ± 15% sched_debug.cfs_rq:/.util_est.max
396.33 ± 6% -50.0% 198.03 ± 17% sched_debug.cfs_rq:/.util_est.stddev
50422 ± 6% -34.7% 32915 ± 18% sched_debug.cpu.avg_idle.min
456725 ± 10% +39.4% 636811 ± 4% sched_debug.cpu.avg_idle.stddev
611566 ± 5% +25.0% 764424 ± 2% sched_debug.cpu.max_idle_balance_cost.avg
190657 ± 12% +36.1% 259410 ± 5% sched_debug.cpu.max_idle_balance_cost.stddev
1.04 -20.4% 0.82 ± 2% sched_debug.cpu.nr_running.avg
57214 ± 4% +183.5% 162228 ± 2% sched_debug.cpu.nr_switches.avg
253314 ± 4% +39.3% 352777 ± 4% sched_debug.cpu.nr_switches.max
59410 ± 6% +31.6% 78186 ± 10% sched_debug.cpu.nr_switches.stddev
3.33 -27.9% 2.40 perf-stat.i.MPKI
1.207e+10 +11.3% 1.344e+10 perf-stat.i.branch-instructions
0.21 ± 7% +0.0 0.24 ± 5% perf-stat.i.branch-miss-rate%
23462655 ± 6% +27.4% 29896517 ± 3% perf-stat.i.branch-misses
75.74 -4.4 71.33 perf-stat.i.cache-miss-rate%
1.861e+08 -21.5% 1.462e+08 perf-stat.i.cache-misses
2.435e+08 -17.1% 2.017e+08 perf-stat.i.cache-references
323065 ± 5% +191.4% 941425 ± 2% perf-stat.i.context-switches
10.73 -9.7% 9.69 perf-stat.i.cpi
353.45 +39.0% 491.13 ± 4% perf-stat.i.cpu-migrations
3589 +30.5% 4685 perf-stat.i.cycles-between-cache-misses
5.645e+10 +12.0% 6.323e+10 perf-stat.i.instructions
0.09 +12.1% 0.11 perf-stat.i.ipc
1.66 ± 5% +193.9% 4.89 ± 2% perf-stat.i.metric.K/sec
6247 +5.7% 6603 ± 2% perf-stat.i.minor-faults
6248 +5.7% 6604 ± 2% perf-stat.i.page-faults
3.33 -29.7% 2.34 perf-stat.overall.MPKI
0.20 ± 7% +0.0 0.23 ± 4% perf-stat.overall.branch-miss-rate%
76.67 -3.9 72.79 perf-stat.overall.cache-miss-rate%
10.54 -11.1% 9.37 perf-stat.overall.cpi
3168 +26.5% 4007 perf-stat.overall.cycles-between-cache-misses
0.09 +12.5% 0.11 perf-stat.overall.ipc
1.204e+10 +11.1% 1.337e+10 perf-stat.ps.branch-instructions
23586580 ± 7% +29.7% 30600100 ± 4% perf-stat.ps.branch-misses
1.873e+08 -21.4% 1.471e+08 perf-stat.ps.cache-misses
2.443e+08 -17.3% 2.021e+08 perf-stat.ps.cache-references
324828 ± 5% +187.0% 932274 ± 2% perf-stat.ps.context-switches
335.13 ± 2% +41.7% 474.95 ± 5% perf-stat.ps.cpu-migrations
5.632e+10 +11.7% 6.293e+10 perf-stat.ps.instructions
6282 +6.5% 6690 ± 2% perf-stat.ps.minor-faults
6284 +6.5% 6692 ± 2% perf-stat.ps.page-faults
3.764e+12 +12.2% 4.224e+12 perf-stat.total.instructions
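
As a side note on the context switch numbers above (stress-ng.time.voluntary_context_switches
is up by more than two orders of magnitude): counts of this kind correspond to the kernel's
per-task voluntary/involuntary context switch accounting, which getrusage() exposes as
ru_nvcsw/ru_nivcsw. The snippet below is only an illustration of reading them for a child
workload, not how stress-ng or the LKP harness collects them; the "sleep 1" child is a
placeholder.

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/resource.h>
#include <sys/types.h>
#include <sys/wait.h>

int main(void)
{
        struct rusage ru;
        pid_t pid = fork();

        if (pid < 0) {
                perror("fork");
                return 1;
        }
        if (pid == 0) {
                /* placeholder child workload */
                execlp("sleep", "sleep", "1", (char *)NULL);
                _exit(127);
        }

        waitpid(pid, NULL, 0);

        /* ru_nvcsw/ru_nivcsw: context switches accumulated by waited-for children */
        if (getrusage(RUSAGE_CHILDREN, &ru) == 0) {
                printf("voluntary context switches:   %ld\n", ru.ru_nvcsw);
                printf("involuntary context switches: %ld\n", ru.ru_nivcsw);
        }
        return 0;
}
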
Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.
--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki