Message-ID: <202510291148.b2988254-lkp@intel.com>
Date: Wed, 29 Oct 2025 12:17:14 +0800
From: kernel test robot <oliver.sang@...el.com>
To: Shubhang Kaushik via B4 Relay
	<devnull+shubhang.os.amperecomputing.com@...nel.org>
CC: <oe-lkp@...ts.linux.dev>, <lkp@...el.com>, <linux-kernel@...r.kernel.org>,
	<aubrey.li@...ux.intel.com>, <yu.c.chen@...el.com>, Ingo Molnar
	<mingo@...hat.com>, Peter Zijlstra <peterz@...radead.org>, Juri Lelli
	<juri.lelli@...hat.com>, Vincent Guittot <vincent.guittot@...aro.org>,
	Dietmar Eggemann <dietmar.eggemann@....com>, Steven Rostedt
	<rostedt@...dmis.org>, Ben Segall <bsegall@...gle.com>, Mel Gorman
	<mgorman@...e.de>, Valentin Schneider <vschneid@...hat.com>, Shubhang Kaushik
	<sh@...two.org>, Shijie Huang <Shijie.Huang@...erecomputing.com>, Frank Wang
	<zwang@...erecomputing.com>, Christopher Lameter <cl@...two.org>, Adam Li
	<adam.li@...erecomputing.com>, Shubhang Kaushik
	<shubhang@...amperecomputing.com>, <oliver.sang@...el.com>
Subject: Re: [PATCH] sched/fair: Prefer cache-hot prev_cpu for wakeup
Hello,
we recently reported a "76.8% improvement of stress-ng.tee.ops_per_sec" for this patch in
https://lore.kernel.org/all/202510281543.28d76c2-lkp@intel.com/
Now we have also captured a regression. FYI.
kernel test robot noticed an 8.5% regression of stress-ng.io-uring.ops_per_sec on:
commit: 24efd1bf8a44f0f51f42f4af4ce22f21e873073d ("[PATCH] sched/fair: Prefer cache-hot prev_cpu for wakeup")
url: https://github.com/intel-lab-lkp/linux/commits/Shubhang-Kaushik-via-B4-Relay/sched-fair-Prefer-cache-hot-prev_cpu-for-wakeup/20251018-092110
patch link: https://lore.kernel.org/all/20251017-b4-sched-cfs-refactor-propagate-v1-1-1eb0dc5b19b3@os.amperecomputing.com/
patch subject: [PATCH] sched/fair: Prefer cache-hot prev_cpu for wakeup
testcase: stress-ng
config: x86_64-rhel-9.4
compiler: gcc-14
test machine: 256 threads 2 sockets Intel(R) Xeon(R) 6768P CPU @ 2.4GHz (Granite Rapids) with 64G memory
parameters:
	nr_threads: 100%
	testtime: 60s
	test: io-uring
	cpufreq_governor: performance
If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add the following tags
| Reported-by: kernel test robot <oliver.sang@...el.com>
| Closes: https://lore.kernel.org/oe-lkp/202510291148.b2988254-lkp@intel.com
Details are as below:
-------------------------------------------------------------------------------------------------->
The kernel config and materials to reproduce are available at:
https://download.01.org/0day-ci/archive/20251029/202510291148.b2988254-lkp@intel.com
=========================================================================================
compiler/cpufreq_governor/kconfig/nr_threads/rootfs/tbox_group/test/testcase/testtime:
  gcc-14/performance/x86_64-rhel-9.4/100%/debian-13-x86_64-20250902.cgz/lkp-gnr-2sp4/io-uring/stress-ng/60s
commit: 
  9b332cece9 ("Merge tag 'nfsd-6.18-1' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux")
  24efd1bf8a ("sched/fair: Prefer cache-hot prev_cpu for wakeup")
9b332cece987ee17 24efd1bf8a44f0f51f42f4af4ce 
---------------- --------------------------- 
         %stddev     %change         %stddev
             \          |                \  
  3.58e+09           +17.6%   4.21e+09        cpuidle..time
 9.276e+08           -35.8%  5.958e+08 ±  2%  cpuidle..usage
  48009670           -12.7%   41899608 ±  4%  numa-numastat.node0.local_node
  48122238           -12.8%   41981276 ±  4%  numa-numastat.node0.numa_hit
      0.89 ± 44%     +13.2       14.07 ±  3%  turbostat.C1E%
      0.67 ± 44%    +381.0%       3.22        turbostat.CPU%c1
 1.375e+08 ± 44%    +199.4%  4.116e+08        turbostat.IRQ
      4.70 ± 44%    +224.5%      15.25        turbostat.RAMWatt
    210.17 ± 77%   +1158.0%       2643        perf-c2c.DRAM.local
      1725 ± 11%  +10694.5%     186276 ±  3%  perf-c2c.DRAM.remote
    320853           -50.4%     159203 ±  4%  perf-c2c.HITM.local
      1320 ± 13%   +9462.5%     126256 ±  3%  perf-c2c.HITM.remote
    322174           -11.4%     285460        perf-c2c.HITM.total
     14.00 ±  4%      -2.1       11.92 ±  4%  mpstat.cpu.all.idle%
     13.31            +5.2       18.56        mpstat.cpu.all.iowait%
      1.48            +4.9        6.39 ±  3%  mpstat.cpu.all.irq%
      0.85            -0.2        0.68        mpstat.cpu.all.soft%
      3.51            -2.1        1.40 ±  5%  mpstat.cpu.all.usr%
     18.17 ±  4%     +12.8%      20.50 ±  4%  mpstat.max_utilization.seconds
  12518136           -40.6%    7432802 ±  5%  meminfo.Active
  12518120           -40.6%    7432786 ±  5%  meminfo.Active(anon)
  14791509           -34.2%    9726112 ±  4%  meminfo.Cached
  17016588           -29.8%   11943542 ±  3%  meminfo.Committed_AS
  19860760           -19.5%   15994452 ±  2%  meminfo.Memused
  11109813           -45.8%    6019207 ±  6%  meminfo.Shmem
  19916177           -19.5%   16031079 ±  2%  meminfo.max_used_kB
    104776 ± 14%     -24.3%      79337 ± 21%  numa-meminfo.node0.KReclaimable
    104776 ± 14%     -24.3%      79337 ± 21%  numa-meminfo.node0.SReclaimable
  11913809           -42.7%    6821430 ±  5%  numa-meminfo.node1.Active
  11913804           -42.7%    6821421 ±  5%  numa-meminfo.node1.Active(anon)
  11336225 ±  2%     -30.4%    7891392 ± 23%  numa-meminfo.node1.FilePages
  19000428           +14.7%   21787417 ±  8%  numa-meminfo.node1.MemFree
  11104229           -45.9%    6012466 ±  6%  numa-meminfo.node1.Shmem
 1.125e+09            -8.4%   1.03e+09 ±  3%  stress-ng.io-uring.ops
  18779554            -8.5%   17185210 ±  3%  stress-ng.io-uring.ops_per_sec
 2.353e+08           +58.7%  3.735e+08 ±  3%  stress-ng.time.involuntary_context_switches
     16880           -11.1%      15008        stress-ng.time.percent_of_cpu_this_job_got
      9702            -8.5%       8878        stress-ng.time.system_time
    443.21           -67.8%     142.54 ±  2%  stress-ng.time.user_time
 1.362e+09           -11.5%  1.206e+09 ±  3%  stress-ng.time.voluntary_context_switches
     26194 ± 14%     -24.3%      19834 ± 21%  numa-vmstat.node0.nr_slab_reclaimable
  48122182           -12.8%   41981349 ±  4%  numa-vmstat.node0.numa_hit
  48009614           -12.7%   41899680 ±  4%  numa-vmstat.node0.numa_local
   2981009           -42.7%    1707865 ±  5%  numa-vmstat.node1.nr_active_anon
   2836469 ±  2%     -30.4%    1975086 ± 23%  numa-vmstat.node1.nr_file_pages
   4747481           +14.7%    5444494 ±  8%  numa-vmstat.node1.nr_free_pages
   4714110           +14.8%    5411734 ±  8%  numa-vmstat.node1.nr_free_pages_blocks
   2778450           -45.8%    1505400 ±  6%  numa-vmstat.node1.nr_shmem
   2981003           -42.7%    1707858 ±  5%  numa-vmstat.node1.nr_zone_active_anon
   3131938           -40.6%    1860663 ±  5%  proc-vmstat.nr_active_anon
   1133648            +8.5%    1230219        proc-vmstat.nr_dirty_background_threshold
   2270069            +8.5%    2463447        proc-vmstat.nr_dirty_threshold
   3700155           -34.2%    2433658 ±  4%  proc-vmstat.nr_file_pages
  11441308            +8.5%   12408439        proc-vmstat.nr_free_pages
  11335855            +8.6%   12314183        proc-vmstat.nr_free_pages_blocks
   2779743           -45.8%    1507064 ±  6%  proc-vmstat.nr_shmem
     50620            -5.9%      47611        proc-vmstat.nr_slab_reclaimable
   3131938           -40.6%    1860663 ±  5%  proc-vmstat.nr_zone_active_anon
  99148879            -9.8%   89432077 ±  3%  proc-vmstat.numa_hit
  98893495            -9.8%   89168637 ±  3%  proc-vmstat.numa_local
     54203 ± 24%     -57.3%      23128 ± 10%  proc-vmstat.numa_pages_migrated
  99397243            -9.8%   89638031 ±  3%  proc-vmstat.pgalloc_normal
  94583491            -8.4%   86624034 ±  3%  proc-vmstat.pgfree
     54203 ± 24%     -57.3%      23128 ± 10%  proc-vmstat.pgmigrate_success
     39031            +1.8%      39717        proc-vmstat.pgreuse
  47196381           -31.6%   32305642 ±  2%  proc-vmstat.unevictable_pgs_culled
      0.08 ±  2%   +3468.8%       2.97 ±  4%  perf-stat.i.MPKI
 7.499e+10           -21.9%  5.854e+10 ±  2%  perf-stat.i.branch-instructions
      0.94            -0.3        0.62        perf-stat.i.branch-miss-rate%
 6.557e+08           -48.5%   3.38e+08 ±  2%  perf-stat.i.branch-misses
      0.70 ±  2%     +36.7       37.40 ±  4%  perf-stat.i.cache-miss-rate%
  29724413 ±  2%   +2544.3%   7.86e+08        perf-stat.i.cache-misses
  5.32e+09           -60.5%  2.103e+09 ±  3%  perf-stat.i.cache-references
  42032996           -14.0%   36140436 ±  2%  perf-stat.i.context-switches
      2.29           +18.0%       2.70 ±  2%  perf-stat.i.cpi
 7.916e+11            -9.6%  7.154e+11        perf-stat.i.cpu-cycles
  11062415           -99.9%      15481        perf-stat.i.cpu-migrations
     44096 ±  5%     -97.9%     910.19        perf-stat.i.cycles-between-cache-misses
 3.698e+11           -22.9%  2.852e+11 ±  2%  perf-stat.i.instructions
      0.46           -14.9%       0.40 ±  2%  perf-stat.i.ipc
      0.05 ± 47%     +96.1%       0.09 ± 14%  perf-stat.i.major-faults
    207.41           -31.9%     141.15 ±  2%  perf-stat.i.metric.K/sec
      0.08 ±  2%   +3331.9%       2.76 ±  4%  perf-stat.overall.MPKI
      0.87            -0.3        0.58        perf-stat.overall.branch-miss-rate%
      0.56 ±  2%     +36.9       37.43 ±  4%  perf-stat.overall.cache-miss-rate%
      2.14           +17.3%       2.51 ±  2%  perf-stat.overall.cpi
     26647 ±  2%     -96.6%     910.56        perf-stat.overall.cycles-between-cache-misses
      0.47           -14.7%       0.40 ±  2%  perf-stat.overall.ipc
 7.375e+10           -21.9%  5.757e+10 ±  2%  perf-stat.ps.branch-instructions
 6.449e+08           -48.4%  3.325e+08 ±  2%  perf-stat.ps.branch-misses
  29243806 ±  2%   +2543.5%  7.731e+08        perf-stat.ps.cache-misses
 5.233e+09           -60.5%  2.068e+09 ±  3%  perf-stat.ps.cache-references
  41341425           -14.0%   35549572 ±  2%  perf-stat.ps.context-switches
 7.786e+11            -9.6%  7.037e+11        perf-stat.ps.cpu-cycles
  10881167           -99.9%      15227        perf-stat.ps.cpu-migrations
 3.637e+11           -22.9%  2.805e+11 ±  2%  perf-stat.ps.instructions
      0.05 ± 47%     +93.6%       0.09 ± 14%  perf-stat.ps.major-faults
 2.217e+13           -22.7%  1.713e+13 ±  2%  perf-stat.total.instructions
   4219859           -17.8%    3469357        sched_debug.cfs_rq:/.avg_vruntime.avg
   7247589 ±  9%     -38.3%    4469027 ±  7%  sched_debug.cfs_rq:/.avg_vruntime.max
   4013259           -29.0%    2849620 ± 17%  sched_debug.cfs_rq:/.avg_vruntime.min
    265810 ± 14%     -54.9%     119970 ± 11%  sched_debug.cfs_rq:/.avg_vruntime.stddev
      3.42 ± 10%     -24.4%       2.58 ±  7%  sched_debug.cfs_rq:/.h_nr_queued.max
      3.33 ± 11%     -22.5%       2.58 ±  7%  sched_debug.cfs_rq:/.h_nr_runnable.max
   4401036           -17.1%    3647494 ±  4%  sched_debug.cfs_rq:/.left_deadline.max
   1274751 ±  5%     -18.7%    1035958 ± 12%  sched_debug.cfs_rq:/.left_deadline.stddev
   4400687           -17.1%    3647059 ±  4%  sched_debug.cfs_rq:/.left_vruntime.max
   1274640 ±  5%     -18.7%    1035848 ± 12%  sched_debug.cfs_rq:/.left_vruntime.stddev
   4219859           -17.8%    3469357        sched_debug.cfs_rq:/.min_vruntime.avg
   7247589 ±  9%     -38.3%    4469027 ±  7%  sched_debug.cfs_rq:/.min_vruntime.max
   4013259           -29.0%    2849620 ± 17%  sched_debug.cfs_rq:/.min_vruntime.min
    265810 ± 14%     -54.9%     119970 ± 11%  sched_debug.cfs_rq:/.min_vruntime.stddev
   4400687           -17.1%    3647059 ±  4%  sched_debug.cfs_rq:/.right_vruntime.max
   1274640 ±  5%     -18.7%    1035848 ± 12%  sched_debug.cfs_rq:/.right_vruntime.stddev
    532.33           -11.4%     471.62 ±  2%  sched_debug.cfs_rq:/.runnable_avg.avg
      1361 ±  3%     +18.4%       1611 ± 10%  sched_debug.cfs_rq:/.runnable_avg.max
    203.24 ±  4%     +38.0%     280.47 ±  3%  sched_debug.cfs_rq:/.runnable_avg.stddev
    108.79 ±  5%     +68.6%     183.41 ±  4%  sched_debug.cfs_rq:/.util_avg.stddev
     99.93 ±  8%    +144.8%     244.58 ±  4%  sched_debug.cfs_rq:/.util_est.avg
    154.15 ± 10%     +41.9%     218.69 ±  5%  sched_debug.cfs_rq:/.util_est.stddev
    585777 ±  3%     +55.0%     907718 ±  6%  sched_debug.cpu.avg_idle.avg
    257569 ± 15%     +30.0%     334947 ± 11%  sched_debug.cpu.avg_idle.stddev
    581651 ±  2%     +97.0%    1146052 ±  3%  sched_debug.cpu.max_idle_balance_cost.avg
   1334820 ±  4%     +10.9%    1479741        sched_debug.cpu.max_idle_balance_cost.max
    150290 ±  9%     +34.9%     202732 ±  7%  sched_debug.cpu.max_idle_balance_cost.stddev
      3.42 ± 10%     -24.4%       2.58 ± 13%  sched_debug.cpu.nr_running.max
   4900954           -14.0%    4212806 ±  2%  sched_debug.cpu.nr_switches.avg
   1872618 ± 12%     +57.1%    2941530 ± 17%  sched_debug.cpu.nr_switches.min
    -24.25           -67.7%      -7.83        sched_debug.cpu.nr_uninterruptible.min
      8.41 ± 12%     -46.3%       4.52 ± 14%  sched_debug.cpu.nr_uninterruptible.stddev
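For reference, the headline ops_per_sec figures above are total operations over the 60s testtime, and the reported -8.5% is the relative change between the two ops_per_sec values. A minimal sketch of that arithmetic, using only the numbers from the stress-ng rows above:

```python
# Reported values from the stress-ng.io-uring rows above.
base_ops_per_sec = 18_779_554   # parent commit 9b332cece9
new_ops_per_sec = 17_185_210    # patched commit 24efd1bf8a
testtime_s = 60                 # from the job parameters

# ops_per_sec is total ops divided by the 60s run time:
# base_ops_per_sec * 60 ~= 1.125e9, matching stress-ng.io-uring.ops.
base_ops = base_ops_per_sec * testtime_s

# Relative change, as a percentage of the parent commit's throughput.
change_pct = (new_ops_per_sec - base_ops_per_sec) / base_ops_per_sec * 100
print(f"{change_pct:+.1f}%")   # -8.5%
```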
Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.
-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki