Message-ID: <202510281543.28d76c2-lkp@intel.com>
Date: Tue, 28 Oct 2025 15:29:26 +0800
From: kernel test robot <oliver.sang@...el.com>
To: Shubhang Kaushik via B4 Relay
	<devnull+shubhang.os.amperecomputing.com@...nel.org>
CC: <oe-lkp@...ts.linux.dev>, <lkp@...el.com>, <linux-kernel@...r.kernel.org>,
	<aubrey.li@...ux.intel.com>, <yu.c.chen@...el.com>, Ingo Molnar
	<mingo@...hat.com>, Peter Zijlstra <peterz@...radead.org>, Juri Lelli
	<juri.lelli@...hat.com>, Vincent Guittot <vincent.guittot@...aro.org>,
	Dietmar Eggemann <dietmar.eggemann@....com>, Steven Rostedt
	<rostedt@...dmis.org>, Ben Segall <bsegall@...gle.com>, Mel Gorman
	<mgorman@...e.de>, Valentin Schneider <vschneid@...hat.com>, Shubhang Kaushik
	<sh@...two.org>, Shijie Huang <Shijie.Huang@...erecomputing.com>, Frank Wang
	<zwang@...erecomputing.com>, Christopher Lameter <cl@...two.org>, Adam Li
	<adam.li@...erecomputing.com>, Shubhang Kaushik
	<shubhang@...amperecomputing.com>, <oliver.sang@...el.com>
Subject: Re: [PATCH] sched/fair: Prefer cache-hot prev_cpu for wakeup



Hello,

kernel test robot noticed a 76.8% improvement of stress-ng.tee.ops_per_sec on:


commit: 24efd1bf8a44f0f51f42f4af4ce22f21e873073d ("[PATCH] sched/fair: Prefer cache-hot prev_cpu for wakeup")
url: https://github.com/intel-lab-lkp/linux/commits/Shubhang-Kaushik-via-B4-Relay/sched-fair-Prefer-cache-hot-prev_cpu-for-wakeup/20251018-092110
patch link: https://lore.kernel.org/all/20251017-b4-sched-cfs-refactor-propagate-v1-1-1eb0dc5b19b3@os.amperecomputing.com/
patch subject: [PATCH] sched/fair: Prefer cache-hot prev_cpu for wakeup
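
For context on what is being measured: the patch biases wakeup-time CPU
selection in CFS toward the waking task's previous CPU when the task's
cache footprint there is likely still warm, rather than moving it to some
other idle CPU. A minimal sketch of the idea follows (illustrative only,
not the actual patch -- see the lore link above for that; task_cache_hot()
is a hypothetical stand-in for whatever recency test the patch uses, while
cpus_share_cache() and available_idle_cpu() are existing scheduler
helpers):

	/*
	 * Sketch of a cache-hot prev_cpu preference on the wakeup path.
	 * If the previous CPU shares cache with the proposed target, is
	 * idle, and the task's data is probably still resident there,
	 * keep the task on prev to reuse its warm L1/L2 state instead
	 * of paying the cost of repopulating caches elsewhere.
	 */
	static int pick_wakeup_cpu(struct task_struct *p, int prev, int target)
	{
		if (prev != target &&
		    cpus_share_cache(prev, target) &&
		    available_idle_cpu(prev) &&
		    task_cache_hot(p, prev))	/* hypothetical recency test */
			return prev;

		return target;
	}

The context-switch, cache-miss, and migration deltas in the comparison
below are consistent with this kind of placement change.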

testcase: stress-ng
config: x86_64-rhel-9.4
compiler: gcc-14
test machine: 224 threads 2 sockets Intel(R) Xeon(R) Platinum 8480CTDX (Sapphire Rapids) with 512G memory
parameters:

	nr_threads: 100%
	testtime: 60s
	test: tee
	cpufreq_governor: performance



Details are as below:
-------------------------------------------------------------------------------------------------->


The kernel config and materials to reproduce are available at:
https://download.01.org/0day-ci/archive/20251028/202510281543.28d76c2-lkp@intel.com
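
To reproduce locally, the usual lkp-tests flow should apply (a sketch,
assuming the job.yaml from the archive above; see the wiki link in the
signature for details):

	git clone https://github.com/intel/lkp-tests.git
	cd lkp-tests
	sudo bin/lkp install job.yaml            # job file is in the archive above
	bin/lkp split-job --compatible job.yaml  # generate the yaml file for lkp run
	sudo bin/lkp run generated-yaml-file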

=========================================================================================
compiler/cpufreq_governor/kconfig/nr_threads/rootfs/tbox_group/test/testcase/testtime:
  gcc-14/performance/x86_64-rhel-9.4/100%/debian-13-x86_64-20250902.cgz/lkp-spr-2sp4/tee/stress-ng/60s

commit: 
  9b332cece9 ("Merge tag 'nfsd-6.18-1' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux")
  24efd1bf8a ("sched/fair: Prefer cache-hot prev_cpu for wakeup")

9b332cece987ee17 24efd1bf8a44f0f51f42f4af4ce 
---------------- --------------------------- 
         %stddev     %change         %stddev
             \          |                \  
     12097 ±  3%     +10.9%      13413 ±  2%  uptime.idle
 3.662e+08 ±  7%    +382.7%  1.768e+09        cpuidle..time
   5056131 ± 56%    +426.8%   26635997 ±  3%  cpuidle..usage
  13144587 ± 11%     +21.1%   15921410        meminfo.Memused
  13326158 ± 11%     +20.6%   16067699        meminfo.max_used_kB
  58707455           -16.5%   49043102 ±  9%  numa-numastat.node1.local_node
  58841583           -16.4%   49176968 ±  9%  numa-numastat.node1.numa_hit
  58770618           -16.3%   49175467 ±  9%  numa-vmstat.node1.numa_hit
  58636509           -16.4%   49041602 ±  9%  numa-vmstat.node1.numa_local
      2184 ±  9%   +2157.3%      49310 ±  3%  perf-c2c.DRAM.remote
      3115 ± 11%   +1689.3%      55737 ±  3%  perf-c2c.HITM.local
      1193 ± 13%   +2628.6%      32575 ±  3%  perf-c2c.HITM.remote
      4308 ± 10%   +1949.6%      88312        perf-c2c.HITM.total
      1.95 ±  6%     +10.4       12.34        mpstat.cpu.all.idle%
      0.50 ±  3%      +1.0        1.53        mpstat.cpu.all.irq%
      0.02 ±  6%      +0.1        0.09 ±  5%  mpstat.cpu.all.soft%
     74.24            -7.0       67.21        mpstat.cpu.all.sys%
     23.29            -4.5       18.83        mpstat.cpu.all.usr%
    232818 ± 35%     -18.3%     190138        proc-vmstat.nr_anon_pages
    124104            -1.1%     122691        proc-vmstat.nr_slab_unreclaimable
 1.167e+08           -15.0%   99106005        proc-vmstat.numa_hit
 1.164e+08           -15.1%   98853060        proc-vmstat.numa_local
 1.168e+08           -15.2%   99060661        proc-vmstat.pgalloc_normal
 1.147e+08           -15.7%   96704739        proc-vmstat.pgfree
 1.071e+08 ±  2%     +76.8%  1.894e+08 ±  2%  stress-ng.tee.ops
   1786177 ±  2%     +76.8%    3157701 ±  2%  stress-ng.tee.ops_per_sec
 1.044e+08           -49.4%   52882701        stress-ng.time.involuntary_context_switches
     21972           -12.1%      19317        stress-ng.time.percent_of_cpu_this_job_got
     10131            -9.6%       9155        stress-ng.time.system_time
      3070           -20.2%       2450        stress-ng.time.user_time
 1.512e+08           -37.9%   93853736        stress-ng.time.voluntary_context_switches
      2816           -10.5%       2519        turbostat.Avg_MHz
     97.12            -9.8       87.30        turbostat.Busy%
      0.11 ± 52%      +0.5        0.66 ±  5%  turbostat.C1%
      0.40 ± 11%      +8.4        8.78        turbostat.C1E%
      2.39 ±  3%      +1.0        3.42 ±  2%  turbostat.C6%
      1.08 ±  9%    +168.3%       2.90 ±  3%  turbostat.CPU%c1
  32638444          +167.8%   87395049        turbostat.IRQ
    110.56           +14.6      125.14 ±  4%  turbostat.PKG_%
     23.05           +32.8%      30.62        turbostat.RAMWatt
   7559994           -21.3%    5948968        sched_debug.cfs_rq:/.avg_vruntime.avg
  11028968 ± 13%     -38.2%    6818572 ±  4%  sched_debug.cfs_rq:/.avg_vruntime.max
      0.34 ± 13%    +104.0%       0.69 ±  3%  sched_debug.cfs_rq:/.h_nr_queued.stddev
      0.38 ±  8%     +75.2%       0.67 ±  3%  sched_debug.cfs_rq:/.h_nr_runnable.stddev
     20.67 ± 33%   +3672.8%     779.66 ± 73%  sched_debug.cfs_rq:/.load_avg.avg
    519.67         +7141.5%      37631 ± 10%  sched_debug.cfs_rq:/.load_avg.max
     86.71 ± 22%   +5134.0%       4538 ± 39%  sched_debug.cfs_rq:/.load_avg.stddev
   7559994           -21.3%    5948968        sched_debug.cfs_rq:/.min_vruntime.avg
  11028968 ± 13%     -38.2%    6818572 ±  4%  sched_debug.cfs_rq:/.min_vruntime.max
      0.12 ± 17%    +117.1%       0.27 ±  3%  sched_debug.cfs_rq:/.nr_queued.stddev
    809.69 ±  2%     +15.6%     936.26        sched_debug.cfs_rq:/.runnable_avg.avg
      2093 ±  3%     +18.5%       2480 ±  8%  sched_debug.cfs_rq:/.runnable_avg.max
    259.47 ± 18%     +71.8%     445.79 ±  3%  sched_debug.cfs_rq:/.runnable_avg.stddev
    576.64           -10.6%     515.40        sched_debug.cfs_rq:/.util_avg.avg
    137.33 ± 12%     +85.3%     254.45 ±  2%  sched_debug.cfs_rq:/.util_avg.stddev
    609.44           +15.6%     704.34 ±  3%  sched_debug.cfs_rq:/.util_est.avg
      1839 ± 11%     +23.7%       2274 ±  7%  sched_debug.cfs_rq:/.util_est.max
    245.27 ±  7%     +82.4%     447.29 ±  4%  sched_debug.cfs_rq:/.util_est.stddev
    702831 ±  5%     +19.6%     840863 ±  3%  sched_debug.cpu.avg_idle.avg
    378668 ± 14%     +32.2%     500458 ±  6%  sched_debug.cpu.avg_idle.stddev
     44.33 ± 22%     -62.7%      16.52 ± 13%  sched_debug.cpu.clock.stddev
    909.29 ± 12%    +160.6%       2369 ±  4%  sched_debug.cpu.curr->pid.stddev
    639355 ±  5%     +80.6%    1154626        sched_debug.cpu.max_idle_balance_cost.avg
    500000           +57.3%     786555 ± 11%  sched_debug.cpu.max_idle_balance_cost.min
      0.00 ± 20%     -47.2%       0.00 ± 18%  sched_debug.cpu.next_balance.stddev
      0.32 ± 14%    +111.2%       0.68 ±  3%  sched_debug.cpu.nr_running.stddev
    574871           -33.2%     383811        sched_debug.cpu.nr_switches.avg
    788985 ± 11%     -32.4%     533309 ±  6%  sched_debug.cpu.nr_switches.max
      0.04 ± 19%   +1073.1%       0.50        perf-stat.i.MPKI
 1.443e+11           -17.9%  1.184e+11        perf-stat.i.branch-instructions
      0.08 ±  3%      +0.0        0.12        perf-stat.i.branch-miss-rate%
 1.049e+08 ±  2%     +23.6%  1.296e+08        perf-stat.i.branch-misses
     31.56 ± 11%     +16.6       48.19        perf-stat.i.cache-miss-rate%
  25936672 ± 22%   +1080.1%  3.061e+08        perf-stat.i.cache-misses
  77849475 ± 13%    +714.6%  6.342e+08        perf-stat.i.cache-references
   4288231           -33.1%    2868755        perf-stat.i.context-switches
      0.85            +9.6%       0.94        perf-stat.i.cpi
 6.387e+11           -10.2%  5.735e+11        perf-stat.i.cpu-cycles
      2828 ± 24%    +596.1%      19688        perf-stat.i.cpu-migrations
     32456 ± 26%     -94.2%       1870        perf-stat.i.cycles-between-cache-misses
 7.486e+11           -18.2%  6.125e+11        perf-stat.i.instructions
      1.17            -8.8%       1.07        perf-stat.i.ipc
     19.17           -33.2%      12.81        perf-stat.i.metric.K/sec
      0.03 ± 22%   +1341.1%       0.50        perf-stat.overall.MPKI
      0.07 ±  3%      +0.0        0.11        perf-stat.overall.branch-miss-rate%
     33.00 ± 10%     +15.2       48.24        perf-stat.overall.cache-miss-rate%
      0.85            +9.7%       0.94        perf-stat.overall.cpi
     25848 ± 21%     -92.7%       1874        perf-stat.overall.cycles-between-cache-misses
      1.17            -8.9%       1.07        perf-stat.overall.ipc
 1.419e+11           -18.1%  1.162e+11        perf-stat.ps.branch-instructions
 1.028e+08 ±  2%     +23.3%  1.268e+08        perf-stat.ps.branch-misses
  25499974 ± 22%   +1077.5%  3.003e+08        perf-stat.ps.cache-misses
  76519245 ± 13%    +713.3%  6.224e+08        perf-stat.ps.cache-references
   4214394           -33.2%    2815077        perf-stat.ps.context-switches
 6.278e+11           -10.4%  5.627e+11        perf-stat.ps.cpu-cycles
      2763 ± 24%    +598.5%      19305        perf-stat.ps.cpu-migrations
 7.358e+11           -18.3%  6.009e+11        perf-stat.ps.instructions
 4.489e+13           -18.3%  3.668e+13        perf-stat.total.instructions




Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.


-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki

