Message-ID: <202409231416.9403c2e9-oliver.sang@intel.com>
Date: Mon, 23 Sep 2024 15:01:58 +0800
From: kernel test robot <oliver.sang@...el.com>
To: Peter Zijlstra <peterz@...radead.org>
CC: <oe-lkp@...ts.linux.dev>, <lkp@...el.com>, <linux-kernel@...r.kernel.org>,
	Chunxin Zang <zangchunxin@...iang.com>, Valentin Schneider
	<vschneid@...hat.com>, Mike Galbraith <umgwanakikbuti@...il.com>,
	<ying.huang@...el.com>, <feng.tang@...el.com>, <fengwei.yin@...el.com>,
	<aubrey.li@...ux.intel.com>, <yu.c.chen@...el.com>, <oliver.sang@...el.com>
Subject: [linus:master] [sched/eevdf]  85e511df3c:  hackbench.throughput
 -13.1% regression



Hello,

FYI, Chenyu (Cc'd) will soon post a trial patch for the report below.


kernel test robot noticed a -13.1% regression of hackbench.throughput on:


commit: 85e511df3cec46021024176672a748008ed135bf ("sched/eevdf: Allow shorter slices to wakeup-preempt")
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master

testcase: hackbench
test machine: 224 threads 2 sockets Intel(R) Xeon(R) Platinum 8480CTDX (Sapphire Rapids) with 256G memory
parameters:

	nr_threads: 50%
	iterations: 4
	mode: process
	ipc: socket
	cpufreq_governor: performance



If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add the following tags:
| Reported-by: kernel test robot <oliver.sang@...el.com>
| Closes: https://lore.kernel.org/oe-lkp/202409231416.9403c2e9-oliver.sang@intel.com


Details are as below:
-------------------------------------------------------------------------------------------------->


The kernel config and materials to reproduce are available at:
https://download.01.org/0day-ci/archive/20240923/202409231416.9403c2e9-oliver.sang@intel.com

=========================================================================================
compiler/cpufreq_governor/ipc/iterations/kconfig/mode/nr_threads/rootfs/tbox_group/testcase:
  gcc-12/performance/socket/4/x86_64-rhel-8.3/process/50%/debian-12-x86_64-20240206.cgz/lkp-spr-r02/hackbench

commit: 
  82e9d0456e ("sched/fair: Avoid re-setting virtual deadline on 'migrations'")
  85e511df3c ("sched/eevdf: Allow shorter slices to wakeup-preempt")

82e9d0456e06cebe 85e511df3cec46021024176672a 
---------------- --------------------------- 
         %stddev     %change         %stddev
             \          |                \  
    217.40           +13.5%     246.74        uptime.boot
   5391461 ± 19%     +16.5%    6281524 ±  6%  numa-meminfo.node0.MemUsed
    352581 ± 13%     +24.6%     439472 ± 16%  numa-meminfo.node0.SUnreclaim
   4679401           -15.8%    3938145        vmstat.system.cs
    854648           -15.2%     724774        vmstat.system.in
      0.46 ±  2%      -0.1        0.40        mpstat.cpu.all.irq%
      0.03 ±  3%      -0.0        0.03        mpstat.cpu.all.soft%
      3.35            -0.6        2.75        mpstat.cpu.all.usr%
     44542            +2.7%      45755        proc-vmstat.nr_slab_reclaimable
    642130 ± 68%     -71.2%     184909 ± 11%  proc-vmstat.pgactivate
   2170433 ±  2%      +6.8%    2318318 ±  2%  proc-vmstat.pgfault
    138302 ±  4%      +6.7%     147631 ±  3%  proc-vmstat.pgreuse
    623219           -13.1%     541887        hackbench.throughput
    606251           -14.1%     520789        hackbench.throughput_avg
    623219           -13.1%     541887        hackbench.throughput_best
    580034           -14.8%     494354        hackbench.throughput_worst
    174.58           +16.3%     203.09        hackbench.time.elapsed_time
    174.58           +16.3%     203.09        hackbench.time.elapsed_time.max
 1.654e+08            +2.2%   1.69e+08        hackbench.time.involuntary_context_switches
     36869           +17.6%      43340        hackbench.time.system_time
      1172            -5.5%       1107        hackbench.time.user_time
 6.478e+08            -3.5%  6.255e+08        hackbench.time.voluntary_context_switches
 6.354e+10           -11.4%   5.63e+10        perf-stat.i.branch-instructions
 3.226e+08           -12.5%  2.822e+08        perf-stat.i.branch-misses
  94557935 ±  3%     -15.2%   80197744 ±  2%  perf-stat.i.cache-misses
 2.563e+09           -13.7%  2.212e+09        perf-stat.i.cache-references
   4710895           -15.9%    3959720        perf-stat.i.context-switches
      1.86           +14.3%       2.13        perf-stat.i.cpi
    601598           -15.0%     511540        perf-stat.i.cpu-migrations
      7390 ±  5%     +28.4%       9492 ±  2%  perf-stat.i.cycles-between-cache-misses
 3.408e+11           -12.1%  2.997e+11        perf-stat.i.instructions
      0.54           -12.2%       0.47        perf-stat.i.ipc
     23.73           -15.9%      19.95        perf-stat.i.metric.K/sec
      1.66 ± 35%     +28.5%       2.13        perf-stat.overall.cpi
      6006 ± 35%     +33.5%       8020 ±  2%  perf-stat.overall.cycles-between-cache-misses
 5.287e+13 ± 35%     +15.1%  6.083e+13        perf-stat.total.instructions
  13829361           +51.8%   20989754        sched_debug.cfs_rq:/.avg_vruntime.avg
  18756074 ±  5%     +44.2%   27055241 ±  3%  sched_debug.cfs_rq:/.avg_vruntime.max
  12499623 ±  2%     +52.4%   19043277 ±  2%  sched_debug.cfs_rq:/.avg_vruntime.min
      8.93 ±  2%     +14.1%      10.19        sched_debug.cfs_rq:/.h_nr_running.avg
      4.68 ±  3%     +10.0%       5.15 ±  2%  sched_debug.cfs_rq:/.h_nr_running.stddev
      0.44 ± 35%     +75.8%       0.78 ± 19%  sched_debug.cfs_rq:/.load_avg.min
  13829361           +51.8%   20989754        sched_debug.cfs_rq:/.min_vruntime.avg
  18756074 ±  5%     +44.2%   27055241 ±  3%  sched_debug.cfs_rq:/.min_vruntime.max
  12499623 ±  2%     +52.4%   19043277 ±  2%  sched_debug.cfs_rq:/.min_vruntime.min
      0.68           +11.7%       0.76        sched_debug.cfs_rq:/.nr_running.avg
    176.30 ±  3%     -22.8%     136.16 ±  4%  sched_debug.cfs_rq:/.removed.runnable_avg.max
    176.30 ±  3%     -22.8%     136.16 ±  4%  sched_debug.cfs_rq:/.removed.util_avg.max
      8995           +16.0%      10437        sched_debug.cfs_rq:/.runnable_avg.avg
     18978 ±  6%     +13.7%      21579 ±  6%  sched_debug.cfs_rq:/.runnable_avg.max
      2890 ±  4%     +13.9%       3292 ±  3%  sched_debug.cfs_rq:/.runnable_avg.stddev
    415209 ± 22%     -23.3%     318311 ±  3%  sched_debug.cpu.avg_idle.avg
    102333 ±  2%     +30.5%     133496 ±  2%  sched_debug.cpu.clock.avg
    102519 ±  2%     +30.4%     133722 ±  2%  sched_debug.cpu.clock.max
    102127 ±  2%     +30.5%     133254 ±  2%  sched_debug.cpu.clock.min
    101839 ±  2%     +30.5%     132880 ±  2%  sched_debug.cpu.clock_task.avg
    102169 ±  2%     +30.4%     133268 ±  2%  sched_debug.cpu.clock_task.max
     87129 ±  2%     +35.6%     118117 ±  2%  sched_debug.cpu.clock_task.min
     11573           +32.4%      15327        sched_debug.cpu.curr->pid.avg
     14704           +23.9%      18214        sched_debug.cpu.curr->pid.max
      1516 ±  9%     +16.4%       1765 ± 10%  sched_debug.cpu.curr->pid.stddev
      8.92 ±  2%     +14.1%      10.18        sched_debug.cpu.nr_running.avg
      4.69 ±  2%     +10.0%       5.16 ±  2%  sched_debug.cpu.nr_running.stddev
   1232815 ±  2%     +27.3%    1569099        sched_debug.cpu.nr_switches.avg
   1411362 ±  5%     +26.8%    1789325 ±  3%  sched_debug.cpu.nr_switches.max
   1045767 ±  2%     +27.3%    1331341 ±  3%  sched_debug.cpu.nr_switches.min
    102127 ±  2%     +30.5%     133250 ±  2%  sched_debug.cpu_clk
    101071 ±  2%     +30.8%     132194 ±  2%  sched_debug.ktime
      0.00           -25.0%       0.00        sched_debug.rt_rq:.rt_nr_running.avg
      0.33           -25.0%       0.25        sched_debug.rt_rq:.rt_nr_running.max
      0.02           -25.0%       0.02        sched_debug.rt_rq:.rt_nr_running.stddev
    102997 ±  2%     +30.2%     134142 ±  2%  sched_debug.sched_clk
  16347631          +100.0%   32695263        sched_debug.sysctl_sched.sysctl_sched_features
      1.60 ±  2%      -0.1        1.45 ±  5%  perf-profile.calltrace.cycles-pp.entry_SYSCALL_64.read
      1.50 ±  2%      -0.1        1.36 ±  6%  perf-profile.calltrace.cycles-pp.entry_SYSCALL_64.write
      0.62 ±  2%      -0.1        0.50 ± 38%  perf-profile.calltrace.cycles-pp.enqueue_task_fair.ttwu_do_activate.try_to_wake_up.autoremove_wake_function.__wake_up_common
      0.78 ±  2%      -0.1        0.71 ±  7%  perf-profile.calltrace.cycles-pp.ttwu_do_activate.try_to_wake_up.autoremove_wake_function.__wake_up_common.__wake_up_sync_key
     39.00            +0.5       39.50        perf-profile.calltrace.cycles-pp.sock_write_iter.vfs_write.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe
     37.56            +0.6       38.15        perf-profile.calltrace.cycles-pp.unix_stream_sendmsg.sock_write_iter.vfs_write.ksys_write.do_syscall_64
      1.77 ±  2%      -0.2        1.61 ±  5%  perf-profile.children.cycles-pp.entry_SYSCALL_64
      1.87 ±  2%      -0.2        1.72 ±  5%  perf-profile.children.cycles-pp.mod_objcg_state
      0.90 ±  2%      -0.1        0.83 ±  5%  perf-profile.children.cycles-pp.entry_SYSRETQ_unsafe_stack
      0.85 ±  2%      -0.1        0.78 ±  5%  perf-profile.children.cycles-pp.obj_cgroup_charge
      0.10 ±  4%      -0.1        0.04 ±104%  perf-profile.children.cycles-pp.__handle_mm_fault
      0.10 ±  4%      -0.1        0.04 ±104%  perf-profile.children.cycles-pp.handle_mm_fault
      0.10 ±  4%      -0.1        0.04 ±102%  perf-profile.children.cycles-pp.do_user_addr_fault
      0.10 ±  4%      -0.1        0.04 ±102%  perf-profile.children.cycles-pp.exc_page_fault
      0.63 ±  2%      -0.1        0.57 ±  5%  perf-profile.children.cycles-pp.__cond_resched
      0.10 ±  4%      -0.0        0.05 ± 63%  perf-profile.children.cycles-pp.asm_exc_page_fault
      0.32 ±  2%      -0.0        0.28 ±  3%  perf-profile.children.cycles-pp.task_mm_cid_work
      0.32 ±  2%      -0.0        0.28 ±  3%  perf-profile.children.cycles-pp.task_work_run
      0.34 ±  2%      -0.0        0.30 ±  5%  perf-profile.children.cycles-pp.rcu_all_qs
      0.32 ±  3%      -0.0        0.29 ±  6%  perf-profile.children.cycles-pp.__virt_addr_valid
      0.18 ±  4%      -0.0        0.16 ±  7%  perf-profile.children.cycles-pp.__enqueue_entity
      0.23 ±  3%      -0.0        0.21 ±  6%  perf-profile.children.cycles-pp.set_next_entity
      0.14 ±  4%      -0.0        0.12 ±  4%  perf-profile.children.cycles-pp.__dequeue_entity
      0.06            -0.0        0.05        perf-profile.children.cycles-pp.cpuacct_charge
      0.09 ±  8%      +0.0        0.12 ±  7%  perf-profile.children.cycles-pp.generic_perform_write
      0.07 ± 10%      +0.0        0.10 ± 11%  perf-profile.children.cycles-pp.sched_balance_find_src_group
      0.06 ± 10%      +0.0        0.09 ± 11%  perf-profile.children.cycles-pp.update_sg_lb_stats
      0.07 ± 10%      +0.0        0.10 ± 11%  perf-profile.children.cycles-pp.update_sd_lb_stats
      0.13 ±  8%      +0.0        0.16 ±  8%  perf-profile.children.cycles-pp.writen
      0.02 ±111%      +0.0        0.06 ± 13%  perf-profile.children.cycles-pp.set_task_cpu
      0.02 ±141%      +0.0        0.06 ± 11%  perf-profile.children.cycles-pp.ring_buffer_read_head
      0.00            +0.1        0.06 ±  8%  perf-profile.children.cycles-pp.vruntime_eligible
      0.22 ±  8%      +0.1        0.28 ±  9%  perf-profile.children.cycles-pp.perf_mmap__push
      0.23 ±  8%      +0.1        0.29 ±  9%  perf-profile.children.cycles-pp.record__mmap_read_evlist
      0.23 ±  7%      +0.1        0.29 ±  9%  perf-profile.children.cycles-pp.cmd_record
      0.23 ±  7%      +0.1        0.30 ±  9%  perf-profile.children.cycles-pp.handle_internal_command
      0.23 ±  7%      +0.1        0.30 ±  9%  perf-profile.children.cycles-pp.main
      0.23 ±  7%      +0.1        0.30 ±  9%  perf-profile.children.cycles-pp.run_builtin
      0.00            +0.1        0.07 ± 19%  perf-profile.children.cycles-pp.schedule_idle
      0.00            +0.1        0.12 ± 19%  perf-profile.children.cycles-pp.flush_smp_call_function_queue
      0.00            +0.1        0.14 ± 20%  perf-profile.children.cycles-pp.sched_ttwu_pending
      0.00            +0.2        0.15 ± 17%  perf-profile.children.cycles-pp.intel_idle
      0.00            +0.2        0.17 ± 18%  perf-profile.children.cycles-pp.__flush_smp_call_function_queue
      0.01 ±282%      +0.2        0.18 ± 70%  perf-profile.children.cycles-pp.available_idle_cpu
      0.23 ±  3%      +0.2        0.43 ± 16%  perf-profile.children.cycles-pp.prepare_to_wait
      0.00            +0.2        0.20 ± 18%  perf-profile.children.cycles-pp.cpuidle_enter
      0.00            +0.2        0.20 ± 18%  perf-profile.children.cycles-pp.cpuidle_enter_state
      0.01 ±282%      +0.2        0.22 ± 16%  perf-profile.children.cycles-pp.cpuidle_idle_call
      0.00            +0.3        0.35 ±110%  perf-profile.children.cycles-pp.select_idle_cpu
      0.06 ±  6%      +0.4        0.41 ± 95%  perf-profile.children.cycles-pp.select_idle_sibling
      0.22 ±  2%      +0.4        0.57 ± 70%  perf-profile.children.cycles-pp.select_task_rq_fair
      0.26 ±  5%      +0.4        0.62 ± 65%  perf-profile.children.cycles-pp.select_task_rq
      0.04 ± 77%      +0.4        0.43 ± 18%  perf-profile.children.cycles-pp.start_secondary
      0.04 ± 77%      +0.4        0.43 ± 18%  perf-profile.children.cycles-pp.do_idle
      0.04 ± 77%      +0.4        0.43 ± 17%  perf-profile.children.cycles-pp.common_startup_64
      0.04 ± 77%      +0.4        0.43 ± 17%  perf-profile.children.cycles-pp.cpu_startup_entry
     40.28            +0.4       40.71        perf-profile.children.cycles-pp.vfs_write
     39.04            +0.5       39.54        perf-profile.children.cycles-pp.sock_write_iter
     37.73            +0.6       38.31        perf-profile.children.cycles-pp.unix_stream_sendmsg
      1.50 ±  2%      -0.1        1.37 ±  5%  perf-profile.self.cycles-pp.mod_objcg_state
      1.41 ±  2%      -0.1        1.30 ±  5%  perf-profile.self.cycles-pp.kmem_cache_free
      0.83 ±  3%      -0.1        0.74 ±  5%  perf-profile.self.cycles-pp.read
      0.88 ±  2%      -0.1        0.80 ±  5%  perf-profile.self.cycles-pp.entry_SYSRETQ_unsafe_stack
      0.74 ±  2%      -0.1        0.67 ±  5%  perf-profile.self.cycles-pp.write
      0.70 ±  2%      -0.1        0.64 ±  6%  perf-profile.self.cycles-pp.vfs_read
      0.67 ±  2%      -0.1        0.61 ±  5%  perf-profile.self.cycles-pp.vfs_write
      0.67 ±  2%      -0.1        0.62 ±  4%  perf-profile.self.cycles-pp.__kmalloc_node_track_caller_noprof
      0.52 ±  3%      -0.1        0.47 ±  4%  perf-profile.self.cycles-pp.obj_cgroup_charge
      0.51 ±  2%      -0.0        0.46 ±  5%  perf-profile.self.cycles-pp.kmem_cache_alloc_node_noprof
      0.29 ±  2%      -0.0        0.25 ±  3%  perf-profile.self.cycles-pp.task_mm_cid_work
      0.43 ±  2%      -0.0        0.40 ±  4%  perf-profile.self.cycles-pp.do_syscall_64
      0.34 ±  3%      -0.0        0.31 ±  6%  perf-profile.self.cycles-pp.__skb_datagram_iter
      0.29 ±  2%      -0.0        0.26 ±  6%  perf-profile.self.cycles-pp.__virt_addr_valid
      0.33 ±  2%      -0.0        0.30 ±  6%  perf-profile.self.cycles-pp.__cond_resched
      0.28 ±  3%      -0.0        0.25 ±  5%  perf-profile.self.cycles-pp.syscall_return_via_sysret
      0.37 ±  2%      -0.0        0.34 ±  6%  perf-profile.self.cycles-pp.__check_object_size
      0.18 ±  5%      -0.0        0.15 ±  6%  perf-profile.self.cycles-pp.__enqueue_entity
      0.21 ±  3%      -0.0        0.18 ±  6%  perf-profile.self.cycles-pp.rcu_all_qs
      0.22 ±  3%      -0.0        0.20 ±  6%  perf-profile.self.cycles-pp.x64_sys_call
      0.19 ±  2%      -0.0        0.17 ±  6%  perf-profile.self.cycles-pp.rw_verify_area
      0.05            +0.0        0.08 ± 23%  perf-profile.self.cycles-pp.update_rq_clock
      0.00            +0.1        0.06 ± 11%  perf-profile.self.cycles-pp.ring_buffer_read_head
      0.00            +0.2        0.15 ± 17%  perf-profile.self.cycles-pp.intel_idle
      0.01 ±282%      +0.2        0.18 ± 70%  perf-profile.self.cycles-pp.available_idle_cpu
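
For readers who want to sanity-check the %change column, the following is a minimal sketch (not part of the lkp tooling) that recomputes a few of the percentages from the base and new values copied from the table above. The helper name `percent_change` and the choice of rows are illustrative assumptions. It also checks that the reported IPC values are consistent with the reciprocal of the reported CPI, which is only an approximation since both are interval averages.

```python
def percent_change(base: float, new: float) -> float:
    """Relative change of `new` versus `base`, in percent."""
    return (new - base) / base * 100.0


# (base, new) pairs copied from the comparison table:
# left column is 82e9d0456e, right column is 85e511df3c.
rows = {
    "hackbench.throughput": (623219, 541887),
    "hackbench.throughput_avg": (606251, 520789),
    "hackbench.time.elapsed_time": (174.58, 203.09),
    "vmstat.system.cs": (4679401, 3938145),
}

# Round to one decimal place, matching the precision used in the report.
changes = {
    name: round(percent_change(base, new), 1)
    for name, (base, new) in rows.items()
}

for name, pct in changes.items():
    print(f"{name}: {pct:+.1f}%")

# IPC is (approximately) the reciprocal of CPI; inputs are the
# perf-stat.i.cpi values from the table.
ipc_base = round(1 / 1.86, 2)
ipc_new = round(1 / 2.13, 2)
print(f"ipc from cpi: {ipc_base} -> {ipc_new}")
```

The recomputed figures agree with the table: throughput falls by about 13.1% while elapsed time rises by about 16.3%, and the IPC derived from CPI matches the reported 0.54 and 0.47.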



Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.


-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki

