Message-ID: <202412241607.dc13db91-lkp@intel.com>
Date: Tue, 24 Dec 2024 16:34:05 +0800
From: kernel test robot <oliver.sang@...el.com>
To: K Prateek Nayak <kprateek.nayak@....com>
CC: <oe-lkp@...ts.linux.dev>, <lkp@...el.com>, <linux-kernel@...r.kernel.org>,
	Peter Zijlstra <peterz@...radead.org>, Julia Lawall <julia.lawall@...ia.fr>,
	Sebastian Andrzej Siewior <bigeasy@...utronix.de>,
	<aubrey.li@...ux.intel.com>, <yu.c.chen@...el.com>, <oliver.sang@...el.com>
Subject: [linus:master] [sched/core]  e932c4ab38:
 aim9.sync_disk_cp.ops_per_sec 2.3% improvement



Hello,

kernel test robot noticed a 2.3% improvement of aim9.sync_disk_cp.ops_per_sec on:


commit: e932c4ab38f072ce5894b2851fea8bc5754bb8e5 ("sched/core: Prevent wakeup of ksoftirqd during idle load balance")
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master


testcase: aim9
config: x86_64-rhel-9.4
compiler: gcc-12
test machine: 4 threads Intel(R) Xeon(R) CPU E3-1225 v5 @ 3.30GHz (Skylake) with 16G memory
parameters:

	testtime: 300s
	test: sync_disk_cp
	cpufreq_governor: performance


In addition, the commit also has a significant impact on the following test:

+------------------+-----------------------------------------------------------------------------+
| testcase: change | vm-scalability: vm-scalability.throughput 2.4% improvement                  |
| test machine     | 4 threads Intel(R) Core(TM) i5-6500 CPU @ 3.20GHz (Skylake) with 32G memory |
| test parameters  | cpufreq_governor=performance                                                |
|                  | runtime=300s                                                                |
|                  | test=migrate                                                                |
+------------------+-----------------------------------------------------------------------------+




Details are as below:
-------------------------------------------------------------------------------------------------->


The kernel config and materials to reproduce are available at:
https://download.01.org/0day-ci/archive/20241224/202412241607.dc13db91-lkp@intel.com

=========================================================================================
compiler/cpufreq_governor/kconfig/rootfs/tbox_group/test/testcase/testtime:
  gcc-12/performance/x86_64-rhel-9.4/debian-12-x86_64-20240206.cgz/lkp-skl-d06/sync_disk_cp/aim9/300s

commit: 
  ff47a0acfc ("sched/fair: Check idle_cpu() before need_resched() to detect ilb CPU turning busy")
  e932c4ab38 ("sched/core: Prevent wakeup of ksoftirqd during idle load balance")

ff47a0acfcce309c e932c4ab38f072ce5894b2851fe 
---------------- --------------------------- 
         %stddev     %change         %stddev
             \          |                \  
    779244            +2.3%     797195        aim9.sync_disk_cp.ops_per_sec
    444185 ±  2%     -51.7%     214738 ±  3%  cpuidle..usage
     40.83 ± 15%     -84.5%       6.33 ± 23%  perf-c2c.HITM.local
   6505472 ± 12%     +21.6%    7908010 ±  4%  meminfo.DirectMap2M
     29200           -10.3%      26194        meminfo.Shmem
      0.08 ±  2%      -0.0        0.06 ±  2%  mpstat.cpu.all.irq%
      0.04 ±  3%      -0.0        0.03 ±  4%  mpstat.cpu.all.soft%
      2562 ±  2%     -60.3%       1018        vmstat.system.cs
      2343           -23.3%       1798        vmstat.system.in
    117335           -53.2%      54952        sched_debug.cpu.nr_switches.avg
    285639 ±  5%     -71.9%      80403 ±  5%  sched_debug.cpu.nr_switches.max
    100396 ±  9%     -77.1%      22968 ± 14%  sched_debug.cpu.nr_switches.stddev
      7316           -10.5%       6550        proc-vmstat.nr_shmem
  58767234            +2.4%   60172860        proc-vmstat.numa_hit
  58984855            +2.0%   60176451        proc-vmstat.numa_local
  58862408            +2.3%   60212415        proc-vmstat.pgalloc_normal
  58848231            +2.3%   60198260        proc-vmstat.pgfree
 7.448e+08            +1.7%  7.574e+08        perf-stat.i.branch-instructions
      1.35            -0.1        1.29        perf-stat.i.branch-miss-rate%
  65562189 ±  2%      -4.9%   62378502        perf-stat.i.cache-references
      2571 ±  2%     -60.5%       1016        perf-stat.i.context-switches
 3.732e+09            +1.8%  3.797e+09        perf-stat.i.instructions
      0.14 ±  3%     -87.0%       0.02        perf-stat.i.metric.K/sec
 7.426e+08            +1.7%   7.55e+08        perf-stat.ps.branch-instructions
  65356430 ±  2%      -4.9%   62171508        perf-stat.ps.cache-references
      2563 ±  2%     -60.5%       1012        perf-stat.ps.context-switches
  3.72e+09            +1.7%  3.785e+09        perf-stat.ps.instructions
  1.12e+12            +1.8%   1.14e+12        perf-stat.total.instructions
      0.02 ± 25%     +78.4%       0.03 ± 18%  perf-sched.sch_delay.avg.ms.rcu_gp_kthread.kthread.ret_from_fork.ret_from_fork_asm
      0.02 ± 55%     +82.3%       0.04 ± 16%  perf-sched.sch_delay.max.ms.schedule_timeout.__wait_for_common.__flush_work.__lru_add_drain_all
      0.04 ± 21%     +87.3%       0.07 ± 21%  perf-sched.sch_delay.max.ms.syscall_exit_to_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe.[unknown]
      0.01 ±  9%     +35.2%       0.02 ±  6%  perf-sched.total_sch_delay.average.ms
     20.34 ±  5%    +111.1%      42.94        perf-sched.total_wait_and_delay.average.ms
      7025 ±  6%     -54.0%       3228        perf-sched.total_wait_and_delay.count.ms
      3058 ± 20%     +63.5%       4998        perf-sched.total_wait_and_delay.max.ms
     20.33 ±  5%    +111.1%      42.92        perf-sched.total_wait_time.average.ms
      3058 ± 20%     +63.5%       4998        perf-sched.total_wait_time.max.ms
    202.58 ± 18%     +94.7%     394.49 ±  9%  perf-sched.wait_and_delay.avg.ms.__cond_resched.__wait_for_common.affine_move_task.__set_cpus_allowed_ptr.__sched_setaffinity
    609.98 ±  5%     -17.9%     500.63        perf-sched.wait_and_delay.avg.ms.schedule_hrtimeout_range.do_poll.constprop.0.do_sys_poll
      9.01 ± 12%   +6133.8%     561.38 ± 15%  perf-sched.wait_and_delay.avg.ms.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
      3837 ± 12%     -98.6%      52.17 ± 12%  perf-sched.wait_and_delay.count.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
      1349 ± 39%    +270.4%       4998        perf-sched.wait_and_delay.max.ms.__cond_resched.__wait_for_common.affine_move_task.__set_cpus_allowed_ptr.__sched_setaffinity
      2785 ± 16%     -64.1%       1001        perf-sched.wait_and_delay.max.ms.schedule_hrtimeout_range.do_poll.constprop.0.do_sys_poll
    202.50 ± 18%     +94.8%     394.38 ±  9%  perf-sched.wait_time.avg.ms.__cond_resched.__wait_for_common.affine_move_task.__set_cpus_allowed_ptr.__sched_setaffinity
    609.95 ±  5%     -17.9%     500.56        perf-sched.wait_time.avg.ms.schedule_hrtimeout_range.do_poll.constprop.0.do_sys_poll
      9.00 ± 12%   +6140.7%     561.36 ± 15%  perf-sched.wait_time.avg.ms.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
      1349 ± 39%    +270.4%       4998        perf-sched.wait_time.max.ms.__cond_resched.__wait_for_common.affine_move_task.__set_cpus_allowed_ptr.__sched_setaffinity
      2785 ± 16%     -64.1%       1001        perf-sched.wait_time.max.ms.schedule_hrtimeout_range.do_poll.constprop.0.do_sys_poll
      1.51 ±  6%      -0.9        0.64 ± 11%  perf-profile.calltrace.cycles-pp.common_startup_64
      1.51 ±  6%      -0.9        0.64 ± 11%  perf-profile.children.cycles-pp.common_startup_64
      1.51 ±  6%      -0.9        0.64 ± 11%  perf-profile.children.cycles-pp.cpu_startup_entry
      1.51 ±  6%      -0.9        0.64 ± 11%  perf-profile.children.cycles-pp.do_idle
      1.12 ±  6%      -0.6        0.49 ± 13%  perf-profile.children.cycles-pp.cpuidle_idle_call
      0.92 ±  5%      -0.5        0.42 ± 17%  perf-profile.children.cycles-pp.cpuidle_enter
      0.92 ±  5%      -0.5        0.42 ± 17%  perf-profile.children.cycles-pp.cpuidle_enter_state
      0.50 ±  6%      -0.3        0.21 ± 12%  perf-profile.children.cycles-pp.intel_idle
      0.52 ±  8%      -0.2        0.33 ±  7%  perf-profile.children.cycles-pp.asm_sysvec_apic_timer_interrupt
      0.48 ±  6%      -0.2        0.31 ±  7%  perf-profile.children.cycles-pp.sysvec_apic_timer_interrupt
      0.27 ± 18%      -0.2        0.10 ± 36%  perf-profile.children.cycles-pp.__schedule
      0.20 ± 12%      -0.2        0.04 ± 73%  perf-profile.children.cycles-pp.__flush_smp_call_function_queue
      0.21 ± 13%      -0.2        0.06 ± 20%  perf-profile.children.cycles-pp.flush_smp_call_function_queue
      0.24 ±  9%      -0.1        0.11 ± 12%  perf-profile.children.cycles-pp.ret_from_fork
      0.24 ±  9%      -0.1        0.11 ± 12%  perf-profile.children.cycles-pp.ret_from_fork_asm
      0.24 ±  9%      -0.1        0.11 ± 10%  perf-profile.children.cycles-pp.kthread
      0.18 ±  8%      -0.1        0.05 ± 49%  perf-profile.children.cycles-pp.schedule
      0.31 ±  9%      -0.1        0.19 ±  8%  perf-profile.children.cycles-pp.__sysvec_apic_timer_interrupt
      0.30 ±  9%      -0.1        0.19 ±  8%  perf-profile.children.cycles-pp.hrtimer_interrupt
      0.25 ±  8%      -0.1        0.16 ±  7%  perf-profile.children.cycles-pp.__hrtimer_run_queues
      0.11 ± 11%      -0.1        0.02 ± 99%  perf-profile.children.cycles-pp.try_to_block_task
      0.10 ± 13%      -0.1        0.02 ± 99%  perf-profile.children.cycles-pp.dequeue_task_fair
      0.21 ± 12%      -0.1        0.14 ±  5%  perf-profile.children.cycles-pp.tick_nohz_handler
      0.10 ± 14%      -0.1        0.02 ± 99%  perf-profile.children.cycles-pp.dequeue_entities
      0.17 ± 13%      -0.1        0.10 ±  4%  perf-profile.children.cycles-pp.update_process_times
      0.11 ± 12%      -0.0        0.07 ±  6%  perf-profile.children.cycles-pp.sched_tick
     40.09            +0.6       40.66        perf-profile.children.cycles-pp.read
      0.50 ±  6%      -0.3        0.21 ± 12%  perf-profile.self.cycles-pp.intel_idle
      0.97 ±  4%      +0.1        1.05 ±  3%  perf-profile.self.cycles-pp.entry_SYSCALL_64_safe_stack


***************************************************************************************************
lkp-skl-d03: 4 threads Intel(R) Core(TM) i5-6500 CPU @ 3.20GHz (Skylake) with 32G memory
=========================================================================================
compiler/cpufreq_governor/kconfig/rootfs/runtime/tbox_group/test/testcase:
  gcc-12/performance/x86_64-rhel-9.4/debian-12-x86_64-20240206.cgz/300s/lkp-skl-d03/migrate/vm-scalability

commit: 
  ff47a0acfc ("sched/fair: Check idle_cpu() before need_resched() to detect ilb CPU turning busy")
  e932c4ab38 ("sched/core: Prevent wakeup of ksoftirqd during idle load balance")

ff47a0acfcce309c e932c4ab38f072ce5894b2851fe 
---------------- --------------------------- 
         %stddev     %change         %stddev
             \          |                \  
    181821           -12.5%     159050        meminfo.Mapped
      0.02 ±  4%     -20.4%       0.01 ±  5%  perf-sched.sch_delay.avg.ms.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
     45923           -12.8%      40022        proc-vmstat.nr_mapped
      1.00 ± 99%    -100.0%       0.00 ± 52%  vm-scalability.free_time
   2422987            +2.4%    2480833        vm-scalability.median
   2422987            +2.4%    2480833        vm-scalability.throughput
     90071            +2.5%      92323        vm-scalability.time.involuntary_context_switches
      3.03 ±  3%      -0.2        2.84 ±  3%  perf-profile.calltrace.cycles-pp.unmap_vmas.exit_mmap.__mmput.exit_mm.do_exit
      2.84 ±  2%      -0.2        2.67 ±  3%  perf-profile.calltrace.cycles-pp.unmap_page_range.unmap_vmas.exit_mmap.__mmput.exit_mm
      6.04 ±  2%      -0.2        5.88        perf-profile.calltrace.cycles-pp.__mmput.exit_mm.do_exit.do_group_exit.__x64_sys_exit_group
      6.06 ±  2%      -0.2        5.89        perf-profile.calltrace.cycles-pp.exit_mm.do_exit.do_group_exit.__x64_sys_exit_group.x64_sys_call
      2.78 ±  2%      -0.1        2.64 ±  2%  perf-profile.calltrace.cycles-pp.filemap_map_pages.do_read_fault.do_pte_missing.__handle_mm_fault.handle_mm_fault
      2.90            -0.1        2.77 ±  2%  perf-profile.calltrace.cycles-pp.do_read_fault.do_pte_missing.__handle_mm_fault.handle_mm_fault.do_user_addr_fault
      0.90 ±  4%      +0.1        0.95 ±  3%  perf-profile.calltrace.cycles-pp.do_mmap.vm_mmap_pgoff.do_syscall_64.entry_SYSCALL_64_after_hwframe
      0.92 ±  3%      +0.1        0.99 ±  2%  perf-profile.calltrace.cycles-pp.vm_mmap_pgoff.do_syscall_64.entry_SYSCALL_64_after_hwframe
      0.77 ±  4%      +0.1        0.84 ±  3%  perf-profile.calltrace.cycles-pp.__mmap_region.do_mmap.vm_mmap_pgoff.do_syscall_64.entry_SYSCALL_64_after_hwframe
      0.80 ±  7%      +0.1        0.91 ±  7%  perf-profile.calltrace.cycles-pp.do_anonymous_page.__handle_mm_fault.handle_mm_fault.do_user_addr_fault.exc_page_fault
      4.90 ±  2%      -0.2        4.70 ±  2%  perf-profile.children.cycles-pp.do_read_fault
      6.09 ±  2%      -0.2        5.92        perf-profile.children.cycles-pp.exit_mm
      0.54 ±  2%      -0.1        0.49 ±  8%  perf-profile.children.cycles-pp.___perf_sw_event
      0.39 ±  5%      -0.0        0.35 ±  6%  perf-profile.children.cycles-pp.vfs_open
      0.20 ±  4%      -0.0        0.16 ± 10%  perf-profile.children.cycles-pp.opendir
      0.15 ±  8%      +0.0        0.19 ±  5%  perf-profile.children.cycles-pp.__kmalloc_cache_noprof
      0.18 ±  6%      +0.0        0.22 ± 11%  perf-profile.children.cycles-pp.__kernel_read
      0.29 ±  5%      +0.0        0.34 ±  5%  perf-profile.children.cycles-pp.filemap_read
      1.17 ±  4%      +0.1        1.28 ±  4%  perf-profile.children.cycles-pp.__memcg_slab_free_hook
      0.44 ±  4%      -0.0        0.40 ±  8%  perf-profile.self.cycles-pp.___perf_sw_event
      0.07 ± 15%      -0.0        0.04 ± 71%  perf-profile.self.cycles-pp.__folio_batch_add_and_move





Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.


-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki

