lists.openwall.net - Open Source and information security mailing list archives
Message-ID: <202505131609.20984254-lkp@intel.com>
Date: Wed, 14 May 2025 10:33:43 +0800
From: kernel test robot <oliver.sang@...el.com>
To: Sebastian Andrzej Siewior <bigeasy@...utronix.de>
CC: <oe-lkp@...ts.linux.dev>, <lkp@...el.com>, <linux-kernel@...r.kernel.org>,
	<x86@...nel.org>, Peter Zijlstra <peterz@...radead.org>,
	<linux-mm@...ck.org>, <oliver.sang@...el.com>
Subject: [tip:locking/futex] [futex] bd54df5ea7:
 will-it-scale.per_thread_ops 33.9% improvement



Hello,

kernel test robot noticed a 33.9% improvement of will-it-scale.per_thread_ops on:


commit: bd54df5ea7cadac520e346d5f0fe5d58e635b6ba ("futex: Allow to resize the private local hash")
https://git.kernel.org/cgit/linux/kernel/git/tip/tip.git locking/futex


testcase: will-it-scale
config: x86_64-rhel-9.4
compiler: gcc-12
test machine: 256 threads, 2 sockets, Intel(R) Xeon(R) 6767P CPU @ 2.4GHz (Granite Rapids), with 256G memory
parameters:

	nr_task: 100%
	mode: thread
	test: pthread_mutex5
	cpufreq_governor: performance



Details are as below:
-------------------------------------------------------------------------------------------------->


The kernel config and materials to reproduce are available at:
https://download.01.org/0day-ci/archive/20250513/202505131609.20984254-lkp@intel.com

=========================================================================================
compiler/cpufreq_governor/kconfig/mode/nr_task/rootfs/tbox_group/test/testcase:
  gcc-12/performance/x86_64-rhel-9.4/thread/100%/debian-12-x86_64-20240206.cgz/lkp-gnr-2sp3/pthread_mutex5/will-it-scale

commit: 
  7c4f75a21f ("futex: Allow automatic allocation of process wide futex hash")
  bd54df5ea7 ("futex: Allow to resize the private local hash")

7c4f75a21f636486 bd54df5ea7cadac520e346d5f0f 
---------------- --------------------------- 
         %stddev     %change         %stddev
             \          |                \  
  23570282           -32.6%   15883630 ±  2%  cpuidle..usage
   1862635            -9.3%    1689404        meminfo.Shmem
      2110           +19.0%       2512 ±  3%  perf-c2c.DRAM.local
      0.16 ±  4%      -0.1        0.08 ±  4%  mpstat.cpu.all.soft%
      0.63            -0.2        0.46 ±  3%  mpstat.cpu.all.usr%
   1264859 ±  2%     -47.5%     664434 ± 62%  numa-vmstat.node1.nr_file_pages
     38897 ± 10%     -47.8%      20323 ± 48%  numa-vmstat.node1.nr_mapped
    206687           -33.5%     137401 ±  2%  vmstat.system.cs
    427708            -8.0%     393532        vmstat.system.in
   5060133 ±  2%     -47.5%    2658326 ± 62%  numa-meminfo.node1.FilePages
    158778 ± 10%     -48.5%      81837 ± 46%  numa-meminfo.node1.Mapped
   6620342 ±  2%     -38.3%    4086741 ± 37%  numa-meminfo.node1.MemUsed
   9566224           +33.9%   12810946        will-it-scale.256.threads
      0.18           -11.1%       0.16        will-it-scale.256.threads_idle
     37367           +33.9%      50042        will-it-scale.per_thread_ops
   9566224           +33.9%   12810946        will-it-scale.workload
      0.00 ± 15%     +29.7%       0.00 ± 15%  sched_debug.cpu.next_balance.stddev
    124704           -33.5%      82964 ±  2%  sched_debug.cpu.nr_switches.avg
    230832 ± 52%     -38.2%     142628 ±  5%  sched_debug.cpu.nr_switches.max
     98911 ±  4%     -33.7%      65543 ±  3%  sched_debug.cpu.nr_switches.min
     17307 ± 60%     -47.4%       9105 ± 20%  sched_debug.cpu.nr_switches.stddev
    672002            -6.5%     628169        proc-vmstat.nr_active_anon
   1345624            -3.2%    1302363        proc-vmstat.nr_file_pages
     41725 ±  7%     -16.3%      34939 ± 12%  proc-vmstat.nr_mapped
    465688            -9.3%     422425        proc-vmstat.nr_shmem
    672002            -6.5%     628169        proc-vmstat.nr_zone_active_anon
   1956811            -2.5%    1908264        proc-vmstat.numa_hit
   1692181            -2.8%    1644262        proc-vmstat.numa_local
      0.20            +4.3%       0.21        perf-stat.i.MPKI
      0.05            -0.0        0.05        perf-stat.i.branch-miss-rate%
   9101814           -10.3%    8161953        perf-stat.i.branch-misses
  14404131            +3.7%   14939924        perf-stat.i.cache-misses
    207911           -33.5%     138184 ±  2%  perf-stat.i.context-switches
     65204            -4.0%      62625        perf-stat.i.cycles-between-cache-misses
      0.01           -95.2%       0.00 ±223%  perf-stat.i.metric.K/sec
      0.20            +4.2%       0.21        perf-stat.overall.MPKI
      0.05            -0.0        0.05        perf-stat.overall.branch-miss-rate%
     63438            -3.5%      61223        perf-stat.overall.cycles-between-cache-misses
   2250086           -25.7%    1671327        perf-stat.overall.path-length
   9086343           -10.4%    8139691        perf-stat.ps.branch-misses
  14400345            +3.6%   14922252        perf-stat.ps.cache-misses
    207422           -33.5%     137839 ±  2%  perf-stat.ps.context-switches
      0.16           +99.2%       0.32 ± 95%  perf-sched.sch_delay.avg.ms.__cond_resched.__wait_for_common.affine_move_task.__set_cpus_allowed_ptr.__sched_setaffinity
      1.66 ± 12%     +17.5%       1.95 ±  3%  perf-sched.sch_delay.avg.ms.__cond_resched.generic_perform_write.shmem_file_write_iter.vfs_write.ksys_write
      0.08 ±  8%     +37.8%       0.12 ± 20%  perf-sched.sch_delay.avg.ms.__cond_resched.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
      0.01 ± 12%     +47.5%       0.01 ±  5%  perf-sched.sch_delay.avg.ms.futex_do_wait.__futex_wait.futex_wait.do_futex
      0.09 ±166%   +1763.7%       1.74 ± 65%  perf-sched.sch_delay.avg.ms.irqentry_exit_to_user_mode.asm_sysvec_apic_timer_interrupt.[unknown]
      0.09           +16.3%       0.11 ±  3%  perf-sched.sch_delay.avg.ms.worker_thread.kthread.ret_from_fork.ret_from_fork_asm
      2.98 ± 14%     +28.2%       3.83 ±  4%  perf-sched.sch_delay.max.ms.__cond_resched.shmem_inode_acct_blocks.shmem_alloc_and_add_folio.shmem_get_folio_gfp.shmem_write_begin
      0.18 ±  5%    +248.1%       0.61 ± 63%  perf-sched.sch_delay.max.ms.__cond_resched.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
      0.15 ±186%   +1714.0%       2.76 ± 49%  perf-sched.sch_delay.max.ms.irqentry_exit_to_user_mode.asm_sysvec_apic_timer_interrupt.[unknown]
      0.01 ± 12%     +45.3%       0.02 ±  5%  perf-sched.total_sch_delay.average.ms
      2.91 ±  2%     +61.4%       4.69 ±  4%  perf-sched.total_wait_and_delay.average.ms
    556081 ±  2%     -37.0%     350186 ±  2%  perf-sched.total_wait_and_delay.count.ms
      2.89 ±  2%     +61.5%       4.67 ±  4%  perf-sched.total_wait_time.average.ms
      0.01 ±  6%     +35.6%       0.02 ±  3%  perf-sched.wait_and_delay.avg.ms.futex_do_wait.__futex_wait.futex_wait.do_futex
     18.90 ±  3%     -15.5%      15.98        perf-sched.wait_and_delay.avg.ms.schedule_timeout.rcu_gp_fqs_loop.rcu_gp_kthread.kthread
    541651 ±  2%     -37.0%     341352 ±  2%  perf-sched.wait_and_delay.count.futex_do_wait.__futex_wait.futex_wait.do_futex
     11.50 ± 18%     -84.1%       1.83 ±223%  perf-sched.wait_and_delay.count.rcu_gp_kthread.kthread.ret_from_fork.ret_from_fork_asm
    253.67 ±  3%     +17.1%     297.00        perf-sched.wait_and_delay.count.schedule_timeout.rcu_gp_fqs_loop.rcu_gp_kthread.kthread
      0.09 ±166%   +1763.7%       1.74 ± 65%  perf-sched.wait_time.avg.ms.irqentry_exit_to_user_mode.asm_sysvec_apic_timer_interrupt.[unknown]
     18.79 ±  3%     -15.6%      15.85        perf-sched.wait_time.avg.ms.schedule_timeout.rcu_gp_fqs_loop.rcu_gp_kthread.kthread
      0.15 ±186%   +1714.0%       2.76 ± 49%  perf-sched.wait_time.max.ms.irqentry_exit_to_user_mode.asm_sysvec_apic_timer_interrupt.[unknown]
     43.55            -1.5       42.06        perf-profile.calltrace.cycles-pp._raw_spin_lock.futex_wait_setup.__futex_wait.futex_wait.do_futex
     43.54            -1.5       42.04        perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock.futex_wait_setup.__futex_wait.futex_wait
     43.83            -1.3       42.54        perf-profile.calltrace.cycles-pp.__futex_wait.futex_wait.do_futex.__x64_sys_futex.do_syscall_64
     43.83            -1.3       42.54        perf-profile.calltrace.cycles-pp.futex_wait.do_futex.__x64_sys_futex.do_syscall_64.entry_SYSCALL_64_after_hwframe
     43.76            -1.3       42.48        perf-profile.calltrace.cycles-pp.futex_wait_setup.__futex_wait.futex_wait.do_futex.__x64_sys_futex
     99.06            +0.2       99.25        perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe
     99.05            +0.2       99.24        perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe
     99.03            +0.2       99.22        perf-profile.calltrace.cycles-pp.__x64_sys_futex.do_syscall_64.entry_SYSCALL_64_after_hwframe
     99.02            +0.2       99.22        perf-profile.calltrace.cycles-pp.do_futex.__x64_sys_futex.do_syscall_64.entry_SYSCALL_64_after_hwframe
     54.99            +1.1       56.14        perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock.futex_wake.do_futex.__x64_sys_futex
     55.02            +1.2       56.21        perf-profile.calltrace.cycles-pp._raw_spin_lock.futex_wake.do_futex.__x64_sys_futex.do_syscall_64
     55.19            +1.5       56.68        perf-profile.calltrace.cycles-pp.futex_wake.do_futex.__x64_sys_futex.do_syscall_64.entry_SYSCALL_64_after_hwframe
     43.83            -1.3       42.54        perf-profile.children.cycles-pp.__futex_wait
     43.83            -1.3       42.54        perf-profile.children.cycles-pp.futex_wait
     43.76            -1.3       42.48        perf-profile.children.cycles-pp.futex_wait_setup
     98.55            -0.3       98.21        perf-profile.children.cycles-pp.native_queued_spin_lock_slowpath
     98.59            -0.3       98.28        perf-profile.children.cycles-pp._raw_spin_lock
      0.37            -0.1        0.26        perf-profile.children.cycles-pp.pthread_mutex_lock
      0.60 ±  3%      -0.1        0.49 ±  3%  perf-profile.children.cycles-pp.asm_sysvec_apic_timer_interrupt
      0.58 ±  3%      -0.1        0.47 ±  3%  perf-profile.children.cycles-pp.sysvec_apic_timer_interrupt
      0.20 ±  5%      -0.1        0.10 ±  9%  perf-profile.children.cycles-pp.handle_softirqs
      0.18 ±  5%      -0.1        0.09 ±  6%  perf-profile.children.cycles-pp.sched_balance_domains
      0.21 ±  4%      -0.1        0.12 ±  4%  perf-profile.children.cycles-pp.__irq_exit_rcu
      0.17 ±  2%      -0.1        0.11 ±  3%  perf-profile.children.cycles-pp.common_startup_64
      0.17 ±  2%      -0.1        0.11 ±  3%  perf-profile.children.cycles-pp.cpu_startup_entry
      0.17 ±  2%      -0.1        0.11 ±  3%  perf-profile.children.cycles-pp.do_idle
      0.17 ±  2%      -0.1        0.11 ±  3%  perf-profile.children.cycles-pp.start_secondary
      0.11 ±  4%      -0.0        0.07 ±  5%  perf-profile.children.cycles-pp.acpi_idle_do_entry
      0.11 ±  4%      -0.0        0.07 ±  5%  perf-profile.children.cycles-pp.acpi_idle_enter
      0.11 ±  4%      -0.0        0.07 ±  5%  perf-profile.children.cycles-pp.acpi_safe_halt
      0.11 ±  4%      -0.0        0.07 ±  5%  perf-profile.children.cycles-pp.pv_native_safe_halt
      0.11 ±  4%      -0.0        0.08        perf-profile.children.cycles-pp.asm_sysvec_call_function_single
      0.10            -0.0        0.07 ±  5%  perf-profile.children.cycles-pp.__schedule
      0.11            -0.0        0.08 ±  4%  perf-profile.children.cycles-pp.cpuidle_enter
      0.06 ±  7%      -0.0        0.03 ± 70%  perf-profile.children.cycles-pp.futex_do_wait
      0.11 ±  3%      -0.0        0.08 ±  4%  perf-profile.children.cycles-pp.cpuidle_enter_state
      0.11            -0.0        0.08        perf-profile.children.cycles-pp.cpuidle_idle_call
      0.08            -0.0        0.05 ±  7%  perf-profile.children.cycles-pp.sysvec_call_function_single
      0.00            +0.1        0.05        perf-profile.children.cycles-pp.futex_q_unlock
      0.07            +0.1        0.12 ±  3%  perf-profile.children.cycles-pp.futex_q_lock
      0.00            +0.2        0.17        perf-profile.children.cycles-pp.futex_hash_put
     99.22            +0.2       99.40        perf-profile.children.cycles-pp.entry_SYSCALL_64_after_hwframe
     99.22            +0.2       99.40        perf-profile.children.cycles-pp.do_syscall_64
     99.03            +0.2       99.22        perf-profile.children.cycles-pp.__x64_sys_futex
     99.02            +0.2       99.22        perf-profile.children.cycles-pp.do_futex
      0.00            +0.3        0.33        perf-profile.children.cycles-pp.futex_hash
     55.19            +1.5       56.68        perf-profile.children.cycles-pp.futex_wake
     97.95            -0.2       97.71        perf-profile.self.cycles-pp.native_queued_spin_lock_slowpath
      0.37            -0.1        0.26        perf-profile.self.cycles-pp.pthread_mutex_lock
      0.18 ±  4%      -0.1        0.09 ±  6%  perf-profile.self.cycles-pp.sched_balance_domains
      0.08            -0.0        0.06        perf-profile.self.cycles-pp.futex_wait_setup
      0.07            +0.0        0.12        perf-profile.self.cycles-pp.futex_q_lock
      0.00            +0.1        0.05        perf-profile.self.cycles-pp.futex_q_unlock
      0.00            +0.1        0.08        perf-profile.self.cycles-pp._raw_spin_lock
      0.00            +0.2        0.17        perf-profile.self.cycles-pp.futex_hash_put
      0.00            +0.3        0.33        perf-profile.self.cycles-pp.futex_hash




Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.


-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki

