Message-ID: <202505131609.20984254-lkp@intel.com>
Date: Wed, 14 May 2025 10:33:43 +0800
From: kernel test robot <oliver.sang@...el.com>
To: Sebastian Andrzej Siewior <bigeasy@...utronix.de>
CC: <oe-lkp@...ts.linux.dev>, <lkp@...el.com>, <linux-kernel@...r.kernel.org>,
<x86@...nel.org>, Peter Zijlstra <peterz@...radead.org>,
<linux-mm@...ck.org>, <oliver.sang@...el.com>
Subject: [tip:locking/futex] [futex] bd54df5ea7:
will-it-scale.per_thread_ops 33.9% improvement
Hello,
kernel test robot noticed a 33.9% improvement of will-it-scale.per_thread_ops on:
commit: bd54df5ea7cadac520e346d5f0fe5d58e635b6ba ("futex: Allow to resize the private local hash")
https://git.kernel.org/cgit/linux/kernel/git/tip/tip.git locking/futex
testcase: will-it-scale
config: x86_64-rhel-9.4
compiler: gcc-12
test machine: 256 threads 2 sockets Intel(R) Xeon(R) 6767P CPU @ 2.4GHz (Granite Rapids) with 256G memory
parameters:
nr_task: 100%
mode: thread
test: pthread_mutex5
cpufreq_governor: performance
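For context on the workload: the will-it-scale pthread_mutex* testcases stress
pthread_mutex_lock()/unlock() in a tight per-task loop, and judging by the profile
below (~98% of cycles in futex hash-bucket spinlocks under futex_wait/futex_wake),
the pthread_mutex5 variant with nr_task=100% in thread mode keeps all 256 hardware
threads contending on futex-backed mutexes. A minimal sketch of such a contended
loop, for illustration only -- the actual pthread_mutex5.c lives in the
will-it-scale repository:

/*
 * Illustrative sketch only, not the will-it-scale source: a contended
 * pthread mutex loop.  Under contention, pthread_mutex_lock()/unlock()
 * fall back to futex(FUTEX_WAIT)/futex(FUTEX_WAKE), which is the kernel
 * path whose hash-bucket locks this commit is tuning.
 */
#include <pthread.h>
#include <stdatomic.h>
#include <unistd.h>

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER; /* process-private futex */
static atomic_ulong iterations;                          /* per_thread_ops source */

static void *worker(void *arg)
{
	(void)arg;
	for (;;) {
		pthread_mutex_lock(&lock);   /* contended: FUTEX_WAIT in the kernel */
		pthread_mutex_unlock(&lock); /* FUTEX_WAKE for a blocked waiter */
		atomic_fetch_add_explicit(&iterations, 1, memory_order_relaxed);
	}
	return NULL;
}

int main(void)
{
	pthread_t tid[4];

	for (int i = 0; i < 4; i++)
		pthread_create(&tid[i], NULL, worker, NULL);
	sleep(5);   /* will-it-scale samples the counters over timed intervals */
	return 0;
}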
Details are as below:
-------------------------------------------------------------------------------------------------->
The kernel config and materials to reproduce are available at:
https://download.01.org/0day-ci/archive/20250513/202505131609.20984254-lkp@intel.com
=========================================================================================
compiler/cpufreq_governor/kconfig/mode/nr_task/rootfs/tbox_group/test/testcase:
gcc-12/performance/x86_64-rhel-9.4/thread/100%/debian-12-x86_64-20240206.cgz/lkp-gnr-2sp3/pthread_mutex5/will-it-scale
commit:
7c4f75a21f ("futex: Allow automatic allocation of process wide futex hash")
bd54df5ea7 ("futex: Allow to resize the private local hash")
7c4f75a21f636486 bd54df5ea7cadac520e346d5f0f
---------------- ---------------------------
%stddev %change %stddev
\ | \
23570282 -32.6% 15883630 ± 2% cpuidle..usage
1862635 -9.3% 1689404 meminfo.Shmem
2110 +19.0% 2512 ± 3% perf-c2c.DRAM.local
0.16 ± 4% -0.1 0.08 ± 4% mpstat.cpu.all.soft%
0.63 -0.2 0.46 ± 3% mpstat.cpu.all.usr%
1264859 ± 2% -47.5% 664434 ± 62% numa-vmstat.node1.nr_file_pages
38897 ± 10% -47.8% 20323 ± 48% numa-vmstat.node1.nr_mapped
206687 -33.5% 137401 ± 2% vmstat.system.cs
427708 -8.0% 393532 vmstat.system.in
5060133 ± 2% -47.5% 2658326 ± 62% numa-meminfo.node1.FilePages
158778 ± 10% -48.5% 81837 ± 46% numa-meminfo.node1.Mapped
6620342 ± 2% -38.3% 4086741 ± 37% numa-meminfo.node1.MemUsed
9566224 +33.9% 12810946 will-it-scale.256.threads
0.18 -11.1% 0.16 will-it-scale.256.threads_idle
37367 +33.9% 50042 will-it-scale.per_thread_ops
9566224 +33.9% 12810946 will-it-scale.workload
0.00 ± 15% +29.7% 0.00 ± 15% sched_debug.cpu.next_balance.stddev
124704 -33.5% 82964 ± 2% sched_debug.cpu.nr_switches.avg
230832 ± 52% -38.2% 142628 ± 5% sched_debug.cpu.nr_switches.max
98911 ± 4% -33.7% 65543 ± 3% sched_debug.cpu.nr_switches.min
17307 ± 60% -47.4% 9105 ± 20% sched_debug.cpu.nr_switches.stddev
672002 -6.5% 628169 proc-vmstat.nr_active_anon
1345624 -3.2% 1302363 proc-vmstat.nr_file_pages
41725 ± 7% -16.3% 34939 ± 12% proc-vmstat.nr_mapped
465688 -9.3% 422425 proc-vmstat.nr_shmem
672002 -6.5% 628169 proc-vmstat.nr_zone_active_anon
1956811 -2.5% 1908264 proc-vmstat.numa_hit
1692181 -2.8% 1644262 proc-vmstat.numa_local
0.20 +4.3% 0.21 perf-stat.i.MPKI
0.05 -0.0 0.05 perf-stat.i.branch-miss-rate%
9101814 -10.3% 8161953 perf-stat.i.branch-misses
14404131 +3.7% 14939924 perf-stat.i.cache-misses
207911 -33.5% 138184 ± 2% perf-stat.i.context-switches
65204 -4.0% 62625 perf-stat.i.cycles-between-cache-misses
0.01 -95.2% 0.00 ±223% perf-stat.i.metric.K/sec
0.20 +4.2% 0.21 perf-stat.overall.MPKI
0.05 -0.0 0.05 perf-stat.overall.branch-miss-rate%
63438 -3.5% 61223 perf-stat.overall.cycles-between-cache-misses
2250086 -25.7% 1671327 perf-stat.overall.path-length
9086343 -10.4% 8139691 perf-stat.ps.branch-misses
14400345 +3.6% 14922252 perf-stat.ps.cache-misses
207422 -33.5% 137839 ± 2% perf-stat.ps.context-switches
0.16 +99.2% 0.32 ± 95% perf-sched.sch_delay.avg.ms.__cond_resched.__wait_for_common.affine_move_task.__set_cpus_allowed_ptr.__sched_setaffinity
1.66 ± 12% +17.5% 1.95 ± 3% perf-sched.sch_delay.avg.ms.__cond_resched.generic_perform_write.shmem_file_write_iter.vfs_write.ksys_write
0.08 ± 8% +37.8% 0.12 ± 20% perf-sched.sch_delay.avg.ms.__cond_resched.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
0.01 ± 12% +47.5% 0.01 ± 5% perf-sched.sch_delay.avg.ms.futex_do_wait.__futex_wait.futex_wait.do_futex
0.09 ±166% +1763.7% 1.74 ± 65% perf-sched.sch_delay.avg.ms.irqentry_exit_to_user_mode.asm_sysvec_apic_timer_interrupt.[unknown]
0.09 +16.3% 0.11 ± 3% perf-sched.sch_delay.avg.ms.worker_thread.kthread.ret_from_fork.ret_from_fork_asm
2.98 ± 14% +28.2% 3.83 ± 4% perf-sched.sch_delay.max.ms.__cond_resched.shmem_inode_acct_blocks.shmem_alloc_and_add_folio.shmem_get_folio_gfp.shmem_write_begin
0.18 ± 5% +248.1% 0.61 ± 63% perf-sched.sch_delay.max.ms.__cond_resched.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
0.15 ±186% +1714.0% 2.76 ± 49% perf-sched.sch_delay.max.ms.irqentry_exit_to_user_mode.asm_sysvec_apic_timer_interrupt.[unknown]
0.01 ± 12% +45.3% 0.02 ± 5% perf-sched.total_sch_delay.average.ms
2.91 ± 2% +61.4% 4.69 ± 4% perf-sched.total_wait_and_delay.average.ms
556081 ± 2% -37.0% 350186 ± 2% perf-sched.total_wait_and_delay.count.ms
2.89 ± 2% +61.5% 4.67 ± 4% perf-sched.total_wait_time.average.ms
0.01 ± 6% +35.6% 0.02 ± 3% perf-sched.wait_and_delay.avg.ms.futex_do_wait.__futex_wait.futex_wait.do_futex
18.90 ± 3% -15.5% 15.98 perf-sched.wait_and_delay.avg.ms.schedule_timeout.rcu_gp_fqs_loop.rcu_gp_kthread.kthread
541651 ± 2% -37.0% 341352 ± 2% perf-sched.wait_and_delay.count.futex_do_wait.__futex_wait.futex_wait.do_futex
11.50 ± 18% -84.1% 1.83 ±223% perf-sched.wait_and_delay.count.rcu_gp_kthread.kthread.ret_from_fork.ret_from_fork_asm
253.67 ± 3% +17.1% 297.00 perf-sched.wait_and_delay.count.schedule_timeout.rcu_gp_fqs_loop.rcu_gp_kthread.kthread
0.09 ±166% +1763.7% 1.74 ± 65% perf-sched.wait_time.avg.ms.irqentry_exit_to_user_mode.asm_sysvec_apic_timer_interrupt.[unknown]
18.79 ± 3% -15.6% 15.85 perf-sched.wait_time.avg.ms.schedule_timeout.rcu_gp_fqs_loop.rcu_gp_kthread.kthread
0.15 ±186% +1714.0% 2.76 ± 49% perf-sched.wait_time.max.ms.irqentry_exit_to_user_mode.asm_sysvec_apic_timer_interrupt.[unknown]
43.55 -1.5 42.06 perf-profile.calltrace.cycles-pp._raw_spin_lock.futex_wait_setup.__futex_wait.futex_wait.do_futex
43.54 -1.5 42.04 perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock.futex_wait_setup.__futex_wait.futex_wait
43.83 -1.3 42.54 perf-profile.calltrace.cycles-pp.__futex_wait.futex_wait.do_futex.__x64_sys_futex.do_syscall_64
43.83 -1.3 42.54 perf-profile.calltrace.cycles-pp.futex_wait.do_futex.__x64_sys_futex.do_syscall_64.entry_SYSCALL_64_after_hwframe
43.76 -1.3 42.48 perf-profile.calltrace.cycles-pp.futex_wait_setup.__futex_wait.futex_wait.do_futex.__x64_sys_futex
99.06 +0.2 99.25 perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe
99.05 +0.2 99.24 perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe
99.03 +0.2 99.22 perf-profile.calltrace.cycles-pp.__x64_sys_futex.do_syscall_64.entry_SYSCALL_64_after_hwframe
99.02 +0.2 99.22 perf-profile.calltrace.cycles-pp.do_futex.__x64_sys_futex.do_syscall_64.entry_SYSCALL_64_after_hwframe
54.99 +1.1 56.14 perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock.futex_wake.do_futex.__x64_sys_futex
55.02 +1.2 56.21 perf-profile.calltrace.cycles-pp._raw_spin_lock.futex_wake.do_futex.__x64_sys_futex.do_syscall_64
55.19 +1.5 56.68 perf-profile.calltrace.cycles-pp.futex_wake.do_futex.__x64_sys_futex.do_syscall_64.entry_SYSCALL_64_after_hwframe
43.83 -1.3 42.54 perf-profile.children.cycles-pp.__futex_wait
43.83 -1.3 42.54 perf-profile.children.cycles-pp.futex_wait
43.76 -1.3 42.48 perf-profile.children.cycles-pp.futex_wait_setup
98.55 -0.3 98.21 perf-profile.children.cycles-pp.native_queued_spin_lock_slowpath
98.59 -0.3 98.28 perf-profile.children.cycles-pp._raw_spin_lock
0.37 -0.1 0.26 perf-profile.children.cycles-pp.pthread_mutex_lock
0.60 ± 3% -0.1 0.49 ± 3% perf-profile.children.cycles-pp.asm_sysvec_apic_timer_interrupt
0.58 ± 3% -0.1 0.47 ± 3% perf-profile.children.cycles-pp.sysvec_apic_timer_interrupt
0.20 ± 5% -0.1 0.10 ± 9% perf-profile.children.cycles-pp.handle_softirqs
0.18 ± 5% -0.1 0.09 ± 6% perf-profile.children.cycles-pp.sched_balance_domains
0.21 ± 4% -0.1 0.12 ± 4% perf-profile.children.cycles-pp.__irq_exit_rcu
0.17 ± 2% -0.1 0.11 ± 3% perf-profile.children.cycles-pp.common_startup_64
0.17 ± 2% -0.1 0.11 ± 3% perf-profile.children.cycles-pp.cpu_startup_entry
0.17 ± 2% -0.1 0.11 ± 3% perf-profile.children.cycles-pp.do_idle
0.17 ± 2% -0.1 0.11 ± 3% perf-profile.children.cycles-pp.start_secondary
0.11 ± 4% -0.0 0.07 ± 5% perf-profile.children.cycles-pp.acpi_idle_do_entry
0.11 ± 4% -0.0 0.07 ± 5% perf-profile.children.cycles-pp.acpi_idle_enter
0.11 ± 4% -0.0 0.07 ± 5% perf-profile.children.cycles-pp.acpi_safe_halt
0.11 ± 4% -0.0 0.07 ± 5% perf-profile.children.cycles-pp.pv_native_safe_halt
0.11 ± 4% -0.0 0.08 perf-profile.children.cycles-pp.asm_sysvec_call_function_single
0.10 -0.0 0.07 ± 5% perf-profile.children.cycles-pp.__schedule
0.11 -0.0 0.08 ± 4% perf-profile.children.cycles-pp.cpuidle_enter
0.06 ± 7% -0.0 0.03 ± 70% perf-profile.children.cycles-pp.futex_do_wait
0.11 ± 3% -0.0 0.08 ± 4% perf-profile.children.cycles-pp.cpuidle_enter_state
0.11 -0.0 0.08 perf-profile.children.cycles-pp.cpuidle_idle_call
0.08 -0.0 0.05 ± 7% perf-profile.children.cycles-pp.sysvec_call_function_single
0.00 +0.1 0.05 perf-profile.children.cycles-pp.futex_q_unlock
0.07 +0.1 0.12 ± 3% perf-profile.children.cycles-pp.futex_q_lock
0.00 +0.2 0.17 perf-profile.children.cycles-pp.futex_hash_put
99.22 +0.2 99.40 perf-profile.children.cycles-pp.entry_SYSCALL_64_after_hwframe
99.22 +0.2 99.40 perf-profile.children.cycles-pp.do_syscall_64
99.03 +0.2 99.22 perf-profile.children.cycles-pp.__x64_sys_futex
99.02 +0.2 99.22 perf-profile.children.cycles-pp.do_futex
0.00 +0.3 0.33 perf-profile.children.cycles-pp.futex_hash
55.19 +1.5 56.68 perf-profile.children.cycles-pp.futex_wake
97.95 -0.2 97.71 perf-profile.self.cycles-pp.native_queued_spin_lock_slowpath
0.37 -0.1 0.26 perf-profile.self.cycles-pp.pthread_mutex_lock
0.18 ± 4% -0.1 0.09 ± 6% perf-profile.self.cycles-pp.sched_balance_domains
0.08 -0.0 0.06 perf-profile.self.cycles-pp.futex_wait_setup
0.07 +0.0 0.12 perf-profile.self.cycles-pp.futex_q_lock
0.00 +0.1 0.05 perf-profile.self.cycles-pp.futex_q_unlock
0.00 +0.1 0.08 perf-profile.self.cycles-pp._raw_spin_lock
0.00 +0.2 0.17 perf-profile.self.cycles-pp.futex_hash_put
0.00 +0.3 0.33 perf-profile.self.cycles-pp.futex_hash
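The futex_hash, futex_hash_put and futex_q_unlock symbols that show up only on the
bd54df5ea7 side of the profile are the lookup/refcount overhead of the per-process
private hash, while the 33.5% drop in context switches and the shift of spinlock
time away from futex_wait_setup toward futex_wake are consistent with waiters
spending less time piling up on shared hash buckets. A rough sketch of the
pin/unpin pattern those symbols suggest (hypothetical names and fields, a
simplification rather than the code in kernel/futex/core.c):

/*
 * Hypothetical simplification of the pattern behind the futex_hash /
 * futex_hash_put samples above -- not the kernel's actual data
 * structures.  The idea: each process may own a private, resizable
 * bucket array; a lookup pins it with a reference so a concurrent
 * resize cannot free it while a waiter or waker is using a bucket.
 */
#include <linux/plist.h>
#include <linux/refcount.h>
#include <linux/spinlock.h>

struct futex_hash_bucket {
	spinlock_t		lock;	/* the _raw_spin_lock seen in the profile */
	struct plist_head	chain;	/* waiters hashed to this bucket */
};

struct futex_private_hash {
	refcount_t		users;		/* pins the array during use */
	unsigned int		hash_mask;	/* nr_buckets - 1, grows on resize */
	struct futex_hash_bucket *queues;	/* per-process bucket array */
};

/* Roughly what the futex_hash() samples pay for: pin the hash, pick a bucket. */
static struct futex_hash_bucket *
private_hash_get_bucket(struct futex_private_hash *fph, unsigned long key_hash)
{
	refcount_inc(&fph->users);
	return &fph->queues[key_hash & fph->hash_mask];
}

/* Roughly what the futex_hash_put() samples pay for: drop the pin. */
static void private_hash_put(struct futex_private_hash *fph)
{
	refcount_dec(&fph->users);
}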
Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.
--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki