Message-ID: <202406270909.adb09955-oliver.sang@intel.com>
Date: Thu, 27 Jun 2024 10:49:13 +0800
From: kernel test robot <oliver.sang@...el.com>
To: Mateusz Guzik <mjguzik@...il.com>
CC: <oe-lkp@...ts.linux.dev>, <lkp@...el.com>, Linux Memory Management List
<linux-mm@...ck.org>, Christian Brauner <brauner@...nel.org>,
<linux-kernel@...r.kernel.org>, <ying.huang@...el.com>,
<feng.tang@...el.com>, <fengwei.yin@...el.com>, <oliver.sang@...el.com>
Subject: [linux-next:master] [vfs] bdf6091183: stress-ng.full.ops_per_sec 633.4% improvement
Hello,
The kernel test robot noticed a 633.4% improvement in stress-ng.full.ops_per_sec on:
commit: bdf609118326e7c15f1c7efbc629bd9f7f307231 ("vfs: move d_lockref out of the area used by RCU lookup")
https://git.kernel.org/cgit/linux/kernel/git/next/linux-next.git master
testcase: stress-ng
test machine: 256 threads 2 sockets GENUINE INTEL(R) XEON(R) (Sierra Forest) with 128G memory
parameters:
	nr_threads: 100%
	testtime: 60s
	test: full
	cpufreq_governor: performance
Details are below:
-------------------------------------------------------------------------------------------------->
The kernel config and materials to reproduce are available at:
https://download.01.org/0day-ci/archive/20240627/202406270909.adb09955-oliver.sang@intel.com
=========================================================================================
compiler/cpufreq_governor/kconfig/nr_threads/rootfs/tbox_group/test/testcase/testtime:
gcc-13/performance/x86_64-rhel-8.3/100%/debian-12-x86_64-20240206.cgz/lkp-srf-2sp1/full/stress-ng/60s
commit:
d042dae6ad ("lockref: speculatively spin waiting for the lock to be released")
bdf6091183 ("vfs: move d_lockref out of the area used by RCU lookup")
d042dae6ad74df8a bdf609118326e7c15f1c7efbc62
---------------- ---------------------------
value ± %stddev    %change    value ± %stddev    metric
0.24 ± 14% +0.3 0.51 ± 6% mpstat.cpu.all.usr%
783327 ± 4% +12.4% 880472 ± 4% numa-numastat.node1.local_node
516588 ± 9% +15.0% 594316 ± 6% vmstat.system.in
8759 ± 73% +110.7% 18455 ± 41% numa-meminfo.node1.PageTables
841412 ± 11% +18.1% 993556 ± 7% numa-meminfo.node1.Shmem
2183 ± 72% +111.9% 4626 ± 41% numa-vmstat.node1.nr_page_table_pages
210196 ± 11% +18.2% 248382 ± 6% numa-vmstat.node1.nr_shmem
782967 ± 4% +12.4% 879991 ± 4% numa-vmstat.node1.numa_local
244258 ± 5% +21.1% 295853 ± 9% sched_debug.cfs_rq:/.avg_vruntime.stddev
456627 ± 76% -94.3% 26089 ± 6% sched_debug.cfs_rq:/.load.max
244258 ± 5% +21.1% 295853 ± 9% sched_debug.cfs_rq:/.min_vruntime.stddev
7656655 ± 11% +633.4% 56155706 stress-ng.full.ops
127609 ± 11% +633.4% 935926 stress-ng.full.ops_per_sec
59946 +6.6% 63873 ± 4% stress-ng.time.involuntary_context_switches
5.96 ± 11% +597.3% 41.59 stress-ng.time.user_time
1558 ± 7% -86.6% 208.33 ± 6% perf-c2c.DRAM.local
15021 ± 4% +59.5% 23957 ± 3% perf-c2c.DRAM.remote
15399 ± 2% +102.6% 31205 ± 3% perf-c2c.HITM.local
9938 ± 3% +103.4% 20217 ± 4% perf-c2c.HITM.remote
25337 ± 2% +102.9% 51422 ± 3% perf-c2c.HITM.total
16172 ± 32% +162.6% 42464 ± 13% proc-vmstat.numa_hint_faults
14655 ± 34% +82.4% 26726 ± 24% proc-vmstat.numa_hint_faults_local
1428439 +5.2% 1502110 proc-vmstat.numa_hit
1164410 +6.5% 1240512 proc-vmstat.numa_local
169794 ± 14% +32.8% 225458 ± 14% proc-vmstat.numa_pte_updates
185208 +5.9% 196095 ± 4% proc-vmstat.pgactivate
1510415 +4.9% 1584896 proc-vmstat.pgalloc_normal
7.553e+09 ± 11% +42.2% 1.074e+10 ± 7% perf-stat.i.branch-instructions
20529685 ± 22% +58.4% 32511073 ± 12% perf-stat.i.branch-misses
18.77 ± 9% +9.6 28.36 ± 6% perf-stat.i.cache-miss-rate%
5757124 ± 11% +71.2% 9853953 ± 8% perf-stat.i.cache-misses
27469874 ± 9% +23.9% 34036598 ± 7% perf-stat.i.cache-references
2575 ± 2% +6.1% 2732 ± 2% perf-stat.i.context-switches
16.75 ± 8% -24.4% 12.66 ± 4% perf-stat.i.cpi
335.17 ± 2% +5.4% 353.20 perf-stat.i.cpu-migrations
119311 ± 12% -44.0% 66812 ± 5% perf-stat.i.cycles-between-cache-misses
3.106e+10 ± 11% +49.4% 4.64e+10 ± 7% perf-stat.i.instructions
0.19 ± 4% +15.2% 0.22 perf-stat.overall.MPKI
21.65 ± 2% +8.2 29.84 ± 2% perf-stat.overall.cache-miss-rate%
18.46 -28.3% 13.23 perf-stat.overall.cpi
98417 ± 4% -37.9% 61109 perf-stat.overall.cycles-between-cache-misses
0.05 +39.5% 0.08 perf-stat.overall.ipc
7.648e+09 ± 9% +39.7% 1.069e+10 ± 6% perf-stat.ps.branch-instructions
20972501 ± 19% +52.4% 31965991 ± 10% perf-stat.ps.branch-misses
5909643 ± 9% +69.3% 10006290 ± 7% perf-stat.ps.cache-misses
27252734 ± 7% +23.0% 33515970 ± 6% perf-stat.ps.cache-references
2461 +6.3% 2615 perf-stat.ps.context-switches
323.20 +4.6% 338.19 perf-stat.ps.cpu-migrations
3.146e+10 ± 9% +46.7% 4.616e+10 ± 6% perf-stat.ps.instructions
2.154e+12 +38.9% 2.992e+12 perf-stat.total.instructions
24.75 -24.7 0.00 perf-profile.calltrace.cycles-pp.dput.terminate_walk.path_openat.do_filp_open.do_sys_openat2
24.75 -24.7 0.00 perf-profile.calltrace.cycles-pp.terminate_walk.path_openat.do_filp_open.do_sys_openat2.__x64_sys_openat
24.74 -24.7 0.00 perf-profile.calltrace.cycles-pp.__legitimize_path.try_to_unlazy.complete_walk.do_open.path_openat
24.74 -24.7 0.00 perf-profile.calltrace.cycles-pp.complete_walk.do_open.path_openat.do_filp_open.do_sys_openat2
24.74 -24.7 0.00 perf-profile.calltrace.cycles-pp.try_to_unlazy.complete_walk.do_open.path_openat.do_filp_open
24.74 -24.7 0.00 perf-profile.calltrace.cycles-pp.lockref_get_not_dead.__legitimize_path.try_to_unlazy.complete_walk.do_open
24.73 -24.7 0.00 perf-profile.calltrace.cycles-pp.dput.__fput.__x64_sys_close.do_syscall_64.entry_SYSCALL_64_after_hwframe
24.71 -24.7 0.00 perf-profile.calltrace.cycles-pp.lockref_get.do_dentry_open.do_open.path_openat.do_filp_open
24.84 -24.2 0.65 ± 9% perf-profile.calltrace.cycles-pp.__fput.__x64_sys_close.do_syscall_64.entry_SYSCALL_64_after_hwframe.__close
24.85 -24.2 0.69 ± 8% perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.__close
24.85 -24.2 0.69 ± 8% perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.__close
24.84 -24.2 0.68 ± 9% perf-profile.calltrace.cycles-pp.__x64_sys_close.do_syscall_64.entry_SYSCALL_64_after_hwframe.__close
24.85 -24.1 0.72 ± 8% perf-profile.calltrace.cycles-pp.__close
23.68 -23.7 0.00 perf-profile.calltrace.cycles-pp._raw_spin_lock.dput.terminate_walk.path_openat.do_filp_open
23.67 -23.7 0.00 perf-profile.calltrace.cycles-pp._raw_spin_lock.lockref_get_not_dead.__legitimize_path.try_to_unlazy.complete_walk
23.67 -23.7 0.00 perf-profile.calltrace.cycles-pp._raw_spin_lock.lockref_get.do_dentry_open.do_open.path_openat
23.67 -23.7 0.00 perf-profile.calltrace.cycles-pp._raw_spin_lock.dput.__fput.__x64_sys_close.do_syscall_64
23.63 -23.6 0.00 perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock.dput.terminate_walk.path_openat
23.62 -23.6 0.00 perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock.lockref_get_not_dead.__legitimize_path.try_to_unlazy
23.62 -23.6 0.00 perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock.lockref_get.do_dentry_open.do_open
23.62 -23.6 0.00 perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock.dput.__fput.__x64_sys_close
74.50 +23.3 97.82 perf-profile.calltrace.cycles-pp.__x64_sys_openat.do_syscall_64.entry_SYSCALL_64_after_hwframe.open64
74.50 +23.3 97.82 perf-profile.calltrace.cycles-pp.do_sys_openat2.__x64_sys_openat.do_syscall_64.entry_SYSCALL_64_after_hwframe.open64
74.52 +23.3 97.84 perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.open64
74.52 +23.3 97.84 perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.open64
74.41 +23.3 97.74 perf-profile.calltrace.cycles-pp.path_openat.do_filp_open.do_sys_openat2.__x64_sys_openat.do_syscall_64
74.41 +23.3 97.75 perf-profile.calltrace.cycles-pp.do_filp_open.do_sys_openat2.__x64_sys_openat.do_syscall_64.entry_SYSCALL_64_after_hwframe
74.52 +23.4 97.88 perf-profile.calltrace.cycles-pp.open64
49.65 +47.5 97.18 perf-profile.calltrace.cycles-pp.do_open.path_openat.do_filp_open.do_sys_openat2.__x64_sys_openat
24.83 +72.0 96.82 perf-profile.calltrace.cycles-pp.do_dentry_open.do_open.path_openat.do_filp_open.do_sys_openat2
0.00 +96.0 95.99 perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock.chrdev_open.do_dentry_open.do_open
0.00 +96.2 96.18 perf-profile.calltrace.cycles-pp._raw_spin_lock.chrdev_open.do_dentry_open.do_open.path_openat
0.00 +96.3 96.34 perf-profile.calltrace.cycles-pp.chrdev_open.do_dentry_open.do_open.path_openat.do_filp_open
49.48 -48.8 0.65 ± 13% perf-profile.children.cycles-pp.dput
24.71 -24.5 0.22 ± 12% perf-profile.children.cycles-pp.lockref_get
24.74 -24.4 0.31 ± 10% perf-profile.children.cycles-pp.lockref_get_not_dead
24.74 -24.4 0.32 ± 10% perf-profile.children.cycles-pp.__legitimize_path
24.74 -24.4 0.32 ± 10% perf-profile.children.cycles-pp.complete_walk
24.74 -24.4 0.32 ± 10% perf-profile.children.cycles-pp.try_to_unlazy
24.75 -24.4 0.34 ± 12% perf-profile.children.cycles-pp.terminate_walk
24.84 -24.2 0.65 ± 9% perf-profile.children.cycles-pp.__fput
24.84 -24.2 0.68 ± 9% perf-profile.children.cycles-pp.__x64_sys_close
24.85 -24.1 0.73 ± 8% perf-profile.children.cycles-pp.__close
2.13 ± 6% -1.5 0.65 ± 13% perf-profile.children.cycles-pp.lockref_put_return
99.79 -0.4 99.40 perf-profile.children.cycles-pp.do_syscall_64
99.80 -0.4 99.42 perf-profile.children.cycles-pp.entry_SYSCALL_64_after_hwframe
0.23 ± 2% +0.0 0.25 perf-profile.children.cycles-pp.ksys_write
0.08 ± 5% +0.0 0.13 ± 2% perf-profile.children.cycles-pp.apparmor_file_free_security
0.08 ± 5% +0.0 0.13 ± 2% perf-profile.children.cycles-pp.security_file_free
0.00 +0.1 0.05 perf-profile.children.cycles-pp.stress_full
0.02 ±141% +0.1 0.07 perf-profile.children.cycles-pp.__x64_sys_pread64
0.26 +0.1 0.32 ± 2% perf-profile.children.cycles-pp.write
0.02 ± 99% +0.1 0.09 ± 4% perf-profile.children.cycles-pp.__do_sys_newfstatat
0.02 ±141% +0.1 0.08 perf-profile.children.cycles-pp.ksys_read
0.08 ± 5% +0.1 0.15 ± 2% perf-profile.children.cycles-pp.vfs_read
0.05 +0.1 0.12 ± 3% perf-profile.children.cycles-pp.__libc_pread
0.05 +0.1 0.13 ± 2% perf-profile.children.cycles-pp.read
0.05 +0.1 0.13 ± 2% perf-profile.children.cycles-pp.fstatat64
0.00 +0.1 0.08 ± 4% perf-profile.children.cycles-pp.mas_rev_awalk
0.08 ± 6% +0.1 0.17 ± 4% perf-profile.children.cycles-pp.apparmor_file_open
0.08 ± 6% +0.1 0.18 ± 4% perf-profile.children.cycles-pp.security_file_open
0.00 +0.1 0.10 perf-profile.children.cycles-pp.iov_iter_zero
0.00 +0.1 0.10 ± 3% perf-profile.children.cycles-pp.read_iter_zero
0.00 +0.1 0.11 ± 3% perf-profile.children.cycles-pp.ioctl
0.00 +0.1 0.12 ± 4% perf-profile.children.cycles-pp.mas_empty_area_rev
0.00 +0.1 0.14 perf-profile.children.cycles-pp.entry_SYSRETQ_unsafe_stack
0.00 +0.1 0.15 ± 3% perf-profile.children.cycles-pp.apparmor_file_alloc_security
0.00 +0.1 0.15 ± 4% perf-profile.children.cycles-pp.kobject_get_unless_zero
0.00 +0.2 0.16 ± 3% perf-profile.children.cycles-pp.security_file_alloc
0.00 +0.2 0.16 ± 2% perf-profile.children.cycles-pp.init_file
0.00 +0.2 0.17 ± 2% perf-profile.children.cycles-pp.entry_SYSCALL_64
0.00 +0.2 0.17 ± 2% perf-profile.children.cycles-pp.vm_unmapped_area
0.00 +0.2 0.18 ± 10% perf-profile.children.cycles-pp.cdev_put
0.00 +0.2 0.18 ± 10% perf-profile.children.cycles-pp.kobject_put
0.00 +0.2 0.19 perf-profile.children.cycles-pp.alloc_empty_file
0.00 +0.2 0.19 perf-profile.children.cycles-pp.arch_get_unmapped_area_topdown_vmflags
0.00 +0.2 0.20 ± 2% perf-profile.children.cycles-pp.thp_get_unmapped_area_vmflags
0.00 +0.2 0.20 perf-profile.children.cycles-pp.__get_unmapped_area
0.00 +0.2 0.21 ± 2% perf-profile.children.cycles-pp.do_mmap
0.02 ± 99% +0.3 0.29 perf-profile.children.cycles-pp.vm_mmap_pgoff
0.02 ± 99% +0.3 0.31 perf-profile.children.cycles-pp.ksys_mmap_pgoff
0.06 ± 9% +0.3 0.40 perf-profile.children.cycles-pp.__mmap
94.70 +1.5 96.19 perf-profile.children.cycles-pp._raw_spin_lock
94.51 +1.5 96.01 perf-profile.children.cycles-pp.native_queued_spin_lock_slowpath
74.50 +23.3 97.82 perf-profile.children.cycles-pp.__x64_sys_openat
74.50 +23.3 97.82 perf-profile.children.cycles-pp.do_sys_openat2
74.41 +23.3 97.74 perf-profile.children.cycles-pp.path_openat
74.41 +23.3 97.75 perf-profile.children.cycles-pp.do_filp_open
74.52 +23.4 97.89 perf-profile.children.cycles-pp.open64
49.65 +47.5 97.18 perf-profile.children.cycles-pp.do_open
24.83 +72.0 96.82 perf-profile.children.cycles-pp.do_dentry_open
0.00 +96.3 96.34 perf-profile.children.cycles-pp.chrdev_open
2.12 ± 6% -1.5 0.64 ± 13% perf-profile.self.cycles-pp.lockref_put_return
1.04 ± 7% -0.8 0.22 ± 12% perf-profile.self.cycles-pp.lockref_get
1.06 ± 6% -0.7 0.31 ± 10% perf-profile.self.cycles-pp.lockref_get_not_dead
0.08 ± 5% +0.0 0.13 ± 2% perf-profile.self.cycles-pp.apparmor_file_free_security
0.00 +0.1 0.05 perf-profile.self.cycles-pp.stress_full
0.00 +0.1 0.07 perf-profile.self.cycles-pp.mas_rev_awalk
0.00 +0.1 0.08 ± 4% perf-profile.self.cycles-pp.entry_SYSCALL_64
0.00 +0.1 0.09 perf-profile.self.cycles-pp.do_dentry_open
0.08 ± 6% +0.1 0.17 ± 4% perf-profile.self.cycles-pp.apparmor_file_open
0.00 +0.1 0.10 perf-profile.self.cycles-pp.iov_iter_zero
0.00 +0.1 0.14 perf-profile.self.cycles-pp.entry_SYSRETQ_unsafe_stack
0.00 +0.1 0.15 ± 3% perf-profile.self.cycles-pp.apparmor_file_alloc_security
0.00 +0.1 0.15 ± 4% perf-profile.self.cycles-pp.kobject_get_unless_zero
0.00 +0.2 0.18 ± 10% perf-profile.self.cycles-pp.kobject_put
94.04 +1.5 95.52 perf-profile.self.cycles-pp.native_queued_spin_lock_slowpath
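
To re-derive the %change column from the rows above, here is a minimal Python sketch. It is not part of the lkp tooling, and the names ROW, parse_row and relative_change are made up for illustration. It assumes each row has the shape "base [± stddev%] change new [± stddev%] metric", and that a change printed without a trailing "%" is an absolute difference in percentage points rather than a relative change.

```python
import re

# Assumed row shape: "<base> [± N%] <change> <new> [± N%] <metric>".
ROW = re.compile(
    r"^\s*(?P<base>[\d.]+(?:e[+-]?\d+)?)"   # value on the parent commit
    r"(?:\s*±\s*\d+%)?"                     # optional %stddev
    r"\s+(?P<change>[+-][\d.]+%?)"          # relative (%) or absolute change
    r"\s+(?P<new>[\d.]+(?:e[+-]?\d+)?)"     # value on the tested commit
    r"(?:\s*±\s*\d+%)?"                     # optional %stddev
    r"\s+(?P<metric>\S+)\s*$"               # metric name (no spaces)
)


def parse_row(line):
    """Split one comparison row into its fields, or return None if it does not match."""
    m = ROW.match(line)
    if m is None:
        return None
    return {
        "base": float(m.group("base")),
        "change": m.group("change"),
        "new": float(m.group("new")),
        "metric": m.group("metric"),
    }


def relative_change(base, new):
    """Percent change going from base to new."""
    return (new - base) / base * 100.0


row = parse_row("127609 ± 11% +633.4% 935926 stress-ng.full.ops_per_sec")
print(row["metric"], f"{relative_change(row['base'], row['new']):+.1f}%")
# prints: stress-ng.full.ops_per_sec +633.4%

# ipc is the reciprocal of cpi, so its change can be cross-checked from the
# perf-stat.overall.cpi row (18.46 -> 13.23) instead of the rounded ipc values.
print(f"ipc: {relative_change(1 / 18.46, 1 / 13.23):+.1f}%")
# prints: ipc: +39.5%
```

The second check shows why perf-stat.overall.ipc reads +39.5% even though the displayed values 0.05 and 0.08 would suggest a larger jump: the table prints rounded values, while the change appears to be computed from the unrounded ones.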
Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.
--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki