Message-ID: <aAIJc+VxVPQC2Jqb@xsang-OptiPlex-9020>
Date: Fri, 18 Apr 2025 16:12:35 +0800
From: Oliver Sang <oliver.sang@...el.com>
To: Mateusz Guzik <mjguzik@...il.com>
CC: <oe-lkp@...ts.linux.dev>, <lkp@...el.com>, <linux-kernel@...r.kernel.org>,
Christian Brauner <brauner@...nel.org>, <linux-fsdevel@...r.kernel.org>,
<oliver.sang@...el.com>
Subject: Re: [linus:master] [fs] a914bd93f3:
stress-ng.close.close_calls_per_sec 52.2% regression

Hi Mateusz Guzik,

On Thu, Apr 17, 2025 at 12:17:54PM +0200, Mateusz Guzik wrote:
> On Thu, Apr 17, 2025 at 12:02:55PM +0200, Mateusz Guzik wrote:
> > bottom line though is there is a known tradeoff there and stress-ng
> > manufactures a case where it is always the wrong one.
> >
> > fd 2 at hand is inherited (it's the tty) and shared between *all*
> > workers on all CPUs.
> >
> > Ignoring some fluff, it's this in a loop:
> > dup2(2, 1024) = 1024
> > dup2(2, 1025) = 1025
> > dup2(2, 1026) = 1026
> > dup2(2, 1027) = 1027
> > dup2(2, 1028) = 1028
> > dup2(2, 1029) = 1029
> > dup2(2, 1030) = 1030
> > dup2(2, 1031) = 1031
> > [..]
> > close_range(1024, 1032, 0) = 0
> >
> > where fd 2 is the same file object in all 192 workers doing this.
> >
>
> the following will still have *some* impact, but the drop should be much
> lower
>
> it also has a side effect of further helping the single-threaded case by
> shortening the code when it works
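
For reference, the per-worker syscall pattern from the strace above boils
down to roughly the following (a minimal sketch, not stress-ng's actual
code; the fd numbers and the close_range() call are taken from the trace,
while the loop bound and overall structure are assumptions; close_range()
needs glibc >= 2.34):

    #define _GNU_SOURCE
    #include <unistd.h>

    int main(void)
    {
            for (int i = 0; i < 1000000; i++) {
                    /* each dup2() takes another reference on the shared
                     * tty file behind fd 2 */
                    for (int fd = 1024; fd < 1032; fd++)
                            dup2(2, fd);
                    /* ...and close_range() drops those references again,
                     * concurrently from all 192 workers */
                    close_range(1024, 1032, 0);
            }
            return 0;
    }
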
We applied the patch below on top of a914bd93f3. It not only recovers the
regression we saw with a914bd93f3, but also brings a further gain: +29.4%
over 3e46a92a27 (the parent of a914bd93f3).

I also list stress-ng.close.ops_per_sec here, which was not in our original
report: its per-commit data ranges overlap, so our tooling does not consider
the comparison reliable and omits it from the table unless a 'force' option
is used.
In a stress-ng close test, the output looks like the following:
2025-04-18 02:58:28 stress-ng --timeout 60 --times --verify --metrics --no-rand-seed --close 192
stress-ng: info: [6268] setting to a 1 min run per stressor
stress-ng: info: [6268] dispatching hogs: 192 close
stress-ng: info: [6268] note: /proc/sys/kernel/sched_autogroup_enabled is 1 and this can impact scheduling throughput for processes not attached to a tty. Setting this to 0 may improve performance metrics
stress-ng: metrc: [6268] stressor       bogo ops real time  usr time  sys time   bogo ops/s     bogo ops/s CPU used per       RSS Max
stress-ng: metrc: [6268]                           (secs)    (secs)    (secs)   (real time) (usr+sys time) instance (%)          (KB)
stress-ng: metrc: [6268] close           1568702     60.08    171.29   9524.58     26108.29         161.79        84.05          1548 <--- (1)
stress-ng: metrc: [6268] miscellaneous metrics:
stress-ng: metrc: [6268] close 600923.80 close calls per sec (harmonic mean of 192 instances) <--- (2)
stress-ng: info: [6268] for a 60.14s run time:
stress-ng: info: [6268] 11547.73s available CPU time
stress-ng: info: [6268] 171.29s user time ( 1.48%)
stress-ng: info: [6268] 9525.12s system time ( 82.48%)
stress-ng: info: [6268] 9696.41s total time ( 83.97%)
stress-ng: info: [6268] load average: 520.00 149.63 51.46
stress-ng: info: [6268] skipped: 0
stress-ng: info: [6268] passed: 192: close (192)
stress-ng: info: [6268] failed: 0
stress-ng: info: [6268] metrics untrustworthy: 0
stress-ng: info: [6268] successful run completed in 1 min
The stress-ng.close.close_calls_per_sec data comes from line (2).
The stress-ng.close.ops_per_sec data comes from line (1), the 'bogo ops/s
(real time)' column.
From the data below, a914bd93f3 also seems to have a small regression in
stress-ng.close.ops_per_sec, but it is not conclusive since the data is not
stable enough. For this metric, 9f0124114f shows almost the same numbers as
3e46a92a27.

Summary data:
=========================================================================================
compiler/cpufreq_governor/kconfig/nr_threads/rootfs/tbox_group/test/testcase/testtime:
gcc-12/performance/x86_64-rhel-9.4/100%/debian-12-x86_64-20240206.cgz/igk-spr-2sp1/close/stress-ng/60s
commit:
3e46a92a27 ("fs: use fput_close_sync() in close()")
a914bd93f3 ("fs: use fput_close() in filp_close()")
9f0124114f <--- a914bd93f3 + patch
3e46a92a27c2927f a914bd93f3edfedcdd59deb615e 9f0124114f707363af03caed5ae
---------------- --------------------------- ---------------------------
%stddev %change %stddev %change %stddev
\ | \ | \
473980 ± 14% -52.2% 226559 ± 13% +29.4% 613288 ± 12% stress-ng.close.close_calls_per_sec
1677393 ± 3% -6.0% 1576636 ± 5% +0.6% 1686892 ± 3% stress-ng.close.ops
27917 ± 3% -6.0% 26237 ± 5% +0.6% 28074 ± 3% stress-ng.close.ops_per_sec
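(As a sanity check on the headline numbers: 613288 / 473980 ≈ 1.294, i.e.
+29.4% over the 3e46a92a27 baseline, and 226559 / 473980 ≈ 0.478, i.e. the
-52.2% from the original report.)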
The full data is listed under [1] below.
>
> diff --git a/include/linux/file_ref.h b/include/linux/file_ref.h
> index 7db62fbc0500..c73865ed4251 100644
> --- a/include/linux/file_ref.h
> +++ b/include/linux/file_ref.h
> @@ -181,17 +181,15 @@ static __always_inline __must_check bool file_ref_put_close(file_ref_t *ref)
> long old, new;
>
> old = atomic_long_read(&ref->refcnt);
> - do {
> - if (unlikely(old < 0))
> - return __file_ref_put_badval(ref, old);
> -
> - if (old == FILE_REF_ONEREF)
> - new = FILE_REF_DEAD;
> - else
> - new = old - 1;
> - } while (!atomic_long_try_cmpxchg(&ref->refcnt, &old, new));
> -
> - return new == FILE_REF_DEAD;
> + if (likely(old == FILE_REF_ONEREF)) {
> + new = FILE_REF_DEAD;
> + if (likely(atomic_long_try_cmpxchg(&ref->refcnt, &old, new)))
> + return true;
> + /*
> + * The ref has changed from under us, don't play any games.
> + */
> + }
> + return file_ref_put(ref);
> }
>
> /**
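
To make the shape of the change concrete, here is a user-space C11 analogue
of the patched fastpath (a sketch under stated assumptions: REF_ONEREF and
REF_DEAD are placeholders, and the kernel's file_ref_put() slow path handles
dead/saturated counts that this stand-in ignores):

    #include <stdatomic.h>
    #include <stdbool.h>

    #define REF_ONEREF 0L    /* placeholder for FILE_REF_ONEREF */
    #define REF_DEAD  (-1L)  /* placeholder for FILE_REF_DEAD */

    static bool ref_put_close(_Atomic long *refcnt)
    {
            long old = atomic_load_explicit(refcnt, memory_order_relaxed);

            if (old == REF_ONEREF) {
                    /* likely the last reference: one cmpxchg attempt,
                     * no retry loop */
                    if (atomic_compare_exchange_strong(refcnt, &old, REF_DEAD))
                            return true;
                    /* the ref changed from under us, don't play any games */
            }
            /* shared case (fd 2 held by all 192 workers): a plain atomic
             * decrement instead of a cmpxchg loop on a contended cacheline;
             * a crude stand-in for file_ref_put() */
            return atomic_fetch_sub(refcnt, 1) == REF_ONEREF;
    }

With 192 workers all holding fd 2, old is almost never REF_ONEREF, so the
patched code goes straight to the plain put rather than retrying a cmpxchg
that keeps failing under contention, which is presumably why the drop is
much lower.
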
[1]
=========================================================================================
compiler/cpufreq_governor/kconfig/nr_threads/rootfs/tbox_group/test/testcase/testtime:
gcc-12/performance/x86_64-rhel-9.4/100%/debian-12-x86_64-20240206.cgz/igk-spr-2sp1/close/stress-ng/60s
commit:
3e46a92a27 ("fs: use fput_close_sync() in close()")
a914bd93f3 ("fs: use fput_close() in filp_close()")
9f0124114f <--- a914bd93f3 + patch
3e46a92a27c2927f a914bd93f3edfedcdd59deb615e 9f0124114f707363af03caed5ae
---------------- --------------------------- ---------------------------
%stddev %change %stddev %change %stddev
\ | \ | \
355470 ± 14% -17.6% 292767 ± 7% -8.9% 323894 ± 6% cpuidle..usage
5.19 -0.6 4.61 ± 3% -0.0 5.18 ± 2% mpstat.cpu.all.usr%
495615 ± 7% -38.5% 304839 ± 8% -5.5% 468431 ± 7% vmstat.system.cs
780096 ± 5% -23.5% 596446 ± 3% -4.7% 743238 ± 5% vmstat.system.in
4004843 ± 8% -21.1% 3161813 ± 8% -6.2% 3758245 ± 10% time.involuntary_context_switches
9475 +1.1% 9582 +0.3% 9504 time.system_time
183.50 ± 2% -37.8% 114.20 ± 3% -3.4% 177.34 ± 2% time.user_time
25637892 ± 6% -42.6% 14725385 ± 9% -6.4% 23992138 ± 7% time.voluntary_context_switches
2512168 ± 17% -45.8% 1361659 ± 71% +0.5% 2524627 ± 15% sched_debug.cfs_rq:/.avg_vruntime.min
2512168 ± 17% -45.8% 1361659 ± 71% +0.5% 2524627 ± 15% sched_debug.cfs_rq:/.min_vruntime.min
700402 ± 2% +19.8% 838744 ± 10% +5.0% 735486 ± 4% sched_debug.cpu.avg_idle.avg
81230 ± 6% -59.6% 32788 ± 69% -6.1% 76301 ± 7% sched_debug.cpu.nr_switches.avg
27992 ± 20% -70.2% 8345 ± 74% +11.7% 31275 ± 26% sched_debug.cpu.nr_switches.min
473980 ± 14% -52.2% 226559 ± 13% +29.4% 613288 ± 12% stress-ng.close.close_calls_per_sec
1677393 ± 3% -6.0% 1576636 ± 5% +0.6% 1686892 ± 3% stress-ng.close.ops
27917 ± 3% -6.0% 26237 ± 5% +0.6% 28074 ± 3% stress-ng.close.ops_per_sec
4004843 ± 8% -21.1% 3161813 ± 8% -6.2% 3758245 ± 10% stress-ng.time.involuntary_context_switches
9475 +1.1% 9582 +0.3% 9504 stress-ng.time.system_time
183.50 ± 2% -37.8% 114.20 ± 3% -3.4% 177.34 ± 2% stress-ng.time.user_time
25637892 ± 6% -42.6% 14725385 ± 9% -6.4% 23992138 ± 7% stress-ng.time.voluntary_context_switches
23.01 ± 2% -1.4 21.61 ± 3% +0.1 23.13 ± 2% perf-stat.i.cache-miss-rate%
17981659 -10.8% 16035508 ± 4% -0.5% 17886941 ± 4% perf-stat.i.cache-misses
77288888 ± 2% -6.5% 72260357 ± 4% -1.7% 75978329 ± 3% perf-stat.i.cache-references
504949 ± 6% -38.1% 312536 ± 8% -5.7% 476406 ± 7% perf-stat.i.context-switches
33030 +15.7% 38205 ± 4% +1.3% 33444 ± 4% perf-stat.i.cycles-between-cache-misses
4.34 ± 10% -38.3% 2.68 ± 20% +0.1% 4.34 ± 11% perf-stat.i.metric.K/sec
26229 ± 44% +37.8% 36145 ± 4% +21.8% 31948 ± 4% perf-stat.overall.cycles-between-cache-misses
2.12 ± 47% +151.1% 5.32 ± 28% -13.5% 1.84 ± 27% perf-sched.sch_delay.avg.ms.__cond_resched.__do_sys_close_range.do_syscall_64.entry_SYSCALL_64_after_hwframe.[unknown]
0.28 ± 60% +990.5% 3.08 ± 53% +6339.3% 18.16 ±314% perf-sched.sch_delay.avg.ms.__cond_resched.__kmalloc_cache_noprof.do_eventfd.__x64_sys_eventfd2.do_syscall_64
0.96 ± 28% +35.5% 1.30 ± 25% -7.5% 0.89 ± 29% perf-sched.sch_delay.avg.ms.__cond_resched.__kmalloc_noprof.load_elf_phdrs.load_elf_binary.exec_binprm
2.56 ± 43% +964.0% 27.27 ±161% +456.1% 14.25 ±127% perf-sched.sch_delay.avg.ms.__cond_resched.dput.open_last_lookups.path_openat.do_filp_open
12.80 ±139% +366.1% 59.66 ±107% +242.3% 43.82 ±204% perf-sched.sch_delay.avg.ms.__cond_resched.dput.shmem_unlink.vfs_unlink.do_unlinkat
2.78 ± 25% +50.0% 4.17 ± 19% +22.2% 3.40 ± 18% perf-sched.sch_delay.avg.ms.__cond_resched.kmem_cache_alloc_lru_noprof.__d_alloc.d_alloc_cursor.dcache_dir_open
0.78 ± 18% +39.3% 1.09 ± 19% +24.7% 0.97 ± 36% perf-sched.sch_delay.avg.ms.__cond_resched.kmem_cache_alloc_noprof.mas_alloc_nodes.mas_preallocate.vma_shrink
2.63 ± 10% +16.7% 3.07 ± 4% +1.3% 2.67 ± 15% perf-sched.sch_delay.avg.ms.__cond_resched.kmem_cache_alloc_noprof.vm_area_alloc.__mmap_new_vma.__mmap_region
0.04 ±223% +3483.1% 1.38 ± 67% +2231.2% 0.90 ±243% perf-sched.sch_delay.avg.ms.__cond_resched.netlink_release.__sock_release.sock_close.__fput
0.72 ± 17% +42.5% 1.02 ± 22% +20.4% 0.86 ± 15% perf-sched.sch_delay.avg.ms.__cond_resched.stop_one_cpu.sched_exec.bprm_execve.part
0.35 ±134% +457.7% 1.94 ± 61% +177.7% 0.97 ±104% perf-sched.sch_delay.avg.ms.__cond_resched.task_numa_work.task_work_run.syscall_exit_to_user_mode.do_syscall_64
1.88 ± 34% +574.9% 12.69 ±115% +389.0% 9.20 ±214% perf-sched.sch_delay.avg.ms.__cond_resched.task_work_run.syscall_exit_to_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe
2.41 ± 10% +18.0% 2.84 ± 9% +62.4% 3.91 ± 81% perf-sched.sch_delay.avg.ms.__cond_resched.wp_page_copy.__handle_mm_fault.handle_mm_fault.do_user_addr_fault
1.85 ± 36% +155.6% 4.73 ± 50% +27.5% 2.36 ± 54% perf-sched.sch_delay.avg.ms.io_schedule.folio_wait_bit_common.__do_fault.do_read_fault
28.94 ± 26% -52.4% 13.78 ± 36% -29.1% 20.52 ± 50% perf-sched.sch_delay.avg.ms.pipe_read.vfs_read.ksys_read.do_syscall_64
2.24 ± 9% +22.1% 2.74 ± 8% +14.2% 2.56 ± 14% perf-sched.sch_delay.avg.ms.schedule_preempt_disabled.rwsem_down_write_slowpath.down_write.unlink_file_vma_batch_final
2.19 ± 7% +17.3% 2.57 ± 6% +6.4% 2.33 ± 6% perf-sched.sch_delay.avg.ms.schedule_preempt_disabled.rwsem_down_write_slowpath.down_write.vma_link_file
2.39 ± 6% +16.4% 2.79 ± 9% +12.3% 2.69 ± 8% perf-sched.sch_delay.avg.ms.schedule_preempt_disabled.rwsem_down_write_slowpath.down_write.vma_prepare
0.34 ± 77% +1931.5% 6.95 ± 99% +5218.5% 18.19 ±313% perf-sched.sch_delay.max.ms.__cond_resched.__kmalloc_cache_noprof.do_eventfd.__x64_sys_eventfd2.do_syscall_64
29.69 ± 28% +129.5% 68.12 ± 28% +11.1% 32.99 ± 30% perf-sched.sch_delay.max.ms.__cond_resched.__kmalloc_cache_noprof.perf_event_mmap_event.perf_event_mmap.__mmap_region
7.71 ± 21% +10.4% 8.51 ± 25% -30.5% 5.36 ± 30% perf-sched.sch_delay.max.ms.__cond_resched.__kmalloc_node_noprof.seq_read_iter.vfs_read.ksys_read
1.48 ± 96% +284.6% 5.68 ± 56% -39.2% 0.90 ±112% perf-sched.sch_delay.max.ms.__cond_resched.copy_strings_kernel.kernel_execve.call_usermodehelper_exec_async.ret_from_fork
3.59 ± 57% +124.5% 8.06 ± 45% +44.7% 5.19 ± 72% perf-sched.sch_delay.max.ms.__cond_resched.down_read.mmap_read_lock_maybe_expand.get_arg_page.copy_string_kernel
3.39 ± 91% +112.0% 7.19 ± 44% +133.6% 7.92 ±129% perf-sched.sch_delay.max.ms.__cond_resched.down_read_killable.iterate_dir.__x64_sys_getdents64.do_syscall_64
22.16 ± 77% +117.4% 48.17 ± 34% +41.9% 31.45 ± 54% perf-sched.sch_delay.max.ms.__cond_resched.down_write_killable.exec_mmap.begin_new_exec.load_elf_binary
8.72 ± 17% +358.5% 39.98 ± 61% +98.6% 17.32 ± 77% perf-sched.sch_delay.max.ms.__cond_resched.kmem_cache_alloc_noprof.mas_alloc_nodes.mas_preallocate.__mmap_new_vma
35.71 ± 49% +1.8% 36.35 ± 37% -48.6% 18.34 ± 25% perf-sched.sch_delay.max.ms.__cond_resched.kmem_cache_alloc_noprof.vm_area_dup.__split_vma.vms_gather_munmap_vmas
0.04 ±223% +3484.4% 1.38 ± 67% +2231.2% 0.90 ±243% perf-sched.sch_delay.max.ms.__cond_resched.netlink_release.__sock_release.sock_close.__fput
0.53 ±154% +676.1% 4.12 ± 61% +161.0% 1.39 ±109% perf-sched.sch_delay.max.ms.__cond_resched.task_numa_work.task_work_run.syscall_exit_to_user_mode.do_syscall_64
25.76 ± 70% +9588.9% 2496 ±127% +2328.6% 625.69 ±272% perf-sched.sch_delay.max.ms.__cond_resched.task_work_run.syscall_exit_to_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe
51.53 ± 26% -58.8% 21.22 ±106% +13.1% 58.30 ±190% perf-sched.sch_delay.max.ms.devkmsg_read.vfs_read.ksys_read.do_syscall_64
4.97 ± 48% +154.8% 12.66 ± 75% +25.0% 6.21 ± 42% perf-sched.sch_delay.max.ms.io_schedule.folio_wait_bit_common.__do_fault.do_read_fault
4.36 ± 48% +147.7% 10.81 ± 29% -14.1% 3.75 ± 26% perf-sched.wait_and_delay.avg.ms.__cond_resched.__do_sys_close_range.do_syscall_64.entry_SYSCALL_64_after_hwframe.[unknown]
108632 ± 4% +23.9% 134575 ± 6% -1.3% 107223 ± 4% perf-sched.wait_and_delay.count.__cond_resched.__do_sys_close_range.do_syscall_64.entry_SYSCALL_64_after_hwframe.[unknown]
572.67 ± 6% +37.3% 786.17 ± 7% +1.9% 583.75 ± 8% perf-sched.wait_and_delay.count.__cond_resched.__wait_for_common.affine_move_task.__set_cpus_allowed_ptr.__sched_setaffinity
596.67 ± 12% +43.9% 858.50 ± 13% +2.4% 610.75 ± 8% perf-sched.wait_and_delay.count.__cond_resched.__wait_for_common.wait_for_completion_state.call_usermodehelper_exec.__request_module
294.83 ± 9% +31.1% 386.50 ± 11% +1.7% 299.75 ± 11% perf-sched.wait_and_delay.count.__cond_resched.dput.terminate_walk.path_openat.do_filp_open
1223275 ± 2% -17.7% 1006293 ± 6% +0.5% 1228897 perf-sched.wait_and_delay.count.do_nanosleep.hrtimer_nanosleep.common_nsleep.__x64_sys_clock_nanosleep
2772 ± 11% +43.6% 3980 ± 11% +6.6% 2954 ± 9% perf-sched.wait_and_delay.count.do_wait.kernel_wait.call_usermodehelper_exec_work.process_one_work
11690 ± 7% +29.8% 15173 ± 10% +5.6% 12344 ± 7% perf-sched.wait_and_delay.count.io_schedule.folio_wait_bit_common.filemap_fault.__do_fault
4072 ±100% +163.7% 10737 ± 7% +59.4% 6491 ± 58% perf-sched.wait_and_delay.count.irqentry_exit_to_user_mode.asm_exc_page_fault.[unknown].[unknown]
8811 ± 6% +26.2% 11117 ± 9% +2.6% 9039 ± 8% perf-sched.wait_and_delay.count.irqentry_exit_to_user_mode.asm_sysvec_apic_timer_interrupt.[unknown].[unknown]
662.17 ± 29% +187.8% 1905 ± 29% +36.2% 901.58 ± 32% perf-sched.wait_and_delay.count.pipe_read.vfs_read.ksys_read.do_syscall_64
15.50 ± 11% +48.4% 23.00 ± 16% -6.5% 14.50 ± 24% perf-sched.wait_and_delay.count.schedule_hrtimeout_range.do_poll.constprop.0.do_sys_poll
167.67 ± 20% +48.0% 248.17 ± 26% -8.3% 153.83 ± 27% perf-sched.wait_and_delay.count.schedule_preempt_disabled.rwsem_down_write_slowpath.down_write.open_last_lookups
2680 ± 12% +42.5% 3820 ± 11% +6.7% 2860 ± 9% perf-sched.wait_and_delay.count.schedule_timeout.___down_common.__down_timeout.down_timeout
137.17 ± 13% +31.8% 180.83 ± 9% +12.4% 154.17 ± 16% perf-sched.wait_and_delay.count.schedule_timeout.__wait_for_common.wait_for_completion_state.__wait_rcu_gp
2636 ± 12% +43.1% 3772 ± 12% +6.1% 2797 ± 10% perf-sched.wait_and_delay.count.schedule_timeout.__wait_for_common.wait_for_completion_state.call_usermodehelper_exec
10.50 ± 11% +74.6% 18.33 ± 23% +19.0% 12.50 ± 18% perf-sched.wait_and_delay.count.schedule_timeout.kcompactd.kthread.ret_from_fork
5619 ± 5% +38.8% 7797 ± 9% +9.2% 6136 ± 8% perf-sched.wait_and_delay.count.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
70455 ± 4% +32.3% 93197 ± 6% +1.1% 71250 ± 3% perf-sched.wait_and_delay.count.syscall_exit_to_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe.[unknown]
6990 ± 4% +37.4% 9603 ± 9% +7.7% 7528 ± 5% perf-sched.wait_and_delay.count.worker_thread.kthread.ret_from_fork.ret_from_fork_asm
191.28 ±124% +1455.2% 2974 ±100% +428.0% 1009 ±166% perf-sched.wait_and_delay.max.ms.__cond_resched.dput.path_openat.do_filp_open.do_sys_openat2
1758 ±118% -87.9% 211.89 ±211% -96.6% 59.86 ±184% perf-sched.wait_and_delay.max.ms.devkmsg_read.vfs_read.ksys_read.do_syscall_64
2.24 ± 48% +144.6% 5.48 ± 29% -14.7% 1.91 ± 25% perf-sched.wait_time.avg.ms.__cond_resched.__do_sys_close_range.do_syscall_64.entry_SYSCALL_64_after_hwframe.[unknown]
8.06 ±101% +618.7% 57.94 ± 68% +332.7% 34.88 ±100% perf-sched.wait_time.avg.ms.__cond_resched.__fput.__x64_sys_close.do_syscall_64.entry_SYSCALL_64_after_hwframe
0.28 ±146% +370.9% 1.30 ± 64% +260.2% 0.99 ±148% perf-sched.wait_time.avg.ms.__cond_resched.down_write.__split_vma.vms_gather_munmap_vmas.do_vmi_align_munmap
3.86 ± 5% +86.0% 7.18 ± 44% +41.7% 5.47 ± 32% perf-sched.wait_time.avg.ms.__cond_resched.down_write.free_pgtables.exit_mmap.__mmput
2.54 ± 29% +51.6% 3.85 ± 12% +9.6% 2.78 ± 39% perf-sched.wait_time.avg.ms.__cond_resched.down_write_killable.__do_sys_brk.do_syscall_64.entry_SYSCALL_64_after_hwframe
1.11 ± 39% +201.6% 3.36 ± 47% +77.3% 1.98 ± 86% perf-sched.wait_time.avg.ms.__cond_resched.down_write_killable.exec_mmap.begin_new_exec.load_elf_binary
3.60 ± 68% +1630.2% 62.29 ±153% +398.4% 17.94 ±134% perf-sched.wait_time.avg.ms.__cond_resched.dput.step_into.link_path_walk.part
0.20 ± 64% +115.4% 0.42 ± 42% +145.2% 0.48 ± 83% perf-sched.wait_time.avg.ms.__cond_resched.filemap_read.__kernel_read.load_elf_binary.exec_binprm
142.09 ±145% +58.3% 224.93 ±104% +179.8% 397.63 ±102% perf-sched.wait_time.avg.ms.__cond_resched.kmem_cache_alloc_noprof.alloc_pid.copy_process.kernel_clone
55.51 ± 53% +218.0% 176.54 ± 83% +96.2% 108.89 ± 87% perf-sched.wait_time.avg.ms.__cond_resched.kmem_cache_alloc_noprof.security_inode_alloc.inode_init_always_gfp.alloc_inode
3.22 ± 3% +15.4% 3.72 ± 9% +2.3% 3.30 ± 7% perf-sched.wait_time.avg.ms.__cond_resched.kmem_cache_alloc_noprof.vm_area_dup.__split_vma.vms_gather_munmap_vmas
0.04 ±223% +37562.8% 14.50 ±198% +2231.2% 0.90 ±243% perf-sched.wait_time.avg.ms.__cond_resched.netlink_release.__sock_release.sock_close.__fput
1.19 ± 30% +59.3% 1.90 ± 34% +3.7% 1.24 ± 37% perf-sched.wait_time.avg.ms.__cond_resched.remove_vma.vms_complete_munmap_vmas.do_vmi_align_munmap.do_vmi_munmap
0.36 ±133% +447.8% 1.97 ± 60% +174.5% 0.99 ±103% perf-sched.wait_time.avg.ms.__cond_resched.task_numa_work.task_work_run.syscall_exit_to_user_mode.do_syscall_64
1.73 ± 47% +165.0% 4.58 ± 51% +45.7% 2.52 ± 52% perf-sched.wait_time.avg.ms.io_schedule.folio_wait_bit_common.__do_fault.do_read_fault
1.10 ± 21% +693.9% 8.70 ± 70% +459.3% 6.13 ±173% perf-sched.wait_time.avg.ms.irqentry_exit_to_user_mode.asm_sysvec_call_function_single.[unknown]
54.56 ± 32% +400.2% 272.90 ± 67% +345.0% 242.76 ± 41% perf-sched.wait_time.avg.ms.schedule_timeout.__wait_for_common.wait_for_completion_state.kernel_clone
10.87 ± 18% +151.7% 27.36 ± 68% +69.1% 18.38 ± 83% perf-sched.wait_time.max.ms.__cond_resched.__anon_vma_prepare.__vmf_anon_prepare.do_pte_missing.__handle_mm_fault
123.35 ±181% +1219.3% 1627 ± 82% +469.3% 702.17 ±118% perf-sched.wait_time.max.ms.__cond_resched.__fput.__x64_sys_close.do_syscall_64.entry_SYSCALL_64_after_hwframe
9.68 ±108% +5490.2% 541.19 ±185% -56.7% 4.19 ±327% perf-sched.wait_time.max.ms.__cond_resched.__kmalloc_cache_noprof.do_epoll_create.__x64_sys_epoll_create.do_syscall_64
3.39 ± 91% +112.0% 7.19 ± 44% +52.9% 5.19 ± 54% perf-sched.wait_time.max.ms.__cond_resched.down_read_killable.iterate_dir.__x64_sys_getdents64.do_syscall_64
1.12 ±128% +407.4% 5.67 ± 52% +536.0% 7.11 ±144% perf-sched.wait_time.max.ms.__cond_resched.down_write.__split_vma.vms_gather_munmap_vmas.do_vmi_align_munmap
30.58 ± 29% +1741.1% 563.04 ± 80% +1126.4% 375.06 ±119% perf-sched.wait_time.max.ms.__cond_resched.down_write.free_pgtables.exit_mmap.__mmput
3.82 ±114% +232.1% 12.70 ± 49% +155.2% 9.76 ±120% perf-sched.wait_time.max.ms.__cond_resched.down_write.vms_gather_munmap_vmas.__mmap_prepare.__mmap_region
7.75 ± 46% +72.4% 13.36 ± 32% +18.4% 9.17 ± 86% perf-sched.wait_time.max.ms.__cond_resched.down_write_killable.__do_sys_brk.do_syscall_64.entry_SYSCALL_64_after_hwframe
13.39 ± 48% +259.7% 48.17 ± 34% +107.1% 27.74 ± 58% perf-sched.wait_time.max.ms.__cond_resched.down_write_killable.exec_mmap.begin_new_exec.load_elf_binary
12.46 ± 30% +90.5% 23.73 ± 34% +25.5% 15.64 ± 44% perf-sched.wait_time.max.ms.__cond_resched.down_write_killable.vm_mmap_pgoff.ksys_mmap_pgoff.do_syscall_64
479.90 ± 78% +496.9% 2864 ±141% +310.9% 1972 ±132% perf-sched.wait_time.max.ms.__cond_resched.dput.open_last_lookups.path_openat.do_filp_open
185.75 ±116% +875.0% 1811 ± 84% +264.8% 677.56 ±141% perf-sched.wait_time.max.ms.__cond_resched.dput.path_openat.do_filp_open.do_sys_openat2
2.52 ± 44% +105.2% 5.18 ± 36% +38.3% 3.49 ± 74% perf-sched.wait_time.max.ms.__cond_resched.kmem_cache_alloc_noprof.vm_area_alloc.alloc_bprm.kernel_execve
0.04 ±223% +37564.1% 14.50 ±198% +2231.2% 0.90 ±243% perf-sched.wait_time.max.ms.__cond_resched.netlink_release.__sock_release.sock_close.__fput
0.54 ±153% +669.7% 4.15 ± 61% +161.6% 1.41 ±108% perf-sched.wait_time.max.ms.__cond_resched.task_numa_work.task_work_run.syscall_exit_to_user_mode.do_syscall_64
28.22 ± 14% +41.2% 39.84 ± 25% +27.1% 35.87 ± 43% perf-sched.wait_time.max.ms.__cond_resched.unmap_vmas.vms_clear_ptes.part.0
4.90 ± 51% +158.2% 12.66 ± 75% +26.7% 6.21 ± 42% perf-sched.wait_time.max.ms.io_schedule.folio_wait_bit_common.__do_fault.do_read_fault
26.54 ± 66% +5220.1% 1411 ± 84% +2691.7% 740.84 ±187% perf-sched.wait_time.max.ms.irqentry_exit_to_user_mode.asm_sysvec_call_function_single.[unknown]
1262 ± 9% +434.3% 6744 ± 60% +293.8% 4971 ± 41% perf-sched.wait_time.max.ms.schedule_timeout.__wait_for_common.wait_for_completion_state.kernel_clone
30.09 -12.8 17.32 -2.9 27.23 perf-profile.calltrace.cycles-pp.filp_flush.filp_close.__do_sys_close_range.do_syscall_64.entry_SYSCALL_64_after_hwframe
9.32 -9.3 0.00 -9.3 0.00 perf-profile.calltrace.cycles-pp.fput.filp_close.__do_sys_close_range.do_syscall_64.entry_SYSCALL_64_after_hwframe
41.15 -5.9 35.26 -0.2 40.94 perf-profile.calltrace.cycles-pp.__dup2
41.02 -5.9 35.17 -0.2 40.81 perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.__dup2
41.03 -5.8 35.18 -0.2 40.83 perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.__dup2
13.86 -5.5 8.35 -1.1 12.79 perf-profile.calltrace.cycles-pp.filp_flush.filp_close.do_dup2.__x64_sys_dup2.do_syscall_64
40.21 -5.5 34.71 -0.2 40.02 perf-profile.calltrace.cycles-pp.__x64_sys_dup2.do_syscall_64.entry_SYSCALL_64_after_hwframe.__dup2
38.01 -4.9 33.10 +0.1 38.09 perf-profile.calltrace.cycles-pp.do_dup2.__x64_sys_dup2.do_syscall_64.entry_SYSCALL_64_after_hwframe.__dup2
4.90 ± 2% -1.9 2.96 ± 3% -0.5 4.38 ± 2% perf-profile.calltrace.cycles-pp.locks_remove_posix.filp_flush.filp_close.__do_sys_close_range.do_syscall_64
2.63 ± 9% -1.9 0.76 ± 18% -0.5 2.11 ± 2% perf-profile.calltrace.cycles-pp.asm_sysvec_apic_timer_interrupt.filp_flush.filp_close.__do_sys_close_range.do_syscall_64
4.44 -1.7 2.76 ± 2% -0.4 3.99 perf-profile.calltrace.cycles-pp.dnotify_flush.filp_flush.filp_close.__do_sys_close_range.do_syscall_64
2.50 ± 2% -0.9 1.56 ± 3% -0.2 2.28 ± 2% perf-profile.calltrace.cycles-pp.locks_remove_posix.filp_flush.filp_close.do_dup2.__x64_sys_dup2
2.22 -0.8 1.42 ± 2% -0.2 2.04 perf-profile.calltrace.cycles-pp.dnotify_flush.filp_flush.filp_close.do_dup2.__x64_sys_dup2
1.66 ± 5% -0.3 1.37 ± 5% -0.2 1.42 ± 4% perf-profile.calltrace.cycles-pp.ksys_dup3.__x64_sys_dup2.do_syscall_64.entry_SYSCALL_64_after_hwframe.__dup2
1.60 ± 5% -0.3 1.32 ± 5% -0.2 1.36 ± 4% perf-profile.calltrace.cycles-pp._raw_spin_lock.ksys_dup3.__x64_sys_dup2.do_syscall_64.entry_SYSCALL_64_after_hwframe
0.00 +0.0 0.00 +0.6 0.65 ± 5% perf-profile.calltrace.cycles-pp._raw_spin_lock.__do_sys_close_range.do_syscall_64.entry_SYSCALL_64_after_hwframe.close_range
0.00 +0.0 0.00 +41.2 41.20 perf-profile.calltrace.cycles-pp.filp_close.__do_sys_close_range.do_syscall_64.entry_SYSCALL_64_after_hwframe.close_range
0.00 +0.0 0.00 +42.0 42.01 perf-profile.calltrace.cycles-pp.__do_sys_close_range.do_syscall_64.entry_SYSCALL_64_after_hwframe.close_range
0.00 +0.0 0.00 +42.4 42.36 perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.close_range
0.00 +0.0 0.00 +42.4 42.36 perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.close_range
0.00 +0.0 0.00 +42.4 42.44 perf-profile.calltrace.cycles-pp.close_range
0.58 ± 2% +0.0 0.62 ± 2% +0.0 0.61 ± 3% perf-profile.calltrace.cycles-pp.do_read_fault.do_pte_missing.__handle_mm_fault.handle_mm_fault.do_user_addr_fault
1.40 ± 4% +0.0 1.44 ± 9% -0.4 0.99 ± 6% perf-profile.calltrace.cycles-pp.__close
0.54 ± 2% +0.0 0.58 ± 2% +0.0 0.56 ± 3% perf-profile.calltrace.cycles-pp.filemap_map_pages.do_read_fault.do_pte_missing.__handle_mm_fault.handle_mm_fault
1.34 ± 5% +0.0 1.40 ± 9% -0.4 0.93 ± 6% perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.__close
1.35 ± 5% +0.1 1.40 ± 9% -0.4 0.93 ± 6% perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.__close
0.72 ± 4% +0.1 0.79 ± 4% -0.7 0.00 perf-profile.calltrace.cycles-pp._raw_spin_lock.__do_sys_close_range.do_syscall_64.entry_SYSCALL_64_after_hwframe.syscall
1.23 ± 5% +0.1 1.31 ± 10% -0.5 0.77 ± 8% perf-profile.calltrace.cycles-pp.__x64_sys_close.do_syscall_64.entry_SYSCALL_64_after_hwframe.__close
0.00 +1.8 1.79 ± 5% +1.4 1.35 ± 3% perf-profile.calltrace.cycles-pp.asm_sysvec_apic_timer_interrupt.fput_close.filp_close.__do_sys_close_range.do_syscall_64
19.11 +4.4 23.49 +0.7 19.76 perf-profile.calltrace.cycles-pp.filp_close.do_dup2.__x64_sys_dup2.do_syscall_64.entry_SYSCALL_64_after_hwframe
40.94 +6.5 47.46 -40.9 0.00 perf-profile.calltrace.cycles-pp.filp_close.__do_sys_close_range.do_syscall_64.entry_SYSCALL_64_after_hwframe.syscall
42.24 +6.6 48.79 -42.2 0.00 perf-profile.calltrace.cycles-pp.syscall
42.14 +6.6 48.72 -42.1 0.00 perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.syscall
42.14 +6.6 48.72 -42.1 0.00 perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.syscall
41.86 +6.6 48.46 -41.9 0.00 perf-profile.calltrace.cycles-pp.__do_sys_close_range.do_syscall_64.entry_SYSCALL_64_after_hwframe.syscall
0.00 +14.6 14.60 ± 2% +6.4 6.41 perf-profile.calltrace.cycles-pp.fput_close.filp_close.do_dup2.__x64_sys_dup2.do_syscall_64
0.00 +29.0 28.95 ± 2% +12.5 12.50 perf-profile.calltrace.cycles-pp.fput_close.filp_close.__do_sys_close_range.do_syscall_64.entry_SYSCALL_64_after_hwframe
45.76 -19.0 26.72 -4.1 41.64 perf-profile.children.cycles-pp.filp_flush
14.67 -14.7 0.00 -14.7 0.00 perf-profile.children.cycles-pp.fput
41.20 -5.9 35.30 -0.2 40.99 perf-profile.children.cycles-pp.__dup2
40.21 -5.5 34.71 -0.2 40.03 perf-profile.children.cycles-pp.__x64_sys_dup2
38.53 -5.2 33.33 +0.1 38.59 perf-profile.children.cycles-pp.do_dup2
7.81 ± 2% -3.1 4.74 ± 3% -0.8 7.02 ± 2% perf-profile.children.cycles-pp.locks_remove_posix
7.03 -2.6 4.40 ± 2% -0.7 6.37 perf-profile.children.cycles-pp.dnotify_flush
5.60 ± 16% -1.3 4.27 ± 14% -0.7 4.89 ± 2% perf-profile.children.cycles-pp.asm_sysvec_apic_timer_interrupt
1.24 ± 3% -0.4 0.86 ± 4% -0.0 1.21 perf-profile.children.cycles-pp.syscall_exit_to_user_mode
1.67 ± 5% -0.3 1.37 ± 5% -0.2 1.42 ± 4% perf-profile.children.cycles-pp.ksys_dup3
3.09 ± 3% -0.1 2.95 ± 4% -0.3 2.82 ± 3% perf-profile.children.cycles-pp._raw_spin_lock
0.29 -0.1 0.22 ± 2% +0.0 0.30 ± 2% perf-profile.children.cycles-pp.update_load_avg
0.10 ± 71% -0.1 0.04 ±112% -0.1 0.05 ± 30% perf-profile.children.cycles-pp.hrtimer_update_next_event
0.10 ± 10% -0.1 0.05 ± 46% +0.0 0.13 ± 7% perf-profile.children.cycles-pp.__x64_sys_fcntl
0.17 ± 7% -0.1 0.12 ± 4% -0.0 0.14 ± 5% perf-profile.children.cycles-pp.entry_SYSCALL_64
0.15 ± 3% -0.0 0.10 ± 18% -0.0 0.15 ± 3% perf-profile.children.cycles-pp.clockevents_program_event
0.04 ± 45% -0.0 0.00 +0.0 0.07 ± 7% perf-profile.children.cycles-pp.update_irq_load_avg
0.06 ± 11% -0.0 0.02 ± 99% +0.0 0.08 ± 24% perf-profile.children.cycles-pp.stress_close_func
0.13 ± 5% -0.0 0.10 ± 7% +0.0 0.14 ± 9% perf-profile.children.cycles-pp.__switch_to
0.06 ± 7% -0.0 0.04 ± 71% -0.0 0.06 ± 8% perf-profile.children.cycles-pp.lapic_next_deadline
0.11 ± 3% -0.0 0.09 ± 5% -0.0 0.10 ± 4% perf-profile.children.cycles-pp.__update_load_avg_cfs_rq
1.49 ± 3% -0.0 1.48 ± 2% +0.2 1.68 ± 4% perf-profile.children.cycles-pp.x64_sys_call
0.00 +0.0 0.00 +42.5 42.47 perf-profile.children.cycles-pp.close_range
0.23 ± 2% +0.0 0.24 ± 5% +0.0 0.26 ± 5% perf-profile.children.cycles-pp.__irq_exit_rcu
0.06 ± 6% +0.0 0.08 ± 6% +0.0 0.07 ± 9% perf-profile.children.cycles-pp.__folio_batch_add_and_move
0.26 ± 2% +0.0 0.28 ± 3% +0.0 0.28 ± 6% perf-profile.children.cycles-pp.folio_remove_rmap_ptes
0.08 ± 4% +0.0 0.11 ± 10% +0.0 0.09 ± 7% perf-profile.children.cycles-pp.set_pte_range
0.45 ± 2% +0.0 0.48 ± 2% +0.0 0.48 ± 4% perf-profile.children.cycles-pp.zap_present_ptes
0.18 ± 3% +0.0 0.21 ± 4% +0.0 0.20 ± 4% perf-profile.children.cycles-pp.folios_put_refs
0.29 ± 3% +0.0 0.32 ± 3% +0.0 0.31 ± 4% perf-profile.children.cycles-pp.__tlb_batch_free_encoded_pages
0.29 ± 3% +0.0 0.32 ± 3% +0.0 0.31 ± 5% perf-profile.children.cycles-pp.free_pages_and_swap_cache
1.42 ± 5% +0.0 1.45 ± 9% -0.4 1.00 ± 6% perf-profile.children.cycles-pp.__close
0.34 ± 4% +0.0 0.38 ± 3% +0.0 0.36 ± 4% perf-profile.children.cycles-pp.tlb_finish_mmu
0.24 ± 3% +0.0 0.27 ± 5% -0.0 0.23 ± 5% perf-profile.children.cycles-pp._raw_spin_lock_irqsave
0.69 ± 2% +0.0 0.74 ± 2% +0.0 0.72 ± 5% perf-profile.children.cycles-pp.filemap_map_pages
0.73 ± 2% +0.0 0.78 ± 2% +0.0 0.77 ± 5% perf-profile.children.cycles-pp.do_read_fault
0.65 ± 7% +0.1 0.70 ± 15% -0.5 0.17 ± 10% perf-profile.children.cycles-pp.fput_close_sync
1.24 ± 5% +0.1 1.33 ± 10% -0.5 0.79 ± 8% perf-profile.children.cycles-pp.__x64_sys_close
42.26 +6.5 48.80 -42.3 0.00 perf-profile.children.cycles-pp.syscall
41.86 +6.6 48.46 +0.2 42.02 perf-profile.children.cycles-pp.__do_sys_close_range
60.05 +10.9 70.95 +0.9 60.97 perf-profile.children.cycles-pp.filp_close
0.00 +44.5 44.55 ± 2% +19.7 19.70 perf-profile.children.cycles-pp.fput_close
14.31 ± 2% -14.3 0.00 -14.3 0.00 perf-profile.self.cycles-pp.fput
30.12 ± 2% -13.0 17.09 -2.4 27.73 perf-profile.self.cycles-pp.filp_flush
19.17 ± 2% -9.5 9.68 ± 3% -0.6 18.61 perf-profile.self.cycles-pp.do_dup2
7.63 ± 3% -3.0 4.62 ± 4% -0.7 6.90 ± 2% perf-profile.self.cycles-pp.locks_remove_posix
6.86 ± 3% -2.6 4.28 ± 3% -0.6 6.25 perf-profile.self.cycles-pp.dnotify_flush
0.64 ± 4% -0.4 0.26 ± 4% -0.0 0.62 ± 3% perf-profile.self.cycles-pp.syscall_exit_to_user_mode
0.22 ± 10% -0.1 0.15 ± 11% +0.1 0.34 ± 2% perf-profile.self.cycles-pp.x64_sys_call
0.23 ± 3% -0.1 0.17 ± 8% +0.0 0.25 ± 3% perf-profile.self.cycles-pp.__schedule
0.10 ± 44% -0.1 0.05 ± 71% +0.0 0.15 ± 3% perf-profile.self.cycles-pp.pick_next_task_fair
0.08 ± 19% -0.1 0.03 ±100% -0.0 0.06 ± 7% perf-profile.self.cycles-pp.pick_eevdf
0.04 ± 44% -0.0 0.00 +0.0 0.07 ± 8% perf-profile.self.cycles-pp.__x64_sys_fcntl
0.06 ± 7% -0.0 0.03 ±100% -0.0 0.06 ± 8% perf-profile.self.cycles-pp.lapic_next_deadline
0.13 ± 7% -0.0 0.10 ± 10% +0.0 0.13 ± 8% perf-profile.self.cycles-pp.__switch_to
0.09 ± 8% -0.0 0.06 ± 8% +0.0 0.10 perf-profile.self.cycles-pp.__update_load_avg_se
0.08 ± 4% -0.0 0.05 ± 7% -0.0 0.07 perf-profile.self.cycles-pp.asm_sysvec_apic_timer_interrupt
0.08 ± 9% -0.0 0.06 ± 11% -0.0 0.05 ± 8% perf-profile.self.cycles-pp.entry_SYSCALL_64
0.06 ± 7% +0.0 0.08 ± 8% +0.0 0.07 ± 8% perf-profile.self.cycles-pp.filemap_map_pages
0.12 ± 3% +0.0 0.14 ± 4% +0.0 0.13 ± 6% perf-profile.self.cycles-pp.folios_put_refs
0.25 ± 2% +0.0 0.27 ± 3% +0.0 0.27 ± 6% perf-profile.self.cycles-pp.folio_remove_rmap_ptes
0.17 ± 5% +0.0 0.20 ± 6% -0.0 0.17 ± 4% perf-profile.self.cycles-pp._raw_spin_lock_irqsave
0.64 ± 7% +0.1 0.70 ± 15% -0.5 0.17 ± 11% perf-profile.self.cycles-pp.fput_close_sync
0.00 +0.1 0.07 ± 5% +0.0 0.00 perf-profile.self.cycles-pp.do_nanosleep
0.00 +0.1 0.10 ± 15% +0.0 0.00 perf-profile.self.cycles-pp.filp_close
0.00 +43.9 43.86 ± 2% +19.4 19.35 perf-profile.self.cycles-pp.fput_close