Message-ID: <CAGudoHGcQspttcaZ6nYfwBkXiJEC-XKuprxnmXRjjufz2vPRhw@mail.gmail.com>
Date: Thu, 17 Apr 2025 11:54:45 +0200
From: Mateusz Guzik <mjguzik@...il.com>
To: kernel test robot <oliver.sang@...el.com>
Cc: oe-lkp@...ts.linux.dev, lkp@...el.com, linux-kernel@...r.kernel.org,
Christian Brauner <brauner@...nel.org>, linux-fsdevel@...r.kernel.org
Subject: Re: [linus:master] [fs] a914bd93f3: stress-ng.close.close_calls_per_sec 52.2% regression
On Thu, Apr 17, 2025 at 10:24 AM kernel test robot
<oliver.sang@...el.com> wrote:
>
>
>
> Hello,
>
> kernel test robot noticed a 52.2% regression of stress-ng.close.close_calls_per_sec on:
>
>
> commit: a914bd93f3edfedcdd59deb615e8dd1b3643cac5 ("fs: use fput_close() in filp_close()")
> https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master
>
> [still regression on linus/master 7cdabafc001202de9984f22c973305f424e0a8b7]
> [still regression on linux-next/master 01c6df60d5d4ae00cd5c1648818744838bba7763]
>
> testcase: stress-ng
> config: x86_64-rhel-9.4
> compiler: gcc-12
> test machine: 192 threads 2 sockets Intel(R) Xeon(R) Platinum 8468V CPU @ 2.4GHz (Sapphire Rapids) with 384G memory
> parameters:
>
> nr_threads: 100%
> testtime: 60s
> test: close
> cpufreq_governor: performance
>
>
> the data is not very stable, but the regression trend seems clear.
>
> a914bd93f3edfedcdd59deb615e8dd1b3643cac5/matrix.json: "stress-ng.close.close_calls_per_sec": [
> a914bd93f3edfedcdd59deb615e8dd1b3643cac5/matrix.json- 193538.65,
> a914bd93f3edfedcdd59deb615e8dd1b3643cac5/matrix.json- 232133.85,
> a914bd93f3edfedcdd59deb615e8dd1b3643cac5/matrix.json- 276146.66,
> a914bd93f3edfedcdd59deb615e8dd1b3643cac5/matrix.json- 193345.38,
> a914bd93f3edfedcdd59deb615e8dd1b3643cac5/matrix.json- 209411.34,
> a914bd93f3edfedcdd59deb615e8dd1b3643cac5/matrix.json- 254782.41
> a914bd93f3edfedcdd59deb615e8dd1b3643cac5/matrix.json- ],
>
> 3e46a92a27c2927fcef996ba06cbe299da629c28/matrix.json: "stress-ng.close.close_calls_per_sec": [
> 3e46a92a27c2927fcef996ba06cbe299da629c28/matrix.json- 427893.13,
> 3e46a92a27c2927fcef996ba06cbe299da629c28/matrix.json- 456267.6,
> 3e46a92a27c2927fcef996ba06cbe299da629c28/matrix.json- 509121.02,
> 3e46a92a27c2927fcef996ba06cbe299da629c28/matrix.json- 544289.08,
> 3e46a92a27c2927fcef996ba06cbe299da629c28/matrix.json- 354004.06,
> 3e46a92a27c2927fcef996ba06cbe299da629c28/matrix.json- 552310.73
> 3e46a92a27c2927fcef996ba06cbe299da629c28/matrix.json- ],
>
>
> If you fix the issue in a separate patch/commit (i.e. not just a new version of
> the same patch/commit), kindly add following tags
> | Reported-by: kernel test robot <oliver.sang@...el.com>
> | Closes: https://lore.kernel.org/oe-lkp/202504171513.6d6f8a16-lkp@intel.com
>
>
> Details are as below:
> -------------------------------------------------------------------------------------------------->
>
>
> The kernel config and materials to reproduce are available at:
> https://download.01.org/0day-ci/archive/20250417/202504171513.6d6f8a16-lkp@intel.com
>
> =========================================================================================
> compiler/cpufreq_governor/kconfig/nr_threads/rootfs/tbox_group/test/testcase/testtime:
> gcc-12/performance/x86_64-rhel-9.4/100%/debian-12-x86_64-20240206.cgz/igk-spr-2sp1/close/stress-ng/60s
>
> commit:
> 3e46a92a27 ("fs: use fput_close_sync() in close()")
> a914bd93f3 ("fs: use fput_close() in filp_close()")
>
I'm going to have to chew on it.

First, the commit at hand states:

fs: use fput_close() in filp_close()

When tracing a kernel build over refcounts seen this is a wash:
@[kprobe:filp_close]:
[0]                32195 |@@@@@@@@@@                                          |
[1]               164567 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
I verified vast majority of the skew comes from do_close_on_exec() which
could be changed to use a different variant instead.

So if need be, this is trivially revertible without a major loss anywhere.
But also:

Even without changing that, the 19.5% of calls which got here still can
save the extra atomic. Calls here are borderline non-existent compared
to fput (over 3.2 mln!), so they should not negatively affect
scalability.
The code converts a lock xadd into a lock cmpxchg loop in exchange for
saving an atomic when this is the last close. This indeed scales worse
when it in fact does not close the file.
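
To spell out my mental model of the trade-off, here is a rough
userspace-only sketch with C11 atomics (this is not the kernel's
file_ref code and the helper names are made up):

#include <stdatomic.h>
#include <stdbool.h>

/* Unconditional decrement: a single lock xadd on x86. */
static bool put_ref_sub(atomic_long *cnt)
{
	return atomic_fetch_sub(cnt, 1) == 1;	/* true if last ref */
}

/*
 * Decrement which tries to detect the last reference up front: go
 * 1 -> 0 with one cmpxchg so the caller could fold the final close
 * into the same step.  When other references exist, fall back to a
 * cmpxchg loop; every failed attempt is an extra atomic compared to
 * put_ref_sub().
 */
static bool put_ref_cmpxchg(atomic_long *cnt)
{
	long old = 1;

	if (atomic_compare_exchange_strong(cnt, &old, 0))
		return true;		/* we were the last reference */

	do {
		/* 'old' is refreshed by every failed cmpxchg */
	} while (!atomic_compare_exchange_weak(cnt, &old, old - 1));

	return old == 1;
}

In this sketch the never-last-close case costs at least two atomics
(more under contention) where put_ref_sub() issues exactly one.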

stress-ng dups fd 2 several times and then closes the duplicates with
close_range() in a loop. This manufactures a case where the call is
never the last close, so the cmpxchg loop is a plain loss.
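
For reference, the fd pattern I mean looks roughly like this (a
minimal sketch, not the actual stress-ng source; the dup count and fd
numbers are made up, and it assumes a glibc new enough to provide the
close_range() wrapper):

#define _GNU_SOURCE
#include <unistd.h>

int main(void)
{
	for (;;) {
		int first = dup(2);	/* typically lands on fd 3 */
		int last = first;

		for (int i = 0; i < 7; i++)
			last = dup(2);	/* consecutive fds, same struct file */

		/*
		 * Close all the duplicates in one syscall.  fd 2 itself
		 * stays open, so none of these drops is ever the last
		 * reference and each one takes the cmpxchg loop.
		 */
		close_range(first, last, 0);
	}
}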

While I don't believe this represents real-world behavior, maybe
something can be done. I note again this patch can be whacked
altogether without any tears shed.

So I'm going to chew on it.
> 3e46a92a27c2927f a914bd93f3edfedcdd59deb615e
> ---------------- ---------------------------
> %stddev %change %stddev
> \ | \
> 355470 ą 14% -17.6% 292767 ą 7% cpuidle..usage
> 5.19 -0.6 4.61 ą 3% mpstat.cpu.all.usr%
> 495615 ą 7% -38.5% 304839 ą 8% vmstat.system.cs
> 780096 ą 5% -23.5% 596446 ą 3% vmstat.system.in
> 2512168 ą 17% -45.8% 1361659 ą 71% sched_debug.cfs_rq:/.avg_vruntime.min
> 2512168 ą 17% -45.8% 1361659 ą 71% sched_debug.cfs_rq:/.min_vruntime.min
> 700402 ą 2% +19.8% 838744 ą 10% sched_debug.cpu.avg_idle.avg
> 81230 ą 6% -59.6% 32788 ą 69% sched_debug.cpu.nr_switches.avg
> 27992 ą 20% -70.2% 8345 ą 74% sched_debug.cpu.nr_switches.min
> 473980 ą 14% -52.2% 226559 ą 13% stress-ng.close.close_calls_per_sec
> 4004843 ą 8% -21.1% 3161813 ą 8% stress-ng.time.involuntary_context_switches
> 9475 +1.1% 9582 stress-ng.time.system_time
> 183.50 ą 2% -37.8% 114.20 ą 3% stress-ng.time.user_time
> 25637892 ą 6% -42.6% 14725385 ą 9% stress-ng.time.voluntary_context_switches
> 23.01 ą 2% -1.4 21.61 ą 3% perf-stat.i.cache-miss-rate%
> 17981659 -10.8% 16035508 ą 4% perf-stat.i.cache-misses
> 77288888 ą 2% -6.5% 72260357 ą 4% perf-stat.i.cache-references
> 504949 ą 6% -38.1% 312536 ą 8% perf-stat.i.context-switches
> 33030 +15.7% 38205 ą 4% perf-stat.i.cycles-between-cache-misses
> 4.34 ą 10% -38.3% 2.68 ą 20% perf-stat.i.metric.K/sec
> 26229 ą 44% +37.8% 36145 ą 4% perf-stat.overall.cycles-between-cache-misses
> 30.09 -12.8 17.32 perf-profile.calltrace.cycles-pp.filp_flush.filp_close.__do_sys_close_range.do_syscall_64.entry_SYSCALL_64_after_hwframe
> 9.32 -9.3 0.00 perf-profile.calltrace.cycles-pp.fput.filp_close.__do_sys_close_range.do_syscall_64.entry_SYSCALL_64_after_hwframe
> 41.15 -5.9 35.26 perf-profile.calltrace.cycles-pp.__dup2
> 41.02 -5.9 35.17 perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.__dup2
> 41.03 -5.8 35.18 perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.__dup2
> 13.86 -5.5 8.35 perf-profile.calltrace.cycles-pp.filp_flush.filp_close.do_dup2.__x64_sys_dup2.do_syscall_64
> 40.21 -5.5 34.71 perf-profile.calltrace.cycles-pp.__x64_sys_dup2.do_syscall_64.entry_SYSCALL_64_after_hwframe.__dup2
> 38.01 -4.9 33.10 perf-profile.calltrace.cycles-pp.do_dup2.__x64_sys_dup2.do_syscall_64.entry_SYSCALL_64_after_hwframe.__dup2
> 4.90 ą 2% -1.9 2.96 ą 3% perf-profile.calltrace.cycles-pp.locks_remove_posix.filp_flush.filp_close.__do_sys_close_range.do_syscall_64
> 2.63 ą 9% -1.9 0.76 ą 18% perf-profile.calltrace.cycles-pp.asm_sysvec_apic_timer_interrupt.filp_flush.filp_close.__do_sys_close_range.do_syscall_64
> 4.44 -1.7 2.76 ą 2% perf-profile.calltrace.cycles-pp.dnotify_flush.filp_flush.filp_close.__do_sys_close_range.do_syscall_64
> 2.50 ą 2% -0.9 1.56 ą 3% perf-profile.calltrace.cycles-pp.locks_remove_posix.filp_flush.filp_close.do_dup2.__x64_sys_dup2
> 2.22 -0.8 1.42 ą 2% perf-profile.calltrace.cycles-pp.dnotify_flush.filp_flush.filp_close.do_dup2.__x64_sys_dup2
> 1.66 ą 5% -0.3 1.37 ą 5% perf-profile.calltrace.cycles-pp.ksys_dup3.__x64_sys_dup2.do_syscall_64.entry_SYSCALL_64_after_hwframe.__dup2
> 1.60 ą 5% -0.3 1.32 ą 5% perf-profile.calltrace.cycles-pp._raw_spin_lock.ksys_dup3.__x64_sys_dup2.do_syscall_64.entry_SYSCALL_64_after_hwframe
> 0.58 ą 2% +0.0 0.62 ą 2% perf-profile.calltrace.cycles-pp.do_read_fault.do_pte_missing.__handle_mm_fault.handle_mm_fault.do_user_addr_fault
> 0.54 ą 2% +0.0 0.58 ą 2% perf-profile.calltrace.cycles-pp.filemap_map_pages.do_read_fault.do_pte_missing.__handle_mm_fault.handle_mm_fault
> 0.72 ą 4% +0.1 0.79 ą 4% perf-profile.calltrace.cycles-pp._raw_spin_lock.__do_sys_close_range.do_syscall_64.entry_SYSCALL_64_after_hwframe.syscall
> 0.00 +1.8 1.79 ą 5% perf-profile.calltrace.cycles-pp.asm_sysvec_apic_timer_interrupt.fput_close.filp_close.__do_sys_close_range.do_syscall_64
> 19.11 +4.4 23.49 perf-profile.calltrace.cycles-pp.filp_close.do_dup2.__x64_sys_dup2.do_syscall_64.entry_SYSCALL_64_after_hwframe
> 40.94 +6.5 47.46 perf-profile.calltrace.cycles-pp.filp_close.__do_sys_close_range.do_syscall_64.entry_SYSCALL_64_after_hwframe.syscall
> 42.24 +6.6 48.79 perf-profile.calltrace.cycles-pp.syscall
> 42.14 +6.6 48.72 perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.syscall
> 42.14 +6.6 48.72 perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.syscall
> 41.86 +6.6 48.46 perf-profile.calltrace.cycles-pp.__do_sys_close_range.do_syscall_64.entry_SYSCALL_64_after_hwframe.syscall
> 0.00 +14.6 14.60 ą 2% perf-profile.calltrace.cycles-pp.fput_close.filp_close.do_dup2.__x64_sys_dup2.do_syscall_64
> 0.00 +29.0 28.95 ą 2% perf-profile.calltrace.cycles-pp.fput_close.filp_close.__do_sys_close_range.do_syscall_64.entry_SYSCALL_64_after_hwframe
> 45.76 -19.0 26.72 perf-profile.children.cycles-pp.filp_flush
> 14.67 -14.7 0.00 perf-profile.children.cycles-pp.fput
> 41.20 -5.9 35.30 perf-profile.children.cycles-pp.__dup2
> 40.21 -5.5 34.71 perf-profile.children.cycles-pp.__x64_sys_dup2
> 38.53 -5.2 33.33 perf-profile.children.cycles-pp.do_dup2
> 7.81 ą 2% -3.1 4.74 ą 3% perf-profile.children.cycles-pp.locks_remove_posix
> 7.03 -2.6 4.40 ą 2% perf-profile.children.cycles-pp.dnotify_flush
> 1.24 ą 3% -0.4 0.86 ą 4% perf-profile.children.cycles-pp.syscall_exit_to_user_mode
> 1.67 ą 5% -0.3 1.37 ą 5% perf-profile.children.cycles-pp.ksys_dup3
> 0.29 -0.1 0.22 ą 2% perf-profile.children.cycles-pp.update_load_avg
> 0.10 ą 10% -0.1 0.05 ą 46% perf-profile.children.cycles-pp.__x64_sys_fcntl
> 0.17 ą 7% -0.1 0.12 ą 4% perf-profile.children.cycles-pp.entry_SYSCALL_64
> 0.15 ą 3% -0.0 0.10 ą 18% perf-profile.children.cycles-pp.clockevents_program_event
> 0.06 ą 11% -0.0 0.02 ą 99% perf-profile.children.cycles-pp.stress_close_func
> 0.13 ą 5% -0.0 0.10 ą 7% perf-profile.children.cycles-pp.__switch_to
> 0.06 ą 7% -0.0 0.04 ą 71% perf-profile.children.cycles-pp.lapic_next_deadline
> 0.11 ą 3% -0.0 0.09 ą 5% perf-profile.children.cycles-pp.__update_load_avg_cfs_rq
> 0.06 ą 6% +0.0 0.08 ą 6% perf-profile.children.cycles-pp.__folio_batch_add_and_move
> 0.26 ą 2% +0.0 0.28 ą 3% perf-profile.children.cycles-pp.folio_remove_rmap_ptes
> 0.08 ą 4% +0.0 0.11 ą 10% perf-profile.children.cycles-pp.set_pte_range
> 0.45 ą 2% +0.0 0.48 ą 2% perf-profile.children.cycles-pp.zap_present_ptes
> 0.18 ą 3% +0.0 0.21 ą 4% perf-profile.children.cycles-pp.folios_put_refs
> 0.29 ą 3% +0.0 0.32 ą 3% perf-profile.children.cycles-pp.__tlb_batch_free_encoded_pages
> 0.29 ą 3% +0.0 0.32 ą 3% perf-profile.children.cycles-pp.free_pages_and_swap_cache
> 0.34 ą 4% +0.0 0.38 ą 3% perf-profile.children.cycles-pp.tlb_finish_mmu
> 0.24 ą 3% +0.0 0.27 ą 5% perf-profile.children.cycles-pp._raw_spin_lock_irqsave
> 0.73 ą 2% +0.0 0.78 ą 2% perf-profile.children.cycles-pp.do_read_fault
> 0.69 ą 2% +0.0 0.74 ą 2% perf-profile.children.cycles-pp.filemap_map_pages
> 42.26 +6.5 48.80 perf-profile.children.cycles-pp.syscall
> 41.86 +6.6 48.46 perf-profile.children.cycles-pp.__do_sys_close_range
> 60.05 +10.9 70.95 perf-profile.children.cycles-pp.filp_close
> 0.00 +44.5 44.55 ą 2% perf-profile.children.cycles-pp.fput_close
> 14.31 ą 2% -14.3 0.00 perf-profile.self.cycles-pp.fput
> 30.12 ą 2% -13.0 17.09 perf-profile.self.cycles-pp.filp_flush
> 19.17 ą 2% -9.5 9.68 ą 3% perf-profile.self.cycles-pp.do_dup2
> 7.63 ą 3% -3.0 4.62 ą 4% perf-profile.self.cycles-pp.locks_remove_posix
> 6.86 ą 3% -2.6 4.28 ą 3% perf-profile.self.cycles-pp.dnotify_flush
> 0.64 ą 4% -0.4 0.26 ą 4% perf-profile.self.cycles-pp.syscall_exit_to_user_mode
> 0.22 ą 10% -0.1 0.15 ą 11% perf-profile.self.cycles-pp.x64_sys_call
> 0.23 ą 3% -0.1 0.17 ą 8% perf-profile.self.cycles-pp.__schedule
> 0.08 ą 19% -0.1 0.03 ą100% perf-profile.self.cycles-pp.pick_eevdf
> 0.06 ą 7% -0.0 0.03 ą100% perf-profile.self.cycles-pp.lapic_next_deadline
> 0.13 ą 7% -0.0 0.10 ą 10% perf-profile.self.cycles-pp.__switch_to
> 0.09 ą 8% -0.0 0.06 ą 8% perf-profile.self.cycles-pp.__update_load_avg_se
> 0.08 ą 4% -0.0 0.05 ą 7% perf-profile.self.cycles-pp.asm_sysvec_apic_timer_interrupt
> 0.08 ą 9% -0.0 0.06 ą 11% perf-profile.self.cycles-pp.entry_SYSCALL_64
> 0.06 ą 7% +0.0 0.08 ą 8% perf-profile.self.cycles-pp.filemap_map_pages
> 0.12 ą 3% +0.0 0.14 ą 4% perf-profile.self.cycles-pp.folios_put_refs
> 0.25 ą 2% +0.0 0.27 ą 3% perf-profile.self.cycles-pp.folio_remove_rmap_ptes
> 0.17 ą 5% +0.0 0.20 ą 6% perf-profile.self.cycles-pp._raw_spin_lock_irqsave
> 0.00 +0.1 0.07 ą 5% perf-profile.self.cycles-pp.do_nanosleep
> 0.00 +0.1 0.10 ą 15% perf-profile.self.cycles-pp.filp_close
> 0.00 +43.9 43.86 ą 2% perf-profile.self.cycles-pp.fput_close
> 2.12 ą 47% +151.1% 5.32 ą 28% perf-sched.sch_delay.avg.ms.__cond_resched.__do_sys_close_range.do_syscall_64.entry_SYSCALL_64_after_hwframe.[unknown]
> 0.28 ą 60% +990.5% 3.08 ą 53% perf-sched.sch_delay.avg.ms.__cond_resched.__kmalloc_cache_noprof.do_eventfd.__x64_sys_eventfd2.do_syscall_64
> 0.96 ą 28% +35.5% 1.30 ą 25% perf-sched.sch_delay.avg.ms.__cond_resched.__kmalloc_noprof.load_elf_phdrs.load_elf_binary.exec_binprm
> 2.56 ą 43% +964.0% 27.27 ą161% perf-sched.sch_delay.avg.ms.__cond_resched.dput.open_last_lookups.path_openat.do_filp_open
> 12.80 ą139% +366.1% 59.66 ą107% perf-sched.sch_delay.avg.ms.__cond_resched.dput.shmem_unlink.vfs_unlink.do_unlinkat
> 2.78 ą 25% +50.0% 4.17 ą 19% perf-sched.sch_delay.avg.ms.__cond_resched.kmem_cache_alloc_lru_noprof.__d_alloc.d_alloc_cursor.dcache_dir_open
> 0.78 ą 18% +39.3% 1.09 ą 19% perf-sched.sch_delay.avg.ms.__cond_resched.kmem_cache_alloc_noprof.mas_alloc_nodes.mas_preallocate.vma_shrink
> 2.63 ą 10% +16.7% 3.07 ą 4% perf-sched.sch_delay.avg.ms.__cond_resched.kmem_cache_alloc_noprof.vm_area_alloc.__mmap_new_vma.__mmap_region
> 0.04 ą223% +3483.1% 1.38 ą 67% perf-sched.sch_delay.avg.ms.__cond_resched.netlink_release.__sock_release.sock_close.__fput
> 0.72 ą 17% +42.5% 1.02 ą 22% perf-sched.sch_delay.avg.ms.__cond_resched.stop_one_cpu.sched_exec.bprm_execve.part
> 0.35 ą134% +457.7% 1.94 ą 61% perf-sched.sch_delay.avg.ms.__cond_resched.task_numa_work.task_work_run.syscall_exit_to_user_mode.do_syscall_64
> 1.88 ą 34% +574.9% 12.69 ą115% perf-sched.sch_delay.avg.ms.__cond_resched.task_work_run.syscall_exit_to_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe
> 2.41 ą 10% +18.0% 2.84 ą 9% perf-sched.sch_delay.avg.ms.__cond_resched.wp_page_copy.__handle_mm_fault.handle_mm_fault.do_user_addr_fault
> 1.85 ą 36% +155.6% 4.73 ą 50% perf-sched.sch_delay.avg.ms.io_schedule.folio_wait_bit_common.__do_fault.do_read_fault
> 28.94 ą 26% -52.4% 13.78 ą 36% perf-sched.sch_delay.avg.ms.pipe_read.vfs_read.ksys_read.do_syscall_64
> 2.24 ą 9% +22.1% 2.74 ą 8% perf-sched.sch_delay.avg.ms.schedule_preempt_disabled.rwsem_down_write_slowpath.down_write.unlink_file_vma_batch_final
> 2.19 ą 7% +17.3% 2.57 ą 6% perf-sched.sch_delay.avg.ms.schedule_preempt_disabled.rwsem_down_write_slowpath.down_write.vma_link_file
> 2.39 ą 6% +16.4% 2.79 ą 9% perf-sched.sch_delay.avg.ms.schedule_preempt_disabled.rwsem_down_write_slowpath.down_write.vma_prepare
> 0.34 ą 77% +1931.5% 6.95 ą 99% perf-sched.sch_delay.max.ms.__cond_resched.__kmalloc_cache_noprof.do_eventfd.__x64_sys_eventfd2.do_syscall_64
> 29.69 ą 28% +129.5% 68.12 ą 28% perf-sched.sch_delay.max.ms.__cond_resched.__kmalloc_cache_noprof.perf_event_mmap_event.perf_event_mmap.__mmap_region
> 1.48 ą 96% +284.6% 5.68 ą 56% perf-sched.sch_delay.max.ms.__cond_resched.copy_strings_kernel.kernel_execve.call_usermodehelper_exec_async.ret_from_fork
> 3.59 ą 57% +124.5% 8.06 ą 45% perf-sched.sch_delay.max.ms.__cond_resched.down_read.mmap_read_lock_maybe_expand.get_arg_page.copy_string_kernel
> 3.39 ą 91% +112.0% 7.19 ą 44% perf-sched.sch_delay.max.ms.__cond_resched.down_read_killable.iterate_dir.__x64_sys_getdents64.do_syscall_64
> 22.16 ą 77% +117.4% 48.17 ą 34% perf-sched.sch_delay.max.ms.__cond_resched.down_write_killable.exec_mmap.begin_new_exec.load_elf_binary
> 8.72 ą 17% +358.5% 39.98 ą 61% perf-sched.sch_delay.max.ms.__cond_resched.kmem_cache_alloc_noprof.mas_alloc_nodes.mas_preallocate.__mmap_new_vma
> 0.04 ą223% +3484.4% 1.38 ą 67% perf-sched.sch_delay.max.ms.__cond_resched.netlink_release.__sock_release.sock_close.__fput
> 0.53 ą154% +676.1% 4.12 ą 61% perf-sched.sch_delay.max.ms.__cond_resched.task_numa_work.task_work_run.syscall_exit_to_user_mode.do_syscall_64
> 25.76 ą 70% +9588.9% 2496 ą127% perf-sched.sch_delay.max.ms.__cond_resched.task_work_run.syscall_exit_to_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe
> 51.53 ą 26% -58.8% 21.22 ą106% perf-sched.sch_delay.max.ms.devkmsg_read.vfs_read.ksys_read.do_syscall_64
> 4.97 ą 48% +154.8% 12.66 ą 75% perf-sched.sch_delay.max.ms.io_schedule.folio_wait_bit_common.__do_fault.do_read_fault
> 4.36 ą 48% +147.7% 10.81 ą 29% perf-sched.wait_and_delay.avg.ms.__cond_resched.__do_sys_close_range.do_syscall_64.entry_SYSCALL_64_after_hwframe.[unknown]
> 108632 ą 4% +23.9% 134575 ą 6% perf-sched.wait_and_delay.count.__cond_resched.__do_sys_close_range.do_syscall_64.entry_SYSCALL_64_after_hwframe.[unknown]
> 572.67 ą 6% +37.3% 786.17 ą 7% perf-sched.wait_and_delay.count.__cond_resched.__wait_for_common.affine_move_task.__set_cpus_allowed_ptr.__sched_setaffinity
> 596.67 ą 12% +43.9% 858.50 ą 13% perf-sched.wait_and_delay.count.__cond_resched.__wait_for_common.wait_for_completion_state.call_usermodehelper_exec.__request_module
> 294.83 ą 9% +31.1% 386.50 ą 11% perf-sched.wait_and_delay.count.__cond_resched.dput.terminate_walk.path_openat.do_filp_open
> 1223275 ą 2% -17.7% 1006293 ą 6% perf-sched.wait_and_delay.count.do_nanosleep.hrtimer_nanosleep.common_nsleep.__x64_sys_clock_nanosleep
> 2772 ą 11% +43.6% 3980 ą 11% perf-sched.wait_and_delay.count.do_wait.kernel_wait.call_usermodehelper_exec_work.process_one_work
> 11690 ą 7% +29.8% 15173 ą 10% perf-sched.wait_and_delay.count.io_schedule.folio_wait_bit_common.filemap_fault.__do_fault
> 4072 ą100% +163.7% 10737 ą 7% perf-sched.wait_and_delay.count.irqentry_exit_to_user_mode.asm_exc_page_fault.[unknown].[unknown]
> 8811 ą 6% +26.2% 11117 ą 9% perf-sched.wait_and_delay.count.irqentry_exit_to_user_mode.asm_sysvec_apic_timer_interrupt.[unknown].[unknown]
> 662.17 ą 29% +187.8% 1905 ą 29% perf-sched.wait_and_delay.count.pipe_read.vfs_read.ksys_read.do_syscall_64
> 15.50 ą 11% +48.4% 23.00 ą 16% perf-sched.wait_and_delay.count.schedule_hrtimeout_range.do_poll.constprop.0.do_sys_poll
> 167.67 ą 20% +48.0% 248.17 ą 26% perf-sched.wait_and_delay.count.schedule_preempt_disabled.rwsem_down_write_slowpath.down_write.open_last_lookups
> 2680 ą 12% +42.5% 3820 ą 11% perf-sched.wait_and_delay.count.schedule_timeout.___down_common.__down_timeout.down_timeout
> 137.17 ą 13% +31.8% 180.83 ą 9% perf-sched.wait_and_delay.count.schedule_timeout.__wait_for_common.wait_for_completion_state.__wait_rcu_gp
> 2636 ą 12% +43.1% 3772 ą 12% perf-sched.wait_and_delay.count.schedule_timeout.__wait_for_common.wait_for_completion_state.call_usermodehelper_exec
> 10.50 ą 11% +74.6% 18.33 ą 23% perf-sched.wait_and_delay.count.schedule_timeout.kcompactd.kthread.ret_from_fork
> 5619 ą 5% +38.8% 7797 ą 9% perf-sched.wait_and_delay.count.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
> 70455 ą 4% +32.3% 93197 ą 6% perf-sched.wait_and_delay.count.syscall_exit_to_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe.[unknown]
> 6990 ą 4% +37.4% 9603 ą 9% perf-sched.wait_and_delay.count.worker_thread.kthread.ret_from_fork.ret_from_fork_asm
> 191.28 ą124% +1455.2% 2974 ą100% perf-sched.wait_and_delay.max.ms.__cond_resched.dput.path_openat.do_filp_open.do_sys_openat2
> 2.24 ą 48% +144.6% 5.48 ą 29% perf-sched.wait_time.avg.ms.__cond_resched.__do_sys_close_range.do_syscall_64.entry_SYSCALL_64_after_hwframe.[unknown]
> 8.06 ą101% +618.7% 57.94 ą 68% perf-sched.wait_time.avg.ms.__cond_resched.__fput.__x64_sys_close.do_syscall_64.entry_SYSCALL_64_after_hwframe
> 0.28 ą146% +370.9% 1.30 ą 64% perf-sched.wait_time.avg.ms.__cond_resched.down_write.__split_vma.vms_gather_munmap_vmas.do_vmi_align_munmap
> 3.86 ą 5% +86.0% 7.18 ą 44% perf-sched.wait_time.avg.ms.__cond_resched.down_write.free_pgtables.exit_mmap.__mmput
> 2.54 ą 29% +51.6% 3.85 ą 12% perf-sched.wait_time.avg.ms.__cond_resched.down_write_killable.__do_sys_brk.do_syscall_64.entry_SYSCALL_64_after_hwframe
> 1.11 ą 39% +201.6% 3.36 ą 47% perf-sched.wait_time.avg.ms.__cond_resched.down_write_killable.exec_mmap.begin_new_exec.load_elf_binary
> 3.60 ą 68% +1630.2% 62.29 ą153% perf-sched.wait_time.avg.ms.__cond_resched.dput.step_into.link_path_walk.part
> 0.20 ą 64% +115.4% 0.42 ą 42% perf-sched.wait_time.avg.ms.__cond_resched.filemap_read.__kernel_read.load_elf_binary.exec_binprm
> 55.51 ą 53% +218.0% 176.54 ą 83% perf-sched.wait_time.avg.ms.__cond_resched.kmem_cache_alloc_noprof.security_inode_alloc.inode_init_always_gfp.alloc_inode
> 3.22 ą 3% +15.4% 3.72 ą 9% perf-sched.wait_time.avg.ms.__cond_resched.kmem_cache_alloc_noprof.vm_area_dup.__split_vma.vms_gather_munmap_vmas
> 0.04 ą223% +37562.8% 14.50 ą198% perf-sched.wait_time.avg.ms.__cond_resched.netlink_release.__sock_release.sock_close.__fput
> 1.19 ą 30% +59.3% 1.90 ą 34% perf-sched.wait_time.avg.ms.__cond_resched.remove_vma.vms_complete_munmap_vmas.do_vmi_align_munmap.do_vmi_munmap
> 0.36 ą133% +447.8% 1.97 ą 60% perf-sched.wait_time.avg.ms.__cond_resched.task_numa_work.task_work_run.syscall_exit_to_user_mode.do_syscall_64
> 1.73 ą 47% +165.0% 4.58 ą 51% perf-sched.wait_time.avg.ms.io_schedule.folio_wait_bit_common.__do_fault.do_read_fault
> 1.10 ą 21% +693.9% 8.70 ą 70% perf-sched.wait_time.avg.ms.irqentry_exit_to_user_mode.asm_sysvec_call_function_single.[unknown]
> 54.56 ą 32% +400.2% 272.90 ą 67% perf-sched.wait_time.avg.ms.schedule_timeout.__wait_for_common.wait_for_completion_state.kernel_clone
> 10.87 ą 18% +151.7% 27.36 ą 68% perf-sched.wait_time.max.ms.__cond_resched.__anon_vma_prepare.__vmf_anon_prepare.do_pte_missing.__handle_mm_fault
> 123.35 ą181% +1219.3% 1627 ą 82% perf-sched.wait_time.max.ms.__cond_resched.__fput.__x64_sys_close.do_syscall_64.entry_SYSCALL_64_after_hwframe
> 9.68 ą108% +5490.2% 541.19 ą185% perf-sched.wait_time.max.ms.__cond_resched.__kmalloc_cache_noprof.do_epoll_create.__x64_sys_epoll_create.do_syscall_64
> 3.39 ą 91% +112.0% 7.19 ą 44% perf-sched.wait_time.max.ms.__cond_resched.down_read_killable.iterate_dir.__x64_sys_getdents64.do_syscall_64
> 1.12 ą128% +407.4% 5.67 ą 52% perf-sched.wait_time.max.ms.__cond_resched.down_write.__split_vma.vms_gather_munmap_vmas.do_vmi_align_munmap
> 30.58 ą 29% +1741.1% 563.04 ą 80% perf-sched.wait_time.max.ms.__cond_resched.down_write.free_pgtables.exit_mmap.__mmput
> 3.82 ą114% +232.1% 12.70 ą 49% perf-sched.wait_time.max.ms.__cond_resched.down_write.vms_gather_munmap_vmas.__mmap_prepare.__mmap_region
> 7.75 ą 46% +72.4% 13.36 ą 32% perf-sched.wait_time.max.ms.__cond_resched.down_write_killable.__do_sys_brk.do_syscall_64.entry_SYSCALL_64_after_hwframe
> 13.39 ą 48% +259.7% 48.17 ą 34% perf-sched.wait_time.max.ms.__cond_resched.down_write_killable.exec_mmap.begin_new_exec.load_elf_binary
> 12.46 ą 30% +90.5% 23.73 ą 34% perf-sched.wait_time.max.ms.__cond_resched.down_write_killable.vm_mmap_pgoff.ksys_mmap_pgoff.do_syscall_64
> 479.90 ą 78% +496.9% 2864 ą141% perf-sched.wait_time.max.ms.__cond_resched.dput.open_last_lookups.path_openat.do_filp_open
> 185.75 ą116% +875.0% 1811 ą 84% perf-sched.wait_time.max.ms.__cond_resched.dput.path_openat.do_filp_open.do_sys_openat2
> 2.52 ą 44% +105.2% 5.18 ą 36% perf-sched.wait_time.max.ms.__cond_resched.kmem_cache_alloc_noprof.vm_area_alloc.alloc_bprm.kernel_execve
> 0.04 ą223% +37564.1% 14.50 ą198% perf-sched.wait_time.max.ms.__cond_resched.netlink_release.__sock_release.sock_close.__fput
> 0.54 ą153% +669.7% 4.15 ą 61% perf-sched.wait_time.max.ms.__cond_resched.task_numa_work.task_work_run.syscall_exit_to_user_mode.do_syscall_64
> 28.22 ą 14% +41.2% 39.84 ą 25% perf-sched.wait_time.max.ms.__cond_resched.unmap_vmas.vms_clear_ptes.part.0
> 4.90 ą 51% +158.2% 12.66 ą 75% perf-sched.wait_time.max.ms.io_schedule.folio_wait_bit_common.__do_fault.do_read_fault
> 26.54 ą 66% +5220.1% 1411 ą 84% perf-sched.wait_time.max.ms.irqentry_exit_to_user_mode.asm_sysvec_call_function_single.[unknown]
> 1262 ą 9% +434.3% 6744 ą 60% perf-sched.wait_time.max.ms.schedule_timeout.__wait_for_common.wait_for_completion_state.kernel_clone
>
>
>
>
> Disclaimer:
> Results have been estimated based on internal Intel analysis and are provided
> for informational purposes only. Any difference in system hardware or software
> design or configuration may affect actual performance.
>
>
> --
> 0-DAY CI Kernel Test Service
> https://github.com/intel/lkp-tests/wiki
>
--
Mateusz Guzik <mjguzik gmail.com>