Message-ID: <CAGudoHHMvREPjWNvmAa_qQovK-9S1zvCAGh=K6U21oyr4pTtzg@mail.gmail.com>
Date: Thu, 17 Apr 2025 12:02:55 +0200
From: Mateusz Guzik <mjguzik@...il.com>
To: kernel test robot <oliver.sang@...el.com>
Cc: oe-lkp@...ts.linux.dev, lkp@...el.com, linux-kernel@...r.kernel.org,
Christian Brauner <brauner@...nel.org>, linux-fsdevel@...r.kernel.org
Subject: Re: [linus:master] [fs] a914bd93f3: stress-ng.close.close_calls_per_sec
52.2% regression
On Thu, Apr 17, 2025 at 11:54 AM Mateusz Guzik <mjguzik@...il.com> wrote:
>
> On Thu, Apr 17, 2025 at 10:24 AM kernel test robot
> <oliver.sang@...el.com> wrote:
> >
> >
> >
> > Hello,
> >
> > kernel test robot noticed a 52.2% regression of stress-ng.close.close_calls_per_sec on:
> >
> >
> > commit: a914bd93f3edfedcdd59deb615e8dd1b3643cac5 ("fs: use fput_close() in filp_close()")
> > https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master
> >
> > [still regression on linus/master 7cdabafc001202de9984f22c973305f424e0a8b7]
> > [still regression on linux-next/master 01c6df60d5d4ae00cd5c1648818744838bba7763]
> >
> > testcase: stress-ng
> > config: x86_64-rhel-9.4
> > compiler: gcc-12
> > test machine: 192 threads 2 sockets Intel(R) Xeon(R) Platinum 8468V CPU @ 2.4GHz (Sapphire Rapids) with 384G memory
> > parameters:
> >
> > nr_threads: 100%
> > testtime: 60s
> > test: close
> > cpufreq_governor: performance
> >
> >
> > the data is not very stable, but the regression trend seems clear.
> >
> > a914bd93f3edfedcdd59deb615e8dd1b3643cac5/matrix.json: "stress-ng.close.close_calls_per_sec": [
> > a914bd93f3edfedcdd59deb615e8dd1b3643cac5/matrix.json- 193538.65,
> > a914bd93f3edfedcdd59deb615e8dd1b3643cac5/matrix.json- 232133.85,
> > a914bd93f3edfedcdd59deb615e8dd1b3643cac5/matrix.json- 276146.66,
> > a914bd93f3edfedcdd59deb615e8dd1b3643cac5/matrix.json- 193345.38,
> > a914bd93f3edfedcdd59deb615e8dd1b3643cac5/matrix.json- 209411.34,
> > a914bd93f3edfedcdd59deb615e8dd1b3643cac5/matrix.json- 254782.41
> > a914bd93f3edfedcdd59deb615e8dd1b3643cac5/matrix.json- ],
> >
> > 3e46a92a27c2927fcef996ba06cbe299da629c28/matrix.json: "stress-ng.close.close_calls_per_sec": [
> > 3e46a92a27c2927fcef996ba06cbe299da629c28/matrix.json- 427893.13,
> > 3e46a92a27c2927fcef996ba06cbe299da629c28/matrix.json- 456267.6,
> > 3e46a92a27c2927fcef996ba06cbe299da629c28/matrix.json- 509121.02,
> > 3e46a92a27c2927fcef996ba06cbe299da629c28/matrix.json- 544289.08,
> > 3e46a92a27c2927fcef996ba06cbe299da629c28/matrix.json- 354004.06,
> > 3e46a92a27c2927fcef996ba06cbe299da629c28/matrix.json- 552310.73
> > 3e46a92a27c2927fcef996ba06cbe299da629c28/matrix.json- ],
> >
> >
> > If you fix the issue in a separate patch/commit (i.e. not just a new version of
> > the same patch/commit), kindly add following tags
> > | Reported-by: kernel test robot <oliver.sang@...el.com>
> > | Closes: https://lore.kernel.org/oe-lkp/202504171513.6d6f8a16-lkp@intel.com
> >
> >
> > Details are as below:
> > -------------------------------------------------------------------------------------------------->
> >
> >
> > The kernel config and materials to reproduce are available at:
> > https://download.01.org/0day-ci/archive/20250417/202504171513.6d6f8a16-lkp@intel.com
> >
> > =========================================================================================
> > compiler/cpufreq_governor/kconfig/nr_threads/rootfs/tbox_group/test/testcase/testtime:
> > gcc-12/performance/x86_64-rhel-9.4/100%/debian-12-x86_64-20240206.cgz/igk-spr-2sp1/close/stress-ng/60s
> >
> > commit:
> > 3e46a92a27 ("fs: use fput_close_sync() in close()")
> > a914bd93f3 ("fs: use fput_close() in filp_close()")
> >
>
> I'm going to have to chew on it.
>
> First, the commit at hand states:
> fs: use fput_close() in filp_close()
>
> When tracing the refcounts seen over a kernel build, this is a wash:
> @[kprobe:filp_close]:
> [0]    32195 |@@@@@@@@@@                                          |
> [1]   164567 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
>
> I verified the vast majority of the skew comes from do_close_on_exec(),
> which could be changed to use a different variant instead.
>
> So if need be this is trivially revertable without a major loss anywhere.
>
> But also:
> Even without changing that, the 19.5% of calls which got here can still
> save the extra atomic. Calls here are borderline non-existent compared
> to fput (over 3.2 million!), so they should not negatively affect
> scalability.
>
> As the code converts a lock xadd into a lock cmpxchg loop in exchange
> for saving an atomic if this is the last close. This indeed scales
> worse if this in fact does not close the file.
>
> stress-ng dups fd 2 several times and then closes it in close_range()
> in a loop. this manufactures a case where this is never the last close
> and the cmpxchg loop is a plain loss.
>
Huh, bad editing on my part. The bottom line, though, is that there is a
known tradeoff here and stress-ng manufactures a case where it is always
the wrong one.
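
To make the tradeoff concrete, here is a rough userspace sketch of the
two put strategies (illustrative names only, not the actual fput() /
fput_close() code): a plain atomic decrement versus a cmpxchg loop that
tries to catch the 1 -> 0 transition so the final close can be done
synchronously.

#include <stdatomic.h>

struct file_stub {
	atomic_long refcnt;
};

static void release(struct file_stub *f)
{
	(void)f;	/* stand-in for the final-close work */
}

/* fput()-style: one unconditional atomic op (lock xadd on x86). */
static void put_plain(struct file_stub *f)
{
	if (atomic_fetch_sub(&f->refcnt, 1) == 1)
		release(f);
}

/*
 * fput_close()-style idea: spot the last reference up front with a
 * cmpxchg loop so the close can happen synchronously, saving one
 * atomic in that case.
 */
static void put_close(struct file_stub *f)
{
	long old = atomic_load(&f->refcnt);

	for (;;) {
		if (old == 1) {
			if (atomic_compare_exchange_weak(&f->refcnt, &old, 0)) {
				release(f);	/* last close, done synchronously */
				return;
			}
		} else if (atomic_compare_exchange_weak(&f->refcnt, &old, old - 1)) {
			return;		/* not the last close */
		}
		/* a failed cmpxchg reloaded 'old'; retry under contention */
	}
}

With an unshared file the cmpxchg variant saves the extra atomic on the
final close; with a file shared by all workers the count is never 1, so
it only adds retries.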
The fd 2 at hand is inherited (it's the tty) and shared between *all*
workers on all CPUs.
Ignoring some fluff, it's this in a loop:
dup2(2, 1024) = 1024
dup2(2, 1025) = 1025
dup2(2, 1026) = 1026
dup2(2, 1027) = 1027
dup2(2, 1028) = 1028
dup2(2, 1029) = 1029
dup2(2, 1030) = 1030
dup2(2, 1031) = 1031
[..]
close_range(1024, 1032, 0) = 0
where fd 2 is the same file object in all 192 workers doing this.
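
A minimal standalone reconstruction of that per-worker loop, based
purely on the strace above (not the stress-ng source; assumes
SYS_close_range is exposed by the installed headers):

#define _GNU_SOURCE
#include <unistd.h>
#include <sys/syscall.h>

int main(void)
{
	/* run under a timeout; stress-ng runs this for 60s per worker */
	for (;;) {
		/* every dup2() takes another reference on the one struct
		 * file behind fd 2 (the inherited tty), shared by all
		 * 192 workers */
		for (int fd = 1024; fd <= 1031; fd++)
			dup2(2, fd);

		/* drop them again; fd 2 itself stays open, so none of
		 * these is ever the last close */
		syscall(SYS_close_range, 1024, 1032, 0);
	}
}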
> While I don't believe this represents real-world behavior, maybe
> something can be done. I note again this patch can be whacked
> altogether without any tears shed.
>
> So I'm going to chew on it.
>
> > 3e46a92a27c2927f a914bd93f3edfedcdd59deb615e
> > ---------------- ---------------------------
> > %stddev %change %stddev
> > \ | \
> > 355470 ± 14% -17.6% 292767 ± 7% cpuidle..usage
> > 5.19 -0.6 4.61 ± 3% mpstat.cpu.all.usr%
> > 495615 ± 7% -38.5% 304839 ± 8% vmstat.system.cs
> > 780096 ± 5% -23.5% 596446 ± 3% vmstat.system.in
> > 2512168 ± 17% -45.8% 1361659 ± 71% sched_debug.cfs_rq:/.avg_vruntime.min
> > 2512168 ± 17% -45.8% 1361659 ± 71% sched_debug.cfs_rq:/.min_vruntime.min
> > 700402 ± 2% +19.8% 838744 ± 10% sched_debug.cpu.avg_idle.avg
> > 81230 ± 6% -59.6% 32788 ± 69% sched_debug.cpu.nr_switches.avg
> > 27992 ± 20% -70.2% 8345 ± 74% sched_debug.cpu.nr_switches.min
> > 473980 ± 14% -52.2% 226559 ± 13% stress-ng.close.close_calls_per_sec
> > 4004843 ± 8% -21.1% 3161813 ± 8% stress-ng.time.involuntary_context_switches
> > 9475 +1.1% 9582 stress-ng.time.system_time
> > 183.50 ± 2% -37.8% 114.20 ± 3% stress-ng.time.user_time
> > 25637892 ± 6% -42.6% 14725385 ± 9% stress-ng.time.voluntary_context_switches
> > 23.01 ± 2% -1.4 21.61 ± 3% perf-stat.i.cache-miss-rate%
> > 17981659 -10.8% 16035508 ± 4% perf-stat.i.cache-misses
> > 77288888 ± 2% -6.5% 72260357 ± 4% perf-stat.i.cache-references
> > 504949 ± 6% -38.1% 312536 ± 8% perf-stat.i.context-switches
> > 33030 +15.7% 38205 ± 4% perf-stat.i.cycles-between-cache-misses
> > 4.34 ± 10% -38.3% 2.68 ± 20% perf-stat.i.metric.K/sec
> > 26229 ± 44% +37.8% 36145 ± 4% perf-stat.overall.cycles-between-cache-misses
> > 30.09 -12.8 17.32 perf-profile.calltrace.cycles-pp.filp_flush.filp_close.__do_sys_close_range.do_syscall_64.entry_SYSCALL_64_after_hwframe
> > 9.32 -9.3 0.00 perf-profile.calltrace.cycles-pp.fput.filp_close.__do_sys_close_range.do_syscall_64.entry_SYSCALL_64_after_hwframe
> > 41.15 -5.9 35.26 perf-profile.calltrace.cycles-pp.__dup2
> > 41.02 -5.9 35.17 perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.__dup2
> > 41.03 -5.8 35.18 perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.__dup2
> > 13.86 -5.5 8.35 perf-profile.calltrace.cycles-pp.filp_flush.filp_close.do_dup2.__x64_sys_dup2.do_syscall_64
> > 40.21 -5.5 34.71 perf-profile.calltrace.cycles-pp.__x64_sys_dup2.do_syscall_64.entry_SYSCALL_64_after_hwframe.__dup2
> > 38.01 -4.9 33.10 perf-profile.calltrace.cycles-pp.do_dup2.__x64_sys_dup2.do_syscall_64.entry_SYSCALL_64_after_hwframe.__dup2
> > 4.90 ą 2% -1.9 2.96 ą 3% perf-profile.calltrace.cycles-pp.locks_remove_posix.filp_flush.filp_close.__do_sys_close_range.do_syscall_64
> > 2.63 ą 9% -1.9 0.76 ą 18% perf-profile.calltrace.cycles-pp.asm_sysvec_apic_timer_interrupt.filp_flush.filp_close.__do_sys_close_range.do_syscall_64
> > 4.44 -1.7 2.76 ą 2% perf-profile.calltrace.cycles-pp.dnotify_flush.filp_flush.filp_close.__do_sys_close_range.do_syscall_64
> > 2.50 ą 2% -0.9 1.56 ą 3% perf-profile.calltrace.cycles-pp.locks_remove_posix.filp_flush.filp_close.do_dup2.__x64_sys_dup2
> > 2.22 -0.8 1.42 ą 2% perf-profile.calltrace.cycles-pp.dnotify_flush.filp_flush.filp_close.do_dup2.__x64_sys_dup2
> > 1.66 ą 5% -0.3 1.37 ą 5% perf-profile.calltrace.cycles-pp.ksys_dup3.__x64_sys_dup2.do_syscall_64.entry_SYSCALL_64_after_hwframe.__dup2
> > 1.60 ą 5% -0.3 1.32 ą 5% perf-profile.calltrace.cycles-pp._raw_spin_lock.ksys_dup3.__x64_sys_dup2.do_syscall_64.entry_SYSCALL_64_after_hwframe
> > 0.58 ą 2% +0.0 0.62 ą 2% perf-profile.calltrace.cycles-pp.do_read_fault.do_pte_missing.__handle_mm_fault.handle_mm_fault.do_user_addr_fault
> > 0.54 ą 2% +0.0 0.58 ą 2% perf-profile.calltrace.cycles-pp.filemap_map_pages.do_read_fault.do_pte_missing.__handle_mm_fault.handle_mm_fault
> > 0.72 ą 4% +0.1 0.79 ą 4% perf-profile.calltrace.cycles-pp._raw_spin_lock.__do_sys_close_range.do_syscall_64.entry_SYSCALL_64_after_hwframe.syscall
> > 0.00 +1.8 1.79 ą 5% perf-profile.calltrace.cycles-pp.asm_sysvec_apic_timer_interrupt.fput_close.filp_close.__do_sys_close_range.do_syscall_64
> > 19.11 +4.4 23.49 perf-profile.calltrace.cycles-pp.filp_close.do_dup2.__x64_sys_dup2.do_syscall_64.entry_SYSCALL_64_after_hwframe
> > 40.94 +6.5 47.46 perf-profile.calltrace.cycles-pp.filp_close.__do_sys_close_range.do_syscall_64.entry_SYSCALL_64_after_hwframe.syscall
> > 42.24 +6.6 48.79 perf-profile.calltrace.cycles-pp.syscall
> > 42.14 +6.6 48.72 perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.syscall
> > 42.14 +6.6 48.72 perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.syscall
> > 41.86 +6.6 48.46 perf-profile.calltrace.cycles-pp.__do_sys_close_range.do_syscall_64.entry_SYSCALL_64_after_hwframe.syscall
> > 0.00 +14.6 14.60 ą 2% perf-profile.calltrace.cycles-pp.fput_close.filp_close.do_dup2.__x64_sys_dup2.do_syscall_64
> > 0.00 +29.0 28.95 ą 2% perf-profile.calltrace.cycles-pp.fput_close.filp_close.__do_sys_close_range.do_syscall_64.entry_SYSCALL_64_after_hwframe
> > 45.76 -19.0 26.72 perf-profile.children.cycles-pp.filp_flush
> > 14.67 -14.7 0.00 perf-profile.children.cycles-pp.fput
> > 41.20 -5.9 35.30 perf-profile.children.cycles-pp.__dup2
> > 40.21 -5.5 34.71 perf-profile.children.cycles-pp.__x64_sys_dup2
> > 38.53 -5.2 33.33 perf-profile.children.cycles-pp.do_dup2
> > 7.81 ą 2% -3.1 4.74 ą 3% perf-profile.children.cycles-pp.locks_remove_posix
> > 7.03 -2.6 4.40 ą 2% perf-profile.children.cycles-pp.dnotify_flush
> > 1.24 ą 3% -0.4 0.86 ą 4% perf-profile.children.cycles-pp.syscall_exit_to_user_mode
> > 1.67 ą 5% -0.3 1.37 ą 5% perf-profile.children.cycles-pp.ksys_dup3
> > 0.29 -0.1 0.22 ą 2% perf-profile.children.cycles-pp.update_load_avg
> > 0.10 ą 10% -0.1 0.05 ą 46% perf-profile.children.cycles-pp.__x64_sys_fcntl
> > 0.17 ą 7% -0.1 0.12 ą 4% perf-profile.children.cycles-pp.entry_SYSCALL_64
> > 0.15 ą 3% -0.0 0.10 ą 18% perf-profile.children.cycles-pp.clockevents_program_event
> > 0.06 ą 11% -0.0 0.02 ą 99% perf-profile.children.cycles-pp.stress_close_func
> > 0.13 ą 5% -0.0 0.10 ą 7% perf-profile.children.cycles-pp.__switch_to
> > 0.06 ą 7% -0.0 0.04 ą 71% perf-profile.children.cycles-pp.lapic_next_deadline
> > 0.11 ą 3% -0.0 0.09 ą 5% perf-profile.children.cycles-pp.__update_load_avg_cfs_rq
> > 0.06 ą 6% +0.0 0.08 ą 6% perf-profile.children.cycles-pp.__folio_batch_add_and_move
> > 0.26 ą 2% +0.0 0.28 ą 3% perf-profile.children.cycles-pp.folio_remove_rmap_ptes
> > 0.08 ą 4% +0.0 0.11 ą 10% perf-profile.children.cycles-pp.set_pte_range
> > 0.45 ą 2% +0.0 0.48 ą 2% perf-profile.children.cycles-pp.zap_present_ptes
> > 0.18 ą 3% +0.0 0.21 ą 4% perf-profile.children.cycles-pp.folios_put_refs
> > 0.29 ą 3% +0.0 0.32 ą 3% perf-profile.children.cycles-pp.__tlb_batch_free_encoded_pages
> > 0.29 ą 3% +0.0 0.32 ą 3% perf-profile.children.cycles-pp.free_pages_and_swap_cache
> > 0.34 ą 4% +0.0 0.38 ą 3% perf-profile.children.cycles-pp.tlb_finish_mmu
> > 0.24 ą 3% +0.0 0.27 ą 5% perf-profile.children.cycles-pp._raw_spin_lock_irqsave
> > 0.73 ą 2% +0.0 0.78 ą 2% perf-profile.children.cycles-pp.do_read_fault
> > 0.69 ą 2% +0.0 0.74 ą 2% perf-profile.children.cycles-pp.filemap_map_pages
> > 42.26 +6.5 48.80 perf-profile.children.cycles-pp.syscall
> > 41.86 +6.6 48.46 perf-profile.children.cycles-pp.__do_sys_close_range
> > 60.05 +10.9 70.95 perf-profile.children.cycles-pp.filp_close
> > 0.00 +44.5 44.55 ą 2% perf-profile.children.cycles-pp.fput_close
> > 14.31 ą 2% -14.3 0.00 perf-profile.self.cycles-pp.fput
> > 30.12 ą 2% -13.0 17.09 perf-profile.self.cycles-pp.filp_flush
> > 19.17 ą 2% -9.5 9.68 ą 3% perf-profile.self.cycles-pp.do_dup2
> > 7.63 ą 3% -3.0 4.62 ą 4% perf-profile.self.cycles-pp.locks_remove_posix
> > 6.86 ą 3% -2.6 4.28 ą 3% perf-profile.self.cycles-pp.dnotify_flush
> > 0.64 ą 4% -0.4 0.26 ą 4% perf-profile.self.cycles-pp.syscall_exit_to_user_mode
> > 0.22 ą 10% -0.1 0.15 ą 11% perf-profile.self.cycles-pp.x64_sys_call
> > 0.23 ą 3% -0.1 0.17 ą 8% perf-profile.self.cycles-pp.__schedule
> > 0.08 ą 19% -0.1 0.03 ą100% perf-profile.self.cycles-pp.pick_eevdf
> > 0.06 ą 7% -0.0 0.03 ą100% perf-profile.self.cycles-pp.lapic_next_deadline
> > 0.13 ą 7% -0.0 0.10 ą 10% perf-profile.self.cycles-pp.__switch_to
> > 0.09 ą 8% -0.0 0.06 ą 8% perf-profile.self.cycles-pp.__update_load_avg_se
> > 0.08 ą 4% -0.0 0.05 ą 7% perf-profile.self.cycles-pp.asm_sysvec_apic_timer_interrupt
> > 0.08 ą 9% -0.0 0.06 ą 11% perf-profile.self.cycles-pp.entry_SYSCALL_64
> > 0.06 ą 7% +0.0 0.08 ą 8% perf-profile.self.cycles-pp.filemap_map_pages
> > 0.12 ą 3% +0.0 0.14 ą 4% perf-profile.self.cycles-pp.folios_put_refs
> > 0.25 ą 2% +0.0 0.27 ą 3% perf-profile.self.cycles-pp.folio_remove_rmap_ptes
> > 0.17 ą 5% +0.0 0.20 ą 6% perf-profile.self.cycles-pp._raw_spin_lock_irqsave
> > 0.00 +0.1 0.07 ą 5% perf-profile.self.cycles-pp.do_nanosleep
> > 0.00 +0.1 0.10 ą 15% perf-profile.self.cycles-pp.filp_close
> > 0.00 +43.9 43.86 ą 2% perf-profile.self.cycles-pp.fput_close
> > 2.12 ą 47% +151.1% 5.32 ą 28% perf-sched.sch_delay.avg.ms.__cond_resched.__do_sys_close_range.do_syscall_64.entry_SYSCALL_64_after_hwframe.[unknown]
> > 0.28 ą 60% +990.5% 3.08 ą 53% perf-sched.sch_delay.avg.ms.__cond_resched.__kmalloc_cache_noprof.do_eventfd.__x64_sys_eventfd2.do_syscall_64
> > 0.96 ą 28% +35.5% 1.30 ą 25% perf-sched.sch_delay.avg.ms.__cond_resched.__kmalloc_noprof.load_elf_phdrs.load_elf_binary.exec_binprm
> > 2.56 ą 43% +964.0% 27.27 ą161% perf-sched.sch_delay.avg.ms.__cond_resched.dput.open_last_lookups.path_openat.do_filp_open
> > 12.80 ą139% +366.1% 59.66 ą107% perf-sched.sch_delay.avg.ms.__cond_resched.dput.shmem_unlink.vfs_unlink.do_unlinkat
> > 2.78 ą 25% +50.0% 4.17 ą 19% perf-sched.sch_delay.avg.ms.__cond_resched.kmem_cache_alloc_lru_noprof.__d_alloc.d_alloc_cursor.dcache_dir_open
> > 0.78 ą 18% +39.3% 1.09 ą 19% perf-sched.sch_delay.avg.ms.__cond_resched.kmem_cache_alloc_noprof.mas_alloc_nodes.mas_preallocate.vma_shrink
> > 2.63 ą 10% +16.7% 3.07 ą 4% perf-sched.sch_delay.avg.ms.__cond_resched.kmem_cache_alloc_noprof.vm_area_alloc.__mmap_new_vma.__mmap_region
> > 0.04 ą223% +3483.1% 1.38 ą 67% perf-sched.sch_delay.avg.ms.__cond_resched.netlink_release.__sock_release.sock_close.__fput
> > 0.72 ą 17% +42.5% 1.02 ą 22% perf-sched.sch_delay.avg.ms.__cond_resched.stop_one_cpu.sched_exec.bprm_execve.part
> > 0.35 ą134% +457.7% 1.94 ą 61% perf-sched.sch_delay.avg.ms.__cond_resched.task_numa_work.task_work_run.syscall_exit_to_user_mode.do_syscall_64
> > 1.88 ą 34% +574.9% 12.69 ą115% perf-sched.sch_delay.avg.ms.__cond_resched.task_work_run.syscall_exit_to_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe
> > 2.41 ą 10% +18.0% 2.84 ą 9% perf-sched.sch_delay.avg.ms.__cond_resched.wp_page_copy.__handle_mm_fault.handle_mm_fault.do_user_addr_fault
> > 1.85 ą 36% +155.6% 4.73 ą 50% perf-sched.sch_delay.avg.ms.io_schedule.folio_wait_bit_common.__do_fault.do_read_fault
> > 28.94 ą 26% -52.4% 13.78 ą 36% perf-sched.sch_delay.avg.ms.pipe_read.vfs_read.ksys_read.do_syscall_64
> > 2.24 ą 9% +22.1% 2.74 ą 8% perf-sched.sch_delay.avg.ms.schedule_preempt_disabled.rwsem_down_write_slowpath.down_write.unlink_file_vma_batch_final
> > 2.19 ą 7% +17.3% 2.57 ą 6% perf-sched.sch_delay.avg.ms.schedule_preempt_disabled.rwsem_down_write_slowpath.down_write.vma_link_file
> > 2.39 ą 6% +16.4% 2.79 ą 9% perf-sched.sch_delay.avg.ms.schedule_preempt_disabled.rwsem_down_write_slowpath.down_write.vma_prepare
> > 0.34 ą 77% +1931.5% 6.95 ą 99% perf-sched.sch_delay.max.ms.__cond_resched.__kmalloc_cache_noprof.do_eventfd.__x64_sys_eventfd2.do_syscall_64
> > 29.69 ą 28% +129.5% 68.12 ą 28% perf-sched.sch_delay.max.ms.__cond_resched.__kmalloc_cache_noprof.perf_event_mmap_event.perf_event_mmap.__mmap_region
> > 1.48 ą 96% +284.6% 5.68 ą 56% perf-sched.sch_delay.max.ms.__cond_resched.copy_strings_kernel.kernel_execve.call_usermodehelper_exec_async.ret_from_fork
> > 3.59 ą 57% +124.5% 8.06 ą 45% perf-sched.sch_delay.max.ms.__cond_resched.down_read.mmap_read_lock_maybe_expand.get_arg_page.copy_string_kernel
> > 3.39 ą 91% +112.0% 7.19 ą 44% perf-sched.sch_delay.max.ms.__cond_resched.down_read_killable.iterate_dir.__x64_sys_getdents64.do_syscall_64
> > 22.16 ą 77% +117.4% 48.17 ą 34% perf-sched.sch_delay.max.ms.__cond_resched.down_write_killable.exec_mmap.begin_new_exec.load_elf_binary
> > 8.72 ą 17% +358.5% 39.98 ą 61% perf-sched.sch_delay.max.ms.__cond_resched.kmem_cache_alloc_noprof.mas_alloc_nodes.mas_preallocate.__mmap_new_vma
> > 0.04 ą223% +3484.4% 1.38 ą 67% perf-sched.sch_delay.max.ms.__cond_resched.netlink_release.__sock_release.sock_close.__fput
> > 0.53 ą154% +676.1% 4.12 ą 61% perf-sched.sch_delay.max.ms.__cond_resched.task_numa_work.task_work_run.syscall_exit_to_user_mode.do_syscall_64
> > 25.76 ą 70% +9588.9% 2496 ą127% perf-sched.sch_delay.max.ms.__cond_resched.task_work_run.syscall_exit_to_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe
> > 51.53 ą 26% -58.8% 21.22 ą106% perf-sched.sch_delay.max.ms.devkmsg_read.vfs_read.ksys_read.do_syscall_64
> > 4.97 ą 48% +154.8% 12.66 ą 75% perf-sched.sch_delay.max.ms.io_schedule.folio_wait_bit_common.__do_fault.do_read_fault
> > 4.36 ą 48% +147.7% 10.81 ą 29% perf-sched.wait_and_delay.avg.ms.__cond_resched.__do_sys_close_range.do_syscall_64.entry_SYSCALL_64_after_hwframe.[unknown]
> > 108632 ą 4% +23.9% 134575 ą 6% perf-sched.wait_and_delay.count.__cond_resched.__do_sys_close_range.do_syscall_64.entry_SYSCALL_64_after_hwframe.[unknown]
> > 572.67 ą 6% +37.3% 786.17 ą 7% perf-sched.wait_and_delay.count.__cond_resched.__wait_for_common.affine_move_task.__set_cpus_allowed_ptr.__sched_setaffinity
> > 596.67 ą 12% +43.9% 858.50 ą 13% perf-sched.wait_and_delay.count.__cond_resched.__wait_for_common.wait_for_completion_state.call_usermodehelper_exec.__request_module
> > 294.83 ą 9% +31.1% 386.50 ą 11% perf-sched.wait_and_delay.count.__cond_resched.dput.terminate_walk.path_openat.do_filp_open
> > 1223275 ą 2% -17.7% 1006293 ą 6% perf-sched.wait_and_delay.count.do_nanosleep.hrtimer_nanosleep.common_nsleep.__x64_sys_clock_nanosleep
> > 2772 ą 11% +43.6% 3980 ą 11% perf-sched.wait_and_delay.count.do_wait.kernel_wait.call_usermodehelper_exec_work.process_one_work
> > 11690 ą 7% +29.8% 15173 ą 10% perf-sched.wait_and_delay.count.io_schedule.folio_wait_bit_common.filemap_fault.__do_fault
> > 4072 ą100% +163.7% 10737 ą 7% perf-sched.wait_and_delay.count.irqentry_exit_to_user_mode.asm_exc_page_fault.[unknown].[unknown]
> > 8811 ą 6% +26.2% 11117 ą 9% perf-sched.wait_and_delay.count.irqentry_exit_to_user_mode.asm_sysvec_apic_timer_interrupt.[unknown].[unknown]
> > 662.17 ą 29% +187.8% 1905 ą 29% perf-sched.wait_and_delay.count.pipe_read.vfs_read.ksys_read.do_syscall_64
> > 15.50 ą 11% +48.4% 23.00 ą 16% perf-sched.wait_and_delay.count.schedule_hrtimeout_range.do_poll.constprop.0.do_sys_poll
> > 167.67 ą 20% +48.0% 248.17 ą 26% perf-sched.wait_and_delay.count.schedule_preempt_disabled.rwsem_down_write_slowpath.down_write.open_last_lookups
> > 2680 ą 12% +42.5% 3820 ą 11% perf-sched.wait_and_delay.count.schedule_timeout.___down_common.__down_timeout.down_timeout
> > 137.17 ą 13% +31.8% 180.83 ą 9% perf-sched.wait_and_delay.count.schedule_timeout.__wait_for_common.wait_for_completion_state.__wait_rcu_gp
> > 2636 ą 12% +43.1% 3772 ą 12% perf-sched.wait_and_delay.count.schedule_timeout.__wait_for_common.wait_for_completion_state.call_usermodehelper_exec
> > 10.50 ą 11% +74.6% 18.33 ą 23% perf-sched.wait_and_delay.count.schedule_timeout.kcompactd.kthread.ret_from_fork
> > 5619 ą 5% +38.8% 7797 ą 9% perf-sched.wait_and_delay.count.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
> > 70455 ą 4% +32.3% 93197 ą 6% perf-sched.wait_and_delay.count.syscall_exit_to_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe.[unknown]
> > 6990 ą 4% +37.4% 9603 ą 9% perf-sched.wait_and_delay.count.worker_thread.kthread.ret_from_fork.ret_from_fork_asm
> > 191.28 ą124% +1455.2% 2974 ą100% perf-sched.wait_and_delay.max.ms.__cond_resched.dput.path_openat.do_filp_open.do_sys_openat2
> > 2.24 ą 48% +144.6% 5.48 ą 29% perf-sched.wait_time.avg.ms.__cond_resched.__do_sys_close_range.do_syscall_64.entry_SYSCALL_64_after_hwframe.[unknown]
> > 8.06 ą101% +618.7% 57.94 ą 68% perf-sched.wait_time.avg.ms.__cond_resched.__fput.__x64_sys_close.do_syscall_64.entry_SYSCALL_64_after_hwframe
> > 0.28 ą146% +370.9% 1.30 ą 64% perf-sched.wait_time.avg.ms.__cond_resched.down_write.__split_vma.vms_gather_munmap_vmas.do_vmi_align_munmap
> > 3.86 ą 5% +86.0% 7.18 ą 44% perf-sched.wait_time.avg.ms.__cond_resched.down_write.free_pgtables.exit_mmap.__mmput
> > 2.54 ą 29% +51.6% 3.85 ą 12% perf-sched.wait_time.avg.ms.__cond_resched.down_write_killable.__do_sys_brk.do_syscall_64.entry_SYSCALL_64_after_hwframe
> > 1.11 ą 39% +201.6% 3.36 ą 47% perf-sched.wait_time.avg.ms.__cond_resched.down_write_killable.exec_mmap.begin_new_exec.load_elf_binary
> > 3.60 ą 68% +1630.2% 62.29 ą153% perf-sched.wait_time.avg.ms.__cond_resched.dput.step_into.link_path_walk.part
> > 0.20 ą 64% +115.4% 0.42 ą 42% perf-sched.wait_time.avg.ms.__cond_resched.filemap_read.__kernel_read.load_elf_binary.exec_binprm
> > 55.51 ą 53% +218.0% 176.54 ą 83% perf-sched.wait_time.avg.ms.__cond_resched.kmem_cache_alloc_noprof.security_inode_alloc.inode_init_always_gfp.alloc_inode
> > 3.22 ą 3% +15.4% 3.72 ą 9% perf-sched.wait_time.avg.ms.__cond_resched.kmem_cache_alloc_noprof.vm_area_dup.__split_vma.vms_gather_munmap_vmas
> > 0.04 ą223% +37562.8% 14.50 ą198% perf-sched.wait_time.avg.ms.__cond_resched.netlink_release.__sock_release.sock_close.__fput
> > 1.19 ą 30% +59.3% 1.90 ą 34% perf-sched.wait_time.avg.ms.__cond_resched.remove_vma.vms_complete_munmap_vmas.do_vmi_align_munmap.do_vmi_munmap
> > 0.36 ą133% +447.8% 1.97 ą 60% perf-sched.wait_time.avg.ms.__cond_resched.task_numa_work.task_work_run.syscall_exit_to_user_mode.do_syscall_64
> > 1.73 ą 47% +165.0% 4.58 ą 51% perf-sched.wait_time.avg.ms.io_schedule.folio_wait_bit_common.__do_fault.do_read_fault
> > 1.10 ą 21% +693.9% 8.70 ą 70% perf-sched.wait_time.avg.ms.irqentry_exit_to_user_mode.asm_sysvec_call_function_single.[unknown]
> > 54.56 ą 32% +400.2% 272.90 ą 67% perf-sched.wait_time.avg.ms.schedule_timeout.__wait_for_common.wait_for_completion_state.kernel_clone
> > 10.87 ą 18% +151.7% 27.36 ą 68% perf-sched.wait_time.max.ms.__cond_resched.__anon_vma_prepare.__vmf_anon_prepare.do_pte_missing.__handle_mm_fault
> > 123.35 ą181% +1219.3% 1627 ą 82% perf-sched.wait_time.max.ms.__cond_resched.__fput.__x64_sys_close.do_syscall_64.entry_SYSCALL_64_after_hwframe
> > 9.68 ą108% +5490.2% 541.19 ą185% perf-sched.wait_time.max.ms.__cond_resched.__kmalloc_cache_noprof.do_epoll_create.__x64_sys_epoll_create.do_syscall_64
> > 3.39 ą 91% +112.0% 7.19 ą 44% perf-sched.wait_time.max.ms.__cond_resched.down_read_killable.iterate_dir.__x64_sys_getdents64.do_syscall_64
> > 1.12 ą128% +407.4% 5.67 ą 52% perf-sched.wait_time.max.ms.__cond_resched.down_write.__split_vma.vms_gather_munmap_vmas.do_vmi_align_munmap
> > 30.58 ą 29% +1741.1% 563.04 ą 80% perf-sched.wait_time.max.ms.__cond_resched.down_write.free_pgtables.exit_mmap.__mmput
> > 3.82 ą114% +232.1% 12.70 ą 49% perf-sched.wait_time.max.ms.__cond_resched.down_write.vms_gather_munmap_vmas.__mmap_prepare.__mmap_region
> > 7.75 ą 46% +72.4% 13.36 ą 32% perf-sched.wait_time.max.ms.__cond_resched.down_write_killable.__do_sys_brk.do_syscall_64.entry_SYSCALL_64_after_hwframe
> > 13.39 ą 48% +259.7% 48.17 ą 34% perf-sched.wait_time.max.ms.__cond_resched.down_write_killable.exec_mmap.begin_new_exec.load_elf_binary
> > 12.46 ą 30% +90.5% 23.73 ą 34% perf-sched.wait_time.max.ms.__cond_resched.down_write_killable.vm_mmap_pgoff.ksys_mmap_pgoff.do_syscall_64
> > 479.90 ą 78% +496.9% 2864 ą141% perf-sched.wait_time.max.ms.__cond_resched.dput.open_last_lookups.path_openat.do_filp_open
> > 185.75 ą116% +875.0% 1811 ą 84% perf-sched.wait_time.max.ms.__cond_resched.dput.path_openat.do_filp_open.do_sys_openat2
> > 2.52 ą 44% +105.2% 5.18 ą 36% perf-sched.wait_time.max.ms.__cond_resched.kmem_cache_alloc_noprof.vm_area_alloc.alloc_bprm.kernel_execve
> > 0.04 ą223% +37564.1% 14.50 ą198% perf-sched.wait_time.max.ms.__cond_resched.netlink_release.__sock_release.sock_close.__fput
> > 0.54 ą153% +669.7% 4.15 ą 61% perf-sched.wait_time.max.ms.__cond_resched.task_numa_work.task_work_run.syscall_exit_to_user_mode.do_syscall_64
> > 28.22 ą 14% +41.2% 39.84 ą 25% perf-sched.wait_time.max.ms.__cond_resched.unmap_vmas.vms_clear_ptes.part.0
> > 4.90 ą 51% +158.2% 12.66 ą 75% perf-sched.wait_time.max.ms.io_schedule.folio_wait_bit_common.__do_fault.do_read_fault
> > 26.54 ą 66% +5220.1% 1411 ą 84% perf-sched.wait_time.max.ms.irqentry_exit_to_user_mode.asm_sysvec_call_function_single.[unknown]
> > 1262 ą 9% +434.3% 6744 ą 60% perf-sched.wait_time.max.ms.schedule_timeout.__wait_for_common.wait_for_completion_state.kernel_clone
> >
> >
> >
> >
> > Disclaimer:
> > Results have been estimated based on internal Intel analysis and are provided
> > for informational purposes only. Any difference in system hardware or software
> > design or configuration may affect actual performance.
> >
> >
> > --
> > 0-DAY CI Kernel Test Service
> > https://github.com/intel/lkp-tests/wiki
> >
>
>
> --
> Mateusz Guzik <mjguzik gmail.com>
--
Mateusz Guzik <mjguzik gmail.com>