Message-ID: <CAGudoHHMvREPjWNvmAa_qQovK-9S1zvCAGh=K6U21oyr4pTtzg@mail.gmail.com>
Date: Thu, 17 Apr 2025 12:02:55 +0200
From: Mateusz Guzik <mjguzik@...il.com>
To: kernel test robot <oliver.sang@...el.com>
Cc: oe-lkp@...ts.linux.dev, lkp@...el.com, linux-kernel@...r.kernel.org, 
	Christian Brauner <brauner@...nel.org>, linux-fsdevel@...r.kernel.org
Subject: Re: [linus:master] [fs] a914bd93f3: stress-ng.close.close_calls_per_sec
 52.2% regression

On Thu, Apr 17, 2025 at 11:54 AM Mateusz Guzik <mjguzik@...il.com> wrote:
>
> On Thu, Apr 17, 2025 at 10:24 AM kernel test robot
> <oliver.sang@...el.com> wrote:
> >
> >
> >
> > Hello,
> >
> > kernel test robot noticed a 52.2% regression of stress-ng.close.close_calls_per_sec on:
> >
> >
> > commit: a914bd93f3edfedcdd59deb615e8dd1b3643cac5 ("fs: use fput_close() in filp_close()")
> > https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master
> >
> > [still regression on linus/master      7cdabafc001202de9984f22c973305f424e0a8b7]
> > [still regression on linux-next/master 01c6df60d5d4ae00cd5c1648818744838bba7763]
> >
> > testcase: stress-ng
> > config: x86_64-rhel-9.4
> > compiler: gcc-12
> > test machine: 192 threads 2 sockets Intel(R) Xeon(R) Platinum 8468V  CPU @ 2.4GHz (Sapphire Rapids) with 384G memory
> > parameters:
> >
> >         nr_threads: 100%
> >         testtime: 60s
> >         test: close
> >         cpufreq_governor: performance
> >
> >
> > The data is not very stable, but the regression trend seems clear.
> >
> > a914bd93f3edfedcdd59deb615e8dd1b3643cac5/matrix.json:  "stress-ng.close.close_calls_per_sec": [
> > a914bd93f3edfedcdd59deb615e8dd1b3643cac5/matrix.json-    193538.65,
> > a914bd93f3edfedcdd59deb615e8dd1b3643cac5/matrix.json-    232133.85,
> > a914bd93f3edfedcdd59deb615e8dd1b3643cac5/matrix.json-    276146.66,
> > a914bd93f3edfedcdd59deb615e8dd1b3643cac5/matrix.json-    193345.38,
> > a914bd93f3edfedcdd59deb615e8dd1b3643cac5/matrix.json-    209411.34,
> > a914bd93f3edfedcdd59deb615e8dd1b3643cac5/matrix.json-    254782.41
> > a914bd93f3edfedcdd59deb615e8dd1b3643cac5/matrix.json-  ],
> >
> > 3e46a92a27c2927fcef996ba06cbe299da629c28/matrix.json:  "stress-ng.close.close_calls_per_sec": [
> > 3e46a92a27c2927fcef996ba06cbe299da629c28/matrix.json-    427893.13,
> > 3e46a92a27c2927fcef996ba06cbe299da629c28/matrix.json-    456267.6,
> > 3e46a92a27c2927fcef996ba06cbe299da629c28/matrix.json-    509121.02,
> > 3e46a92a27c2927fcef996ba06cbe299da629c28/matrix.json-    544289.08,
> > 3e46a92a27c2927fcef996ba06cbe299da629c28/matrix.json-    354004.06,
> > 3e46a92a27c2927fcef996ba06cbe299da629c28/matrix.json-    552310.73
> > 3e46a92a27c2927fcef996ba06cbe299da629c28/matrix.json-  ],
> >
> >
> > If you fix the issue in a separate patch/commit (i.e. not just a new version of
> > the same patch/commit), kindly add the following tags
> > | Reported-by: kernel test robot <oliver.sang@...el.com>
> > | Closes: https://lore.kernel.org/oe-lkp/202504171513.6d6f8a16-lkp@intel.com
> >
> >
> > Details are as below:
> > -------------------------------------------------------------------------------------------------->
> >
> >
> > The kernel config and materials to reproduce are available at:
> > https://download.01.org/0day-ci/archive/20250417/202504171513.6d6f8a16-lkp@intel.com
> >
> > =========================================================================================
> > compiler/cpufreq_governor/kconfig/nr_threads/rootfs/tbox_group/test/testcase/testtime:
> >   gcc-12/performance/x86_64-rhel-9.4/100%/debian-12-x86_64-20240206.cgz/igk-spr-2sp1/close/stress-ng/60s
> >
> > commit:
> >   3e46a92a27 ("fs: use fput_close_sync() in close()")
> >   a914bd93f3 ("fs: use fput_close() in filp_close()")
> >
>
> I'm going to have to chew on it.
>
> First, the commit at hand states:
>     fs: use fput_close() in filp_close()
>
>     When tracing refcounts seen during a kernel build, this is a wash:
>     @[kprobe:filp_close]:
>     [0]                32195 |@@@@@@@@@@                                          |
>     [1]               164567 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
>
>     I verified that the vast majority of the skew comes from do_close_on_exec(),
>     which could be changed to use a different variant instead.
>
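
For reference, the change itself is tiny; filp_close() after the commit
boils down to roughly this (paraphrased, not a verbatim copy of fs/open.c),
with only the final reference drop switching from fput() to fput_close():

int filp_close(struct file *filp, fl_owner_t id)
{
        int retval;

        /* ->flush, dnotify_flush(), locks_remove_posix() */
        retval = filp_flush(filp, id);
        fput_close(filp);       /* was: fput(filp) */

        return retval;
}
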
> So if need be, this is trivially revertible without a major loss anywhere.
>
> But also:
>     Even without changing that, the 19.5% of calls which got here still can
>     save the extra atomic. Calls here are borderline non-existent compared
>     to fput (over 3.2 mln!), so they should not negatively affect
>     scalability.
>
> As the code converts a lock xadd into a lock cmpxchg loop in exchange
> for saving an atomic when this is the last close, it indeed scales
> worse when the call is in fact not the last close of the file.
>
> stress-ng dups fd 2 to a batch of descriptors and then closes them with
> close_range() in a loop. This manufactures a case where the call is never
> the last close and the cmpxchg loop is a plain loss.
>

huh, bad editing

The bottom line, though, is that there is a known tradeoff here and
stress-ng manufactures a case where the wrong side of it is always taken.
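
To make the tradeoff concrete, here is a userspace sketch of the two
decrement strategies (illustrative only: C11 atomics and made-up names,
not the kernel's actual file refcount code):

#include <stdatomic.h>
#include <stdbool.h>

/* fput-style drop: one unconditional lock xadd, equally cheap whether or
 * not this was the last reference; a separate check is needed afterwards
 * to learn whether teardown is due. */
static bool put_xadd(_Atomic long *cnt)
{
        return atomic_fetch_sub(cnt, 1) == 1;  /* true: dropped the last ref */
}

/* fput_close-style drop: try to go 1 -> 0 directly so the common "last
 * close" case saves an extra atomic later.  When the file is shared
 * (count > 1), this degenerates into a cmpxchg retry loop on a contended
 * cache line, which is exactly the case stress-ng manufactures here. */
static bool put_cmpxchg(_Atomic long *cnt)
{
        long old = atomic_load(cnt);

        for (;;) {
                long new = (old == 1) ? 0 : old - 1;

                if (atomic_compare_exchange_weak(cnt, &old, new))
                        return new == 0;        /* true: dropped the last ref */
                /* cmpxchg failed: 'old' now holds the fresh value, retry */
        }
}

With 192 workers all hammering the same shared count, the retry path is
what shows up as fput_close eating ~44% of self cycles in the profile below.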

fd 2 at hand is inherited (it's the tty) and shared between *all*
workers on all CPUs.

Ignoring some fluff, it's this in a loop:
dup2(2, 1024)                           = 1024
dup2(2, 1025)                           = 1025
dup2(2, 1026)                           = 1026
dup2(2, 1027)                           = 1027
dup2(2, 1028)                           = 1028
dup2(2, 1029)                           = 1029
dup2(2, 1030)                           = 1030
dup2(2, 1031)                           = 1031
[..]
close_range(1024, 1032, 0)              = 0

where fd 2 is the same file object in all 192 workers doing this.
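
For anyone who wants to poke at this without the full stress-ng harness, a
minimal reproducer of just that pattern would look something like the
following (hypothetical and untested; raw syscall used since older glibc
lacks a close_range() wrapper):

#include <unistd.h>
#include <sys/syscall.h>

int main(void)
{
        for (;;) {
                /* dup the inherited, shared fd 2 to a batch of high fds */
                for (int fd = 1024; fd < 1032; fd++)
                        dup2(2, fd);
                /* close the dups; never the last reference to the tty */
                syscall(SYS_close_range, 1024, 1032, 0);
        }
}

Run one copy per CPU (all sharing the inherited fd 2) to get the cross-CPU
refcount contention the 192-thread test box sees.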

> While I don't believe this represents real-world behavior, maybe
> something can be done. I note again this patch can be whacked
> altogether without any tears shed.
>
> So I'm going to chew on it.
>
> > 3e46a92a27c2927f a914bd93f3edfedcdd59deb615e
> > ---------------- ---------------------------
> >          %stddev     %change         %stddev
> >              \          |                \
> >     355470 ą 14%     -17.6%     292767 ą  7%  cpuidle..usage
> >       5.19            -0.6        4.61 ą  3%  mpstat.cpu.all.usr%
> >     495615 ą  7%     -38.5%     304839 ą  8%  vmstat.system.cs
> >     780096 ą  5%     -23.5%     596446 ą  3%  vmstat.system.in
> >    2512168 ą 17%     -45.8%    1361659 ą 71%  sched_debug.cfs_rq:/.avg_vruntime.min
> >    2512168 ą 17%     -45.8%    1361659 ą 71%  sched_debug.cfs_rq:/.min_vruntime.min
> >     700402 ą  2%     +19.8%     838744 ą 10%  sched_debug.cpu.avg_idle.avg
> >      81230 ą  6%     -59.6%      32788 ą 69%  sched_debug.cpu.nr_switches.avg
> >      27992 ą 20%     -70.2%       8345 ą 74%  sched_debug.cpu.nr_switches.min
> >     473980 ą 14%     -52.2%     226559 ą 13%  stress-ng.close.close_calls_per_sec
> >    4004843 ą  8%     -21.1%    3161813 ą  8%  stress-ng.time.involuntary_context_switches
> >       9475            +1.1%       9582        stress-ng.time.system_time
> >     183.50 ą  2%     -37.8%     114.20 ą  3%  stress-ng.time.user_time
> >   25637892 ą  6%     -42.6%   14725385 ą  9%  stress-ng.time.voluntary_context_switches
> >      23.01 ą  2%      -1.4       21.61 ą  3%  perf-stat.i.cache-miss-rate%
> >   17981659           -10.8%   16035508 ą  4%  perf-stat.i.cache-misses
> >   77288888 ą  2%      -6.5%   72260357 ą  4%  perf-stat.i.cache-references
> >     504949 ą  6%     -38.1%     312536 ą  8%  perf-stat.i.context-switches
> >      33030           +15.7%      38205 ą  4%  perf-stat.i.cycles-between-cache-misses
> >       4.34 ą 10%     -38.3%       2.68 ą 20%  perf-stat.i.metric.K/sec
> >      26229 ą 44%     +37.8%      36145 ą  4%  perf-stat.overall.cycles-between-cache-misses
> >      30.09           -12.8       17.32        perf-profile.calltrace.cycles-pp.filp_flush.filp_close.__do_sys_close_range.do_syscall_64.entry_SYSCALL_64_after_hwframe
> >       9.32            -9.3        0.00        perf-profile.calltrace.cycles-pp.fput.filp_close.__do_sys_close_range.do_syscall_64.entry_SYSCALL_64_after_hwframe
> >      41.15            -5.9       35.26        perf-profile.calltrace.cycles-pp.__dup2
> >      41.02            -5.9       35.17        perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.__dup2
> >      41.03            -5.8       35.18        perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.__dup2
> >      13.86            -5.5        8.35        perf-profile.calltrace.cycles-pp.filp_flush.filp_close.do_dup2.__x64_sys_dup2.do_syscall_64
> >      40.21            -5.5       34.71        perf-profile.calltrace.cycles-pp.__x64_sys_dup2.do_syscall_64.entry_SYSCALL_64_after_hwframe.__dup2
> >      38.01            -4.9       33.10        perf-profile.calltrace.cycles-pp.do_dup2.__x64_sys_dup2.do_syscall_64.entry_SYSCALL_64_after_hwframe.__dup2
> >       4.90 ą  2%      -1.9        2.96 ą  3%  perf-profile.calltrace.cycles-pp.locks_remove_posix.filp_flush.filp_close.__do_sys_close_range.do_syscall_64
> >       2.63 ą  9%      -1.9        0.76 ą 18%  perf-profile.calltrace.cycles-pp.asm_sysvec_apic_timer_interrupt.filp_flush.filp_close.__do_sys_close_range.do_syscall_64
> >       4.44            -1.7        2.76 ą  2%  perf-profile.calltrace.cycles-pp.dnotify_flush.filp_flush.filp_close.__do_sys_close_range.do_syscall_64
> >       2.50 ą  2%      -0.9        1.56 ą  3%  perf-profile.calltrace.cycles-pp.locks_remove_posix.filp_flush.filp_close.do_dup2.__x64_sys_dup2
> >       2.22            -0.8        1.42 ą  2%  perf-profile.calltrace.cycles-pp.dnotify_flush.filp_flush.filp_close.do_dup2.__x64_sys_dup2
> >       1.66 ą  5%      -0.3        1.37 ą  5%  perf-profile.calltrace.cycles-pp.ksys_dup3.__x64_sys_dup2.do_syscall_64.entry_SYSCALL_64_after_hwframe.__dup2
> >       1.60 ą  5%      -0.3        1.32 ą  5%  perf-profile.calltrace.cycles-pp._raw_spin_lock.ksys_dup3.__x64_sys_dup2.do_syscall_64.entry_SYSCALL_64_after_hwframe
> >       0.58 ą  2%      +0.0        0.62 ą  2%  perf-profile.calltrace.cycles-pp.do_read_fault.do_pte_missing.__handle_mm_fault.handle_mm_fault.do_user_addr_fault
> >       0.54 ą  2%      +0.0        0.58 ą  2%  perf-profile.calltrace.cycles-pp.filemap_map_pages.do_read_fault.do_pte_missing.__handle_mm_fault.handle_mm_fault
> >       0.72 ą  4%      +0.1        0.79 ą  4%  perf-profile.calltrace.cycles-pp._raw_spin_lock.__do_sys_close_range.do_syscall_64.entry_SYSCALL_64_after_hwframe.syscall
> >       0.00            +1.8        1.79 ą  5%  perf-profile.calltrace.cycles-pp.asm_sysvec_apic_timer_interrupt.fput_close.filp_close.__do_sys_close_range.do_syscall_64
> >      19.11            +4.4       23.49        perf-profile.calltrace.cycles-pp.filp_close.do_dup2.__x64_sys_dup2.do_syscall_64.entry_SYSCALL_64_after_hwframe
> >      40.94            +6.5       47.46        perf-profile.calltrace.cycles-pp.filp_close.__do_sys_close_range.do_syscall_64.entry_SYSCALL_64_after_hwframe.syscall
> >      42.24            +6.6       48.79        perf-profile.calltrace.cycles-pp.syscall
> >      42.14            +6.6       48.72        perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.syscall
> >      42.14            +6.6       48.72        perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.syscall
> >      41.86            +6.6       48.46        perf-profile.calltrace.cycles-pp.__do_sys_close_range.do_syscall_64.entry_SYSCALL_64_after_hwframe.syscall
> >       0.00           +14.6       14.60 ą  2%  perf-profile.calltrace.cycles-pp.fput_close.filp_close.do_dup2.__x64_sys_dup2.do_syscall_64
> >       0.00           +29.0       28.95 ą  2%  perf-profile.calltrace.cycles-pp.fput_close.filp_close.__do_sys_close_range.do_syscall_64.entry_SYSCALL_64_after_hwframe
> >      45.76           -19.0       26.72        perf-profile.children.cycles-pp.filp_flush
> >      14.67           -14.7        0.00        perf-profile.children.cycles-pp.fput
> >      41.20            -5.9       35.30        perf-profile.children.cycles-pp.__dup2
> >      40.21            -5.5       34.71        perf-profile.children.cycles-pp.__x64_sys_dup2
> >      38.53            -5.2       33.33        perf-profile.children.cycles-pp.do_dup2
> >       7.81 ą  2%      -3.1        4.74 ą  3%  perf-profile.children.cycles-pp.locks_remove_posix
> >       7.03            -2.6        4.40 ą  2%  perf-profile.children.cycles-pp.dnotify_flush
> >       1.24 ą  3%      -0.4        0.86 ą  4%  perf-profile.children.cycles-pp.syscall_exit_to_user_mode
> >       1.67 ą  5%      -0.3        1.37 ą  5%  perf-profile.children.cycles-pp.ksys_dup3
> >       0.29            -0.1        0.22 ą  2%  perf-profile.children.cycles-pp.update_load_avg
> >       0.10 ą 10%      -0.1        0.05 ą 46%  perf-profile.children.cycles-pp.__x64_sys_fcntl
> >       0.17 ą  7%      -0.1        0.12 ą  4%  perf-profile.children.cycles-pp.entry_SYSCALL_64
> >       0.15 ą  3%      -0.0        0.10 ą 18%  perf-profile.children.cycles-pp.clockevents_program_event
> >       0.06 ą 11%      -0.0        0.02 ą 99%  perf-profile.children.cycles-pp.stress_close_func
> >       0.13 ą  5%      -0.0        0.10 ą  7%  perf-profile.children.cycles-pp.__switch_to
> >       0.06 ą  7%      -0.0        0.04 ą 71%  perf-profile.children.cycles-pp.lapic_next_deadline
> >       0.11 ą  3%      -0.0        0.09 ą  5%  perf-profile.children.cycles-pp.__update_load_avg_cfs_rq
> >       0.06 ą  6%      +0.0        0.08 ą  6%  perf-profile.children.cycles-pp.__folio_batch_add_and_move
> >       0.26 ą  2%      +0.0        0.28 ą  3%  perf-profile.children.cycles-pp.folio_remove_rmap_ptes
> >       0.08 ą  4%      +0.0        0.11 ą 10%  perf-profile.children.cycles-pp.set_pte_range
> >       0.45 ą  2%      +0.0        0.48 ą  2%  perf-profile.children.cycles-pp.zap_present_ptes
> >       0.18 ą  3%      +0.0        0.21 ą  4%  perf-profile.children.cycles-pp.folios_put_refs
> >       0.29 ą  3%      +0.0        0.32 ą  3%  perf-profile.children.cycles-pp.__tlb_batch_free_encoded_pages
> >       0.29 ą  3%      +0.0        0.32 ą  3%  perf-profile.children.cycles-pp.free_pages_and_swap_cache
> >       0.34 ą  4%      +0.0        0.38 ą  3%  perf-profile.children.cycles-pp.tlb_finish_mmu
> >       0.24 ą  3%      +0.0        0.27 ą  5%  perf-profile.children.cycles-pp._raw_spin_lock_irqsave
> >       0.73 ą  2%      +0.0        0.78 ą  2%  perf-profile.children.cycles-pp.do_read_fault
> >       0.69 ą  2%      +0.0        0.74 ą  2%  perf-profile.children.cycles-pp.filemap_map_pages
> >      42.26            +6.5       48.80        perf-profile.children.cycles-pp.syscall
> >      41.86            +6.6       48.46        perf-profile.children.cycles-pp.__do_sys_close_range
> >      60.05           +10.9       70.95        perf-profile.children.cycles-pp.filp_close
> >       0.00           +44.5       44.55 ą  2%  perf-profile.children.cycles-pp.fput_close
> >      14.31 ą  2%     -14.3        0.00        perf-profile.self.cycles-pp.fput
> >      30.12 ą  2%     -13.0       17.09        perf-profile.self.cycles-pp.filp_flush
> >      19.17 ą  2%      -9.5        9.68 ą  3%  perf-profile.self.cycles-pp.do_dup2
> >       7.63 ą  3%      -3.0        4.62 ą  4%  perf-profile.self.cycles-pp.locks_remove_posix
> >       6.86 ą  3%      -2.6        4.28 ą  3%  perf-profile.self.cycles-pp.dnotify_flush
> >       0.64 ą  4%      -0.4        0.26 ą  4%  perf-profile.self.cycles-pp.syscall_exit_to_user_mode
> >       0.22 ą 10%      -0.1        0.15 ą 11%  perf-profile.self.cycles-pp.x64_sys_call
> >       0.23 ą  3%      -0.1        0.17 ą  8%  perf-profile.self.cycles-pp.__schedule
> >       0.08 ą 19%      -0.1        0.03 ą100%  perf-profile.self.cycles-pp.pick_eevdf
> >       0.06 ą  7%      -0.0        0.03 ą100%  perf-profile.self.cycles-pp.lapic_next_deadline
> >       0.13 ą  7%      -0.0        0.10 ą 10%  perf-profile.self.cycles-pp.__switch_to
> >       0.09 ą  8%      -0.0        0.06 ą  8%  perf-profile.self.cycles-pp.__update_load_avg_se
> >       0.08 ą  4%      -0.0        0.05 ą  7%  perf-profile.self.cycles-pp.asm_sysvec_apic_timer_interrupt
> >       0.08 ą  9%      -0.0        0.06 ą 11%  perf-profile.self.cycles-pp.entry_SYSCALL_64
> >       0.06 ą  7%      +0.0        0.08 ą  8%  perf-profile.self.cycles-pp.filemap_map_pages
> >       0.12 ą  3%      +0.0        0.14 ą  4%  perf-profile.self.cycles-pp.folios_put_refs
> >       0.25 ą  2%      +0.0        0.27 ą  3%  perf-profile.self.cycles-pp.folio_remove_rmap_ptes
> >       0.17 ą  5%      +0.0        0.20 ą  6%  perf-profile.self.cycles-pp._raw_spin_lock_irqsave
> >       0.00            +0.1        0.07 ą  5%  perf-profile.self.cycles-pp.do_nanosleep
> >       0.00            +0.1        0.10 ą 15%  perf-profile.self.cycles-pp.filp_close
> >       0.00           +43.9       43.86 ą  2%  perf-profile.self.cycles-pp.fput_close
> >       2.12 ą 47%    +151.1%       5.32 ą 28%  perf-sched.sch_delay.avg.ms.__cond_resched.__do_sys_close_range.do_syscall_64.entry_SYSCALL_64_after_hwframe.[unknown]
> >       0.28 ą 60%    +990.5%       3.08 ą 53%  perf-sched.sch_delay.avg.ms.__cond_resched.__kmalloc_cache_noprof.do_eventfd.__x64_sys_eventfd2.do_syscall_64
> >       0.96 ą 28%     +35.5%       1.30 ą 25%  perf-sched.sch_delay.avg.ms.__cond_resched.__kmalloc_noprof.load_elf_phdrs.load_elf_binary.exec_binprm
> >       2.56 ą 43%    +964.0%      27.27 ą161%  perf-sched.sch_delay.avg.ms.__cond_resched.dput.open_last_lookups.path_openat.do_filp_open
> >      12.80 ą139%    +366.1%      59.66 ą107%  perf-sched.sch_delay.avg.ms.__cond_resched.dput.shmem_unlink.vfs_unlink.do_unlinkat
> >       2.78 ą 25%     +50.0%       4.17 ą 19%  perf-sched.sch_delay.avg.ms.__cond_resched.kmem_cache_alloc_lru_noprof.__d_alloc.d_alloc_cursor.dcache_dir_open
> >       0.78 ą 18%     +39.3%       1.09 ą 19%  perf-sched.sch_delay.avg.ms.__cond_resched.kmem_cache_alloc_noprof.mas_alloc_nodes.mas_preallocate.vma_shrink
> >       2.63 ą 10%     +16.7%       3.07 ą  4%  perf-sched.sch_delay.avg.ms.__cond_resched.kmem_cache_alloc_noprof.vm_area_alloc.__mmap_new_vma.__mmap_region
> >       0.04 ą223%   +3483.1%       1.38 ą 67%  perf-sched.sch_delay.avg.ms.__cond_resched.netlink_release.__sock_release.sock_close.__fput
> >       0.72 ą 17%     +42.5%       1.02 ą 22%  perf-sched.sch_delay.avg.ms.__cond_resched.stop_one_cpu.sched_exec.bprm_execve.part
> >       0.35 ą134%    +457.7%       1.94 ą 61%  perf-sched.sch_delay.avg.ms.__cond_resched.task_numa_work.task_work_run.syscall_exit_to_user_mode.do_syscall_64
> >       1.88 ą 34%    +574.9%      12.69 ą115%  perf-sched.sch_delay.avg.ms.__cond_resched.task_work_run.syscall_exit_to_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe
> >       2.41 ą 10%     +18.0%       2.84 ą  9%  perf-sched.sch_delay.avg.ms.__cond_resched.wp_page_copy.__handle_mm_fault.handle_mm_fault.do_user_addr_fault
> >       1.85 ą 36%    +155.6%       4.73 ą 50%  perf-sched.sch_delay.avg.ms.io_schedule.folio_wait_bit_common.__do_fault.do_read_fault
> >      28.94 ą 26%     -52.4%      13.78 ą 36%  perf-sched.sch_delay.avg.ms.pipe_read.vfs_read.ksys_read.do_syscall_64
> >       2.24 ą  9%     +22.1%       2.74 ą  8%  perf-sched.sch_delay.avg.ms.schedule_preempt_disabled.rwsem_down_write_slowpath.down_write.unlink_file_vma_batch_final
> >       2.19 ą  7%     +17.3%       2.57 ą  6%  perf-sched.sch_delay.avg.ms.schedule_preempt_disabled.rwsem_down_write_slowpath.down_write.vma_link_file
> >       2.39 ą  6%     +16.4%       2.79 ą  9%  perf-sched.sch_delay.avg.ms.schedule_preempt_disabled.rwsem_down_write_slowpath.down_write.vma_prepare
> >       0.34 ą 77%   +1931.5%       6.95 ą 99%  perf-sched.sch_delay.max.ms.__cond_resched.__kmalloc_cache_noprof.do_eventfd.__x64_sys_eventfd2.do_syscall_64
> >      29.69 ą 28%    +129.5%      68.12 ą 28%  perf-sched.sch_delay.max.ms.__cond_resched.__kmalloc_cache_noprof.perf_event_mmap_event.perf_event_mmap.__mmap_region
> >       1.48 ą 96%    +284.6%       5.68 ą 56%  perf-sched.sch_delay.max.ms.__cond_resched.copy_strings_kernel.kernel_execve.call_usermodehelper_exec_async.ret_from_fork
> >       3.59 ą 57%    +124.5%       8.06 ą 45%  perf-sched.sch_delay.max.ms.__cond_resched.down_read.mmap_read_lock_maybe_expand.get_arg_page.copy_string_kernel
> >       3.39 ą 91%    +112.0%       7.19 ą 44%  perf-sched.sch_delay.max.ms.__cond_resched.down_read_killable.iterate_dir.__x64_sys_getdents64.do_syscall_64
> >      22.16 ą 77%    +117.4%      48.17 ą 34%  perf-sched.sch_delay.max.ms.__cond_resched.down_write_killable.exec_mmap.begin_new_exec.load_elf_binary
> >       8.72 ą 17%    +358.5%      39.98 ą 61%  perf-sched.sch_delay.max.ms.__cond_resched.kmem_cache_alloc_noprof.mas_alloc_nodes.mas_preallocate.__mmap_new_vma
> >       0.04 ą223%   +3484.4%       1.38 ą 67%  perf-sched.sch_delay.max.ms.__cond_resched.netlink_release.__sock_release.sock_close.__fput
> >       0.53 ą154%    +676.1%       4.12 ą 61%  perf-sched.sch_delay.max.ms.__cond_resched.task_numa_work.task_work_run.syscall_exit_to_user_mode.do_syscall_64
> >      25.76 ą 70%   +9588.9%       2496 ą127%  perf-sched.sch_delay.max.ms.__cond_resched.task_work_run.syscall_exit_to_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe
> >      51.53 ą 26%     -58.8%      21.22 ą106%  perf-sched.sch_delay.max.ms.devkmsg_read.vfs_read.ksys_read.do_syscall_64
> >       4.97 ą 48%    +154.8%      12.66 ą 75%  perf-sched.sch_delay.max.ms.io_schedule.folio_wait_bit_common.__do_fault.do_read_fault
> >       4.36 ą 48%    +147.7%      10.81 ą 29%  perf-sched.wait_and_delay.avg.ms.__cond_resched.__do_sys_close_range.do_syscall_64.entry_SYSCALL_64_after_hwframe.[unknown]
> >     108632 ą  4%     +23.9%     134575 ą  6%  perf-sched.wait_and_delay.count.__cond_resched.__do_sys_close_range.do_syscall_64.entry_SYSCALL_64_after_hwframe.[unknown]
> >     572.67 ą  6%     +37.3%     786.17 ą  7%  perf-sched.wait_and_delay.count.__cond_resched.__wait_for_common.affine_move_task.__set_cpus_allowed_ptr.__sched_setaffinity
> >     596.67 ą 12%     +43.9%     858.50 ą 13%  perf-sched.wait_and_delay.count.__cond_resched.__wait_for_common.wait_for_completion_state.call_usermodehelper_exec.__request_module
> >     294.83 ą  9%     +31.1%     386.50 ą 11%  perf-sched.wait_and_delay.count.__cond_resched.dput.terminate_walk.path_openat.do_filp_open
> >    1223275 ą  2%     -17.7%    1006293 ą  6%  perf-sched.wait_and_delay.count.do_nanosleep.hrtimer_nanosleep.common_nsleep.__x64_sys_clock_nanosleep
> >       2772 ą 11%     +43.6%       3980 ą 11%  perf-sched.wait_and_delay.count.do_wait.kernel_wait.call_usermodehelper_exec_work.process_one_work
> >      11690 ą  7%     +29.8%      15173 ą 10%  perf-sched.wait_and_delay.count.io_schedule.folio_wait_bit_common.filemap_fault.__do_fault
> >       4072 ą100%    +163.7%      10737 ą  7%  perf-sched.wait_and_delay.count.irqentry_exit_to_user_mode.asm_exc_page_fault.[unknown].[unknown]
> >       8811 ą  6%     +26.2%      11117 ą  9%  perf-sched.wait_and_delay.count.irqentry_exit_to_user_mode.asm_sysvec_apic_timer_interrupt.[unknown].[unknown]
> >     662.17 ą 29%    +187.8%       1905 ą 29%  perf-sched.wait_and_delay.count.pipe_read.vfs_read.ksys_read.do_syscall_64
> >      15.50 ą 11%     +48.4%      23.00 ą 16%  perf-sched.wait_and_delay.count.schedule_hrtimeout_range.do_poll.constprop.0.do_sys_poll
> >     167.67 ą 20%     +48.0%     248.17 ą 26%  perf-sched.wait_and_delay.count.schedule_preempt_disabled.rwsem_down_write_slowpath.down_write.open_last_lookups
> >       2680 ą 12%     +42.5%       3820 ą 11%  perf-sched.wait_and_delay.count.schedule_timeout.___down_common.__down_timeout.down_timeout
> >     137.17 ą 13%     +31.8%     180.83 ą  9%  perf-sched.wait_and_delay.count.schedule_timeout.__wait_for_common.wait_for_completion_state.__wait_rcu_gp
> >       2636 ą 12%     +43.1%       3772 ą 12%  perf-sched.wait_and_delay.count.schedule_timeout.__wait_for_common.wait_for_completion_state.call_usermodehelper_exec
> >      10.50 ą 11%     +74.6%      18.33 ą 23%  perf-sched.wait_and_delay.count.schedule_timeout.kcompactd.kthread.ret_from_fork
> >       5619 ą  5%     +38.8%       7797 ą  9%  perf-sched.wait_and_delay.count.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
> >      70455 ą  4%     +32.3%      93197 ą  6%  perf-sched.wait_and_delay.count.syscall_exit_to_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe.[unknown]
> >       6990 ą  4%     +37.4%       9603 ą  9%  perf-sched.wait_and_delay.count.worker_thread.kthread.ret_from_fork.ret_from_fork_asm
> >     191.28 ą124%   +1455.2%       2974 ą100%  perf-sched.wait_and_delay.max.ms.__cond_resched.dput.path_openat.do_filp_open.do_sys_openat2
> >       2.24 ą 48%    +144.6%       5.48 ą 29%  perf-sched.wait_time.avg.ms.__cond_resched.__do_sys_close_range.do_syscall_64.entry_SYSCALL_64_after_hwframe.[unknown]
> >       8.06 ą101%    +618.7%      57.94 ą 68%  perf-sched.wait_time.avg.ms.__cond_resched.__fput.__x64_sys_close.do_syscall_64.entry_SYSCALL_64_after_hwframe
> >       0.28 ą146%    +370.9%       1.30 ą 64%  perf-sched.wait_time.avg.ms.__cond_resched.down_write.__split_vma.vms_gather_munmap_vmas.do_vmi_align_munmap
> >       3.86 ą  5%     +86.0%       7.18 ą 44%  perf-sched.wait_time.avg.ms.__cond_resched.down_write.free_pgtables.exit_mmap.__mmput
> >       2.54 ą 29%     +51.6%       3.85 ą 12%  perf-sched.wait_time.avg.ms.__cond_resched.down_write_killable.__do_sys_brk.do_syscall_64.entry_SYSCALL_64_after_hwframe
> >       1.11 ą 39%    +201.6%       3.36 ą 47%  perf-sched.wait_time.avg.ms.__cond_resched.down_write_killable.exec_mmap.begin_new_exec.load_elf_binary
> >       3.60 ą 68%   +1630.2%      62.29 ą153%  perf-sched.wait_time.avg.ms.__cond_resched.dput.step_into.link_path_walk.part
> >       0.20 ą 64%    +115.4%       0.42 ą 42%  perf-sched.wait_time.avg.ms.__cond_resched.filemap_read.__kernel_read.load_elf_binary.exec_binprm
> >      55.51 ą 53%    +218.0%     176.54 ą 83%  perf-sched.wait_time.avg.ms.__cond_resched.kmem_cache_alloc_noprof.security_inode_alloc.inode_init_always_gfp.alloc_inode
> >       3.22 ą  3%     +15.4%       3.72 ą  9%  perf-sched.wait_time.avg.ms.__cond_resched.kmem_cache_alloc_noprof.vm_area_dup.__split_vma.vms_gather_munmap_vmas
> >       0.04 ą223%  +37562.8%      14.50 ą198%  perf-sched.wait_time.avg.ms.__cond_resched.netlink_release.__sock_release.sock_close.__fput
> >       1.19 ą 30%     +59.3%       1.90 ą 34%  perf-sched.wait_time.avg.ms.__cond_resched.remove_vma.vms_complete_munmap_vmas.do_vmi_align_munmap.do_vmi_munmap
> >       0.36 ą133%    +447.8%       1.97 ą 60%  perf-sched.wait_time.avg.ms.__cond_resched.task_numa_work.task_work_run.syscall_exit_to_user_mode.do_syscall_64
> >       1.73 ą 47%    +165.0%       4.58 ą 51%  perf-sched.wait_time.avg.ms.io_schedule.folio_wait_bit_common.__do_fault.do_read_fault
> >       1.10 ą 21%    +693.9%       8.70 ą 70%  perf-sched.wait_time.avg.ms.irqentry_exit_to_user_mode.asm_sysvec_call_function_single.[unknown]
> >      54.56 ą 32%    +400.2%     272.90 ą 67%  perf-sched.wait_time.avg.ms.schedule_timeout.__wait_for_common.wait_for_completion_state.kernel_clone
> >      10.87 ą 18%    +151.7%      27.36 ą 68%  perf-sched.wait_time.max.ms.__cond_resched.__anon_vma_prepare.__vmf_anon_prepare.do_pte_missing.__handle_mm_fault
> >     123.35 ą181%   +1219.3%       1627 ą 82%  perf-sched.wait_time.max.ms.__cond_resched.__fput.__x64_sys_close.do_syscall_64.entry_SYSCALL_64_after_hwframe
> >       9.68 ą108%   +5490.2%     541.19 ą185%  perf-sched.wait_time.max.ms.__cond_resched.__kmalloc_cache_noprof.do_epoll_create.__x64_sys_epoll_create.do_syscall_64
> >       3.39 ą 91%    +112.0%       7.19 ą 44%  perf-sched.wait_time.max.ms.__cond_resched.down_read_killable.iterate_dir.__x64_sys_getdents64.do_syscall_64
> >       1.12 ą128%    +407.4%       5.67 ą 52%  perf-sched.wait_time.max.ms.__cond_resched.down_write.__split_vma.vms_gather_munmap_vmas.do_vmi_align_munmap
> >      30.58 ą 29%   +1741.1%     563.04 ą 80%  perf-sched.wait_time.max.ms.__cond_resched.down_write.free_pgtables.exit_mmap.__mmput
> >       3.82 ą114%    +232.1%      12.70 ą 49%  perf-sched.wait_time.max.ms.__cond_resched.down_write.vms_gather_munmap_vmas.__mmap_prepare.__mmap_region
> >       7.75 ą 46%     +72.4%      13.36 ą 32%  perf-sched.wait_time.max.ms.__cond_resched.down_write_killable.__do_sys_brk.do_syscall_64.entry_SYSCALL_64_after_hwframe
> >      13.39 ą 48%    +259.7%      48.17 ą 34%  perf-sched.wait_time.max.ms.__cond_resched.down_write_killable.exec_mmap.begin_new_exec.load_elf_binary
> >      12.46 ą 30%     +90.5%      23.73 ą 34%  perf-sched.wait_time.max.ms.__cond_resched.down_write_killable.vm_mmap_pgoff.ksys_mmap_pgoff.do_syscall_64
> >     479.90 ą 78%    +496.9%       2864 ą141%  perf-sched.wait_time.max.ms.__cond_resched.dput.open_last_lookups.path_openat.do_filp_open
> >     185.75 ą116%    +875.0%       1811 ą 84%  perf-sched.wait_time.max.ms.__cond_resched.dput.path_openat.do_filp_open.do_sys_openat2
> >       2.52 ą 44%    +105.2%       5.18 ą 36%  perf-sched.wait_time.max.ms.__cond_resched.kmem_cache_alloc_noprof.vm_area_alloc.alloc_bprm.kernel_execve
> >       0.04 ą223%  +37564.1%      14.50 ą198%  perf-sched.wait_time.max.ms.__cond_resched.netlink_release.__sock_release.sock_close.__fput
> >       0.54 ą153%    +669.7%       4.15 ą 61%  perf-sched.wait_time.max.ms.__cond_resched.task_numa_work.task_work_run.syscall_exit_to_user_mode.do_syscall_64
> >      28.22 ą 14%     +41.2%      39.84 ą 25%  perf-sched.wait_time.max.ms.__cond_resched.unmap_vmas.vms_clear_ptes.part.0
> >       4.90 ą 51%    +158.2%      12.66 ą 75%  perf-sched.wait_time.max.ms.io_schedule.folio_wait_bit_common.__do_fault.do_read_fault
> >      26.54 ą 66%   +5220.1%       1411 ą 84%  perf-sched.wait_time.max.ms.irqentry_exit_to_user_mode.asm_sysvec_call_function_single.[unknown]
> >       1262 ą  9%    +434.3%       6744 ą 60%  perf-sched.wait_time.max.ms.schedule_timeout.__wait_for_common.wait_for_completion_state.kernel_clone
> >
> >
> >
> >
> > Disclaimer:
> > Results have been estimated based on internal Intel analysis and are provided
> > for informational purposes only. Any difference in system hardware or software
> > design or configuration may affect actual performance.
> >
> >
> > --
> > 0-DAY CI Kernel Test Service
> > https://github.com/intel/lkp-tests/wiki
> >
>
>
> --
> Mateusz Guzik <mjguzik gmail.com>



-- 
Mateusz Guzik <mjguzik gmail.com>
