Message-ID: <bd2059df8b6a7a6ce05dc4d325c144a64d474aae.camel@kernel.org>
Date: Sun, 26 Jan 2025 07:23:30 -0500
From: Jeff Layton <jlayton@...nel.org>
To: kernel test robot <oliver.sang@...el.com>
Cc: oe-lkp@...ts.linux.dev, lkp@...el.com, linux-kernel@...r.kernel.org, 
 Christian Brauner
	 <brauner@...nel.org>, Thomas Gleixner <tglx@...utronix.de>, John Stultz
	 <jstultz@...gle.com>
Subject: Re: [linus:master] [timekeeping]  ee3283c608:
 will-it-scale.per_process_ops 4.8% regression

On Sun, 2025-01-26 at 16:25 +0800, kernel test robot wrote:
> hi, Jeff Layton,
> 
> 
> we make out below report just FYI since the results is stable in our tests.
> we don't have enough knowledge if this regression is due to align.
> 
> +static __cacheline_aligned_in_smp atomic64_t mg_floor;
> 
> if low value, please just ignore. thanks a lot.
> 


I think this is more or less the same regression we measured with the
pipe1 test during the rc phase:

    https://lore.kernel.org/linux-fsdevel/202410091041.6f5d221e-oliver.sang@intel.com/

This test just measures how fast it can do writes into a file in /tmp
without doing anything else in between. I don't think there is much we
can do to mitigate the perf hit here, as there is a basic cost to
fetching and handling the floor and ctime consistently.
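
To make the workload concrete, here is a rough sketch of the kind of
loop such a test runs. It is not the will-it-scale source: the function
name pwrite_loop, the 4096-byte buffer and the temp file name are
assumptions made for illustration. Every iteration rewrites the same
range at offset 0, so nearly all of the time goes to the syscall path
itself, including the timestamp update.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#define BUFLEN 4096	/* assumed write size, chosen for illustration */

/*
 * Hypothetical helper: overwrite the first BUFLEN bytes of an unlinked
 * temporary file in /tmp up to "iterations" times. Returns the number
 * of writes that completed in full.
 */
long pwrite_loop(long iterations)
{
	char path[] = "/tmp/pwrite-sketch.XXXXXX";
	char buf[BUFLEN];
	long done = 0;
	int fd = mkstemp(path);

	assert(fd >= 0);
	unlink(path);			/* file goes away when fd is closed */
	memset(buf, 0, sizeof(buf));

	/* Nothing but back-to-back writes to the same offset. */
	while (done < iterations && pwrite(fd, buf, BUFLEN, 0) == BUFLEN)
		done++;

	close(fd);
	return done;
}
```

Calling pwrite_loop(1000) should return 1000 on a system with a
writable /tmp. With so little work per call, any extra cost in the
timestamp path shows up directly in the ops count.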

> 
> Hello,
> 
> kernel test robot noticed a 4.8% regression of will-it-scale.per_process_ops on:
> 
> 
> commit: ee3283c608dfa21251b0821d7bb198c7ae3189f6 ("timekeeping: Add interfaces for handling timestamps with a floor value")

That patch just adds two new interfaces, but the first caller of them
wasn't added until a later patch. Are you sure that bisect landed in
the right place?
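
For anyone not familiar with what a "floor" means here, a user-space
analogy may help. This is only a sketch under my own assumptions, not
the kernel code: floor_ns, coarse_with_floor and fine_with_floor are
made-up names, and the clock values are passed in as plain nanosecond
counts so the behavior is easy to follow. The idea is that a coarse
read must never return a value older than the last fine-grained value
that was handed out.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical stand-in for a shared floor value, in nanoseconds. */
static _Atomic int64_t floor_ns;

/*
 * Coarse read: take the cheap, low-resolution time, but never return
 * anything below the current floor.
 */
int64_t coarse_with_floor(int64_t coarse_ns)
{
	int64_t floor = atomic_load(&floor_ns);

	return floor > coarse_ns ? floor : coarse_ns;
}

/*
 * Fine-grained read: try to advance the floor to now_ns. If the floor
 * is already at or past now_ns, return the floor instead, so values
 * handed out never go backwards.
 */
int64_t fine_with_floor(int64_t now_ns)
{
	int64_t old = atomic_load(&floor_ns);

	while (old < now_ns) {
		/* On failure, "old" is reloaded with the current floor. */
		if (atomic_compare_exchange_strong(&floor_ns, &old, now_ns))
			return now_ns;
	}
	return old;
}
```

In this sketch, after fine_with_floor(150) a later
coarse_with_floor(100) returns 150 rather than 100. The shared atomic is
also the sort of variable that every CPU ends up reading, which is why
its placement in memory can matter.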

> https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master
> 
> [test failed on linus/master      bc8198dc7ebc492ec3e9fa1617dcdfbe98e73b17]
> [test failed on linux-next/master 5ffa57f6eecefababb8cbe327222ef171943b183]
> 
> testcase: will-it-scale
> config: x86_64-rhel-9.4
> compiler: gcc-12
> test machine: 104 threads 2 sockets (Skylake) with 192G memory
> parameters:
> 
> 	nr_task: 100%
> 	mode: process
> 	test: pwrite1
> 	cpufreq_governor: performance
> 
> 
> If you fix the issue in a separate patch/commit (i.e. not just a new version of
> the same patch/commit), kindly add following tags
> > Reported-by: kernel test robot <oliver.sang@...el.com>
> > Closes: https://lore.kernel.org/oe-lkp/202501261527.c3bf4764-lkp@intel.com
> 
> 
> Details are as below:
> -------------------------------------------------------------------------------------------------->
> 
> 
> The kernel config and materials to reproduce are available at:
> https://download.01.org/0day-ci/archive/20250126/202501261527.c3bf4764-lkp@intel.com
> 
> =========================================================================================
> compiler/cpufreq_governor/kconfig/mode/nr_task/rootfs/tbox_group/test/testcase:
>   gcc-12/performance/x86_64-rhel-9.4/process/100%/debian-12-x86_64-20240206.cgz/lkp-skl-fpga01/pwrite1/will-it-scale
> 
> commit: 
>   v6.12-rc2
>   ee3283c608 ("timekeeping: Add interfaces for handling timestamps with a floor value")
> 
>        v6.12-rc2 ee3283c608dfa21251b0821d7bb 
> ---------------- --------------------------- 
>          %stddev     %change         %stddev
>              \          |                \  
>   57550068            -4.8%   54794800        will-it-scale.104.processes
>     553365            -4.8%     526872        will-it-scale.per_process_ops
>   57550068            -4.8%   54794800        will-it-scale.workload
>      43.00 ± 27%     -60.0%      17.20 ± 27%  perf-c2c.DRAM.local
>     251.20 ± 23%     -57.5%     106.80 ± 16%  perf-c2c.DRAM.remote
>     520.00 ± 33%     -70.3%     154.20 ± 13%  perf-c2c.HITM.local
>     218.50 ± 25%     -55.2%      97.80 ± 18%  perf-c2c.HITM.remote
>       0.03 ± 14%     +48.4%       0.04 ±  9%  perf-sched.sch_delay.avg.ms.worker_thread.kthread.ret_from_fork.ret_from_fork_asm
>       4.18 ±  4%     +21.5%       5.08        perf-sched.sch_delay.max.ms.worker_thread.kthread.ret_from_fork.ret_from_fork_asm
>     653.70 ±  5%     +50.5%     983.70 ±  7%  perf-sched.wait_and_delay.count.__cond_resched.generic_perform_write.shmem_file_write_iter.vfs_write.__x64_sys_pwrite64
>     913.40 ±  6%     -24.8%     686.80 ±  7%  perf-sched.wait_and_delay.count.__cond_resched.shmem_get_folio_gfp.shmem_write_begin.generic_perform_write.shmem_file_write_iter
>       1.29 ± 81%  +42618.3%     552.09 ± 74%  perf-sched.wait_time.avg.ms.devkmsg_read.vfs_read.ksys_read.do_syscall_64
>       2.58 ± 81%  +65403.1%       1692 ± 72%  perf-sched.wait_time.max.ms.devkmsg_read.vfs_read.ksys_read.do_syscall_64
>  1.721e+10            -4.8%  1.639e+10        perf-stat.i.branch-instructions
>       1.66            +0.1        1.72        perf-stat.i.branch-miss-rate%
>  2.852e+08            -1.2%  2.818e+08        perf-stat.i.branch-misses
>       3.29            +4.9%       3.45        perf-stat.i.cpi
>  8.743e+10            -4.8%  8.327e+10        perf-stat.i.instructions
>       0.30            -4.7%       0.29        perf-stat.i.ipc
>       1.66            +0.1        1.72        perf-stat.overall.branch-miss-rate%
>       3.29            +4.9%       3.45        perf-stat.overall.cpi
>       0.30            -4.7%       0.29        perf-stat.overall.ipc
>  1.715e+10            -4.8%  1.634e+10        perf-stat.ps.branch-instructions
>  2.842e+08            -1.2%  2.809e+08        perf-stat.ps.branch-misses
>  8.714e+10            -4.8%    8.3e+10        perf-stat.ps.instructions
>  2.632e+13            -4.7%  2.508e+13        perf-stat.total.instructions
>      10.62            -4.8        5.81        perf-profile.calltrace.cycles-pp.shmem_write_begin.generic_perform_write.shmem_file_write_iter.vfs_write.__x64_sys_pwrite64
>       8.89 ±  2%      -4.6        4.25        perf-profile.calltrace.cycles-pp.shmem_get_folio_gfp.shmem_write_begin.generic_perform_write.shmem_file_write_iter.vfs_write
>       5.98 ±  3%      -4.2        1.79 ±  2%  perf-profile.calltrace.cycles-pp.filemap_get_entry.shmem_get_folio_gfp.shmem_write_begin.generic_perform_write.shmem_file_write_iter
>      13.24            -1.4       11.88        perf-profile.calltrace.cycles-pp.entry_SYSRETQ_unsafe_stack.__libc_pwrite
>      16.62            -1.2       15.42        perf-profile.calltrace.cycles-pp.syscall_return_via_sysret.__libc_pwrite
>       2.90            -1.2        1.74        perf-profile.calltrace.cycles-pp.file_update_time.shmem_file_write_iter.vfs_write.__x64_sys_pwrite64.do_syscall_64
>       2.38 ±  2%      -0.9        1.44        perf-profile.calltrace.cycles-pp.inode_needs_update_time.file_update_time.shmem_file_write_iter.vfs_write.__x64_sys_pwrite64
>       1.68 ±  2%      -0.9        0.79        perf-profile.calltrace.cycles-pp.folio_unlock.shmem_write_end.generic_perform_write.shmem_file_write_iter.vfs_write
>       1.42 ± 13%      -0.8        0.64 ±  3%  perf-profile.calltrace.cycles-pp.file_remove_privs_flags.shmem_file_write_iter.vfs_write.__x64_sys_pwrite64.do_syscall_64
>       5.69            -0.7        4.99 ±  2%  perf-profile.calltrace.cycles-pp.shmem_write_end.generic_perform_write.shmem_file_write_iter.vfs_write.__x64_sys_pwrite64
>       6.91            -0.4        6.53        perf-profile.calltrace.cycles-pp.entry_SYSCALL_64.__libc_pwrite
>       1.23 ±  2%      -0.2        1.01        perf-profile.calltrace.cycles-pp.fdget.__x64_sys_pwrite64.do_syscall_64.entry_SYSCALL_64_after_hwframe.__libc_pwrite
>       1.41            -0.2        1.26        perf-profile.calltrace.cycles-pp.syscall_exit_to_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe.__libc_pwrite
>       0.87            -0.1        0.79 ±  2%  perf-profile.calltrace.cycles-pp.up_write.shmem_file_write_iter.vfs_write.__x64_sys_pwrite64.do_syscall_64
>       0.79 ±  2%      -0.1        0.74        perf-profile.calltrace.cycles-pp.noop_dirty_folio.shmem_write_end.generic_perform_write.shmem_file_write_iter.vfs_write
>       1.15 ±  2%      +0.1        1.26 ±  2%  perf-profile.calltrace.cycles-pp.down_write.shmem_file_write_iter.vfs_write.__x64_sys_pwrite64.do_syscall_64
>       0.54            +0.2        0.73        perf-profile.calltrace.cycles-pp.folio_mark_accessed.shmem_get_folio_gfp.shmem_write_begin.generic_perform_write.shmem_file_write_iter
>       0.82 ±  2%      +0.4        1.26 ±  5%  perf-profile.calltrace.cycles-pp.folio_mark_dirty.shmem_write_end.generic_perform_write.shmem_file_write_iter.vfs_write
>       0.00            +0.7        0.67        perf-profile.calltrace.cycles-pp.balance_dirty_pages_ratelimited_flags.generic_perform_write.shmem_file_write_iter.vfs_write.__x64_sys_pwrite64
>       2.10            +1.2        3.35        perf-profile.calltrace.cycles-pp.fault_in_readable.fault_in_iov_iter_readable.generic_perform_write.shmem_file_write_iter.vfs_write
>       2.36            +1.3        3.69        perf-profile.calltrace.cycles-pp.fault_in_iov_iter_readable.generic_perform_write.shmem_file_write_iter.vfs_write.__x64_sys_pwrite64
>      46.08            +2.8       48.91        perf-profile.calltrace.cycles-pp.__x64_sys_pwrite64.do_syscall_64.entry_SYSCALL_64_after_hwframe.__libc_pwrite
>      43.76            +3.3       47.02        perf-profile.calltrace.cycles-pp.vfs_write.__x64_sys_pwrite64.do_syscall_64.entry_SYSCALL_64_after_hwframe.__libc_pwrite
>      58.89            +3.4       62.32        perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.__libc_pwrite
>      38.55            +3.5       42.07        perf-profile.calltrace.cycles-pp.shmem_file_write_iter.vfs_write.__x64_sys_pwrite64.do_syscall_64.entry_SYSCALL_64_after_hwframe
>      49.37            +3.7       53.09        perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.__libc_pwrite
>      29.41            +5.6       34.99        perf-profile.calltrace.cycles-pp.generic_perform_write.shmem_file_write_iter.vfs_write.__x64_sys_pwrite64.do_syscall_64
>       4.60            +7.7       12.30        perf-profile.calltrace.cycles-pp.rep_movs_alternative.copy_page_from_iter_atomic.generic_perform_write.shmem_file_write_iter.vfs_write
>       6.68           +10.3       16.96        perf-profile.calltrace.cycles-pp.copy_page_from_iter_atomic.generic_perform_write.shmem_file_write_iter.vfs_write.__x64_sys_pwrite64
>      10.69            -4.8        5.86        perf-profile.children.cycles-pp.shmem_write_begin
>       8.99 ±  2%      -4.6        4.35        perf-profile.children.cycles-pp.shmem_get_folio_gfp
>       6.00 ±  3%      -4.2        1.81 ±  2%  perf-profile.children.cycles-pp.filemap_get_entry
>      14.20            -1.4       12.77        perf-profile.children.cycles-pp.entry_SYSRETQ_unsafe_stack
>       1.62 ±  9%      -1.3        0.37 ±  5%  perf-profile.children.cycles-pp.xas_load
>      16.76            -1.2       15.54        perf-profile.children.cycles-pp.syscall_return_via_sysret
>       2.96            -1.2        1.79        perf-profile.children.cycles-pp.file_update_time
>       2.47 ±  2%      -1.0        1.51        perf-profile.children.cycles-pp.inode_needs_update_time
>       1.69 ±  2%      -0.9        0.79        perf-profile.children.cycles-pp.folio_unlock
>       1.44 ± 13%      -0.8        0.65 ±  3%  perf-profile.children.cycles-pp.file_remove_privs_flags
>       5.94            -0.7        5.24 ±  2%  perf-profile.children.cycles-pp.shmem_write_end
>       7.17            -0.5        6.67        perf-profile.children.cycles-pp.entry_SYSCALL_64
>       1.77            -0.4        1.42        perf-profile.children.cycles-pp.__cond_resched
>       0.67 ±  3%      -0.3        0.41        perf-profile.children.cycles-pp.ktime_get_coarse_real_ts64
>       1.68 ±  9%      -0.2        1.42 ±  4%  perf-profile.children.cycles-pp.generic_write_checks
>       1.25            -0.2        1.03        perf-profile.children.cycles-pp.fdget
>       1.44            -0.2        1.28        perf-profile.children.cycles-pp.syscall_exit_to_user_mode
>       0.38 ±  3%      -0.1        0.27 ±  2%  perf-profile.children.cycles-pp.timestamp_truncate
>       0.37 ±  4%      -0.1        0.26        perf-profile.children.cycles-pp.rw_verify_area
>       0.69 ±  3%      -0.1        0.60        perf-profile.children.cycles-pp.rcu_all_qs
>       0.90            -0.1        0.82 ±  2%  perf-profile.children.cycles-pp.up_write
>       0.23 ±  5%      -0.1        0.16 ±  2%  perf-profile.children.cycles-pp.xas_start
>       0.85            -0.1        0.80        perf-profile.children.cycles-pp.noop_dirty_folio
>       0.23 ±  4%      -0.0        0.20 ±  3%  perf-profile.children.cycles-pp.x64_sys_call
>       0.15 ±  5%      -0.0        0.11 ±  4%  perf-profile.children.cycles-pp.security_file_permission
>       0.28 ±  2%      -0.0        0.26        perf-profile.children.cycles-pp.syscall_exit_to_user_mode_prepare
>       0.17 ±  5%      +0.0        0.19 ±  3%  perf-profile.children.cycles-pp.sched_tick
>       1.18            +0.1        1.28 ±  2%  perf-profile.children.cycles-pp.down_write
>       0.35 ±  3%      +0.1        0.48 ±  6%  perf-profile.children.cycles-pp.folio_mapping
>       0.50 ±  2%      +0.2        0.69        perf-profile.children.cycles-pp.balance_dirty_pages_ratelimited_flags
>       0.55 ±  2%      +0.2        0.75        perf-profile.children.cycles-pp.folio_mark_accessed
>       1.75 ±  2%      +0.4        2.10 ±  2%  perf-profile.children.cycles-pp.entry_SYSCALL_64_safe_stack
>       0.90            +0.5        1.36 ±  5%  perf-profile.children.cycles-pp.folio_mark_dirty
>       2.17            +1.2        3.41        perf-profile.children.cycles-pp.fault_in_readable
>       2.40            +1.4        3.75        perf-profile.children.cycles-pp.fault_in_iov_iter_readable
>      46.10            +2.8       48.93        perf-profile.children.cycles-pp.__x64_sys_pwrite64
>      43.86            +3.2       47.10        perf-profile.children.cycles-pp.vfs_write
>      39.00            +3.4       42.41        perf-profile.children.cycles-pp.shmem_file_write_iter
>      59.15            +3.4       62.56        perf-profile.children.cycles-pp.entry_SYSCALL_64_after_hwframe
>      49.50            +3.7       53.21        perf-profile.children.cycles-pp.do_syscall_64
>      29.56            +5.6       35.14        perf-profile.children.cycles-pp.generic_perform_write
>       4.74            +8.3       13.02        perf-profile.children.cycles-pp.rep_movs_alternative
>       6.85            +9.6       16.44        perf-profile.children.cycles-pp.copy_page_from_iter_atomic
>       4.34 ±  2%      -2.9        1.43 ±  2%  perf-profile.self.cycles-pp.filemap_get_entry
>      14.06            -1.4       12.65        perf-profile.self.cycles-pp.entry_SYSRETQ_unsafe_stack
>      16.74            -1.2       15.53        perf-profile.self.cycles-pp.syscall_return_via_sysret
>       1.39 ± 10%      -1.2        0.21 ±  8%  perf-profile.self.cycles-pp.xas_load
>       1.49 ±  3%      -0.9        0.58        perf-profile.self.cycles-pp.folio_unlock
>       2.72 ±  2%      -0.9        1.83        perf-profile.self.cycles-pp.__libc_pwrite
>       1.42 ± 13%      -0.8        0.61 ±  3%  perf-profile.self.cycles-pp.file_remove_privs_flags
>       1.42            -0.6        0.83        perf-profile.self.cycles-pp.inode_needs_update_time
>       1.92 ±  5%      -0.5        1.44        perf-profile.self.cycles-pp.shmem_get_folio_gfp
>       6.24            -0.4        5.81        perf-profile.self.cycles-pp.entry_SYSCALL_64
>       9.82            -0.3        9.50        perf-profile.self.cycles-pp.entry_SYSCALL_64_after_hwframe
>       0.64 ±  3%      -0.3        0.38        perf-profile.self.cycles-pp.ktime_get_coarse_real_ts64
>       1.06 ±  2%      -0.3        0.79        perf-profile.self.cycles-pp.__cond_resched
>       1.74 ±  5%      -0.2        1.52 ±  2%  perf-profile.self.cycles-pp.shmem_write_begin
>       1.24 ±  2%      -0.2        1.03        perf-profile.self.cycles-pp.fdget
>       0.45 ±  3%      -0.2        0.25        perf-profile.self.cycles-pp.file_update_time
>       0.98 ±  2%      -0.2        0.79 ±  2%  perf-profile.self.cycles-pp.__x64_sys_pwrite64
>       2.73 ±  2%      -0.2        2.54 ±  2%  perf-profile.self.cycles-pp.shmem_write_end
>       0.72 ±  5%      -0.1        0.58 ±  4%  perf-profile.self.cycles-pp.generic_write_checks
>       1.14            -0.1        1.02        perf-profile.self.cycles-pp.syscall_exit_to_user_mode
>       0.36 ±  3%      -0.1        0.25 ±  2%  perf-profile.self.cycles-pp.timestamp_truncate
>       0.23 ±  4%      -0.1        0.15 ±  2%  perf-profile.self.cycles-pp.rw_verify_area
>       0.60 ±  3%      -0.1        0.53        perf-profile.self.cycles-pp.rcu_all_qs
>       0.81            -0.1        0.74        perf-profile.self.cycles-pp.noop_dirty_folio
>       0.20 ±  4%      -0.1        0.14 ±  2%  perf-profile.self.cycles-pp.xas_start
>       0.81            -0.1        0.75 ±  2%  perf-profile.self.cycles-pp.up_write
>       0.21 ±  3%      -0.0        0.18 ±  3%  perf-profile.self.cycles-pp.x64_sys_call
>       0.26 ±  2%      -0.0        0.23 ±  2%  perf-profile.self.cycles-pp.syscall_exit_to_user_mode_prepare
>       0.12 ±  6%      -0.0        0.09 ±  4%  perf-profile.self.cycles-pp.security_file_permission
>       0.21 ±  4%      +0.0        0.24        perf-profile.self.cycles-pp.testcase
>       0.77 ±  2%      +0.0        0.82 ±  3%  perf-profile.self.cycles-pp.down_write
>       0.24 ±  3%      +0.1        0.36        perf-profile.self.cycles-pp.fault_in_iov_iter_readable
>       0.30 ±  3%      +0.1        0.43 ±  6%  perf-profile.self.cycles-pp.folio_mapping
>       0.35 ±  2%      +0.2        0.54        perf-profile.self.cycles-pp.balance_dirty_pages_ratelimited_flags
>       2.74            +0.2        2.93 ±  2%  perf-profile.self.cycles-pp.generic_perform_write
>       0.52            +0.2        0.72        perf-profile.self.cycles-pp.folio_mark_accessed
>       0.55 ±  2%      +0.3        0.87 ±  5%  perf-profile.self.cycles-pp.folio_mark_dirty
>       0.56            +0.5        1.10 ±  4%  perf-profile.self.cycles-pp.entry_SYSCALL_64_safe_stack
>       1.48 ±  2%      +1.1        2.55 ±  4%  perf-profile.self.cycles-pp.do_syscall_64
>       2.14            +1.2        3.35        perf-profile.self.cycles-pp.fault_in_readable
>       2.20            +1.3        3.51 ±  2%  perf-profile.self.cycles-pp.copy_page_from_iter_atomic
>       4.59            +8.2       12.80        perf-profile.self.cycles-pp.rep_movs_alternative
> 
> 
> 
> 
> Disclaimer:
> Results have been estimated based on internal Intel analysis and are provided
> for informational purposes only. Any difference in system hardware or software
> design or configuration may affect actual performance.
> 
> 

-- 
Jeff Layton <jlayton@...nel.org>
