Message-ID: <bd2059df8b6a7a6ce05dc4d325c144a64d474aae.camel@kernel.org>
Date: Sun, 26 Jan 2025 07:23:30 -0500
From: Jeff Layton <jlayton@...nel.org>
To: kernel test robot <oliver.sang@...el.com>
Cc: oe-lkp@...ts.linux.dev, lkp@...el.com, linux-kernel@...r.kernel.org,
 Christian Brauner <brauner@...nel.org>, Thomas Gleixner <tglx@...utronix.de>,
 John Stultz <jstultz@...gle.com>
Subject: Re: [linus:master] [timekeeping] ee3283c608:
will-it-scale.per_process_ops 4.8% regression
On Sun, 2025-01-26 at 16:25 +0800, kernel test robot wrote:
> hi, Jeff Layton,
>
>
> We are sending the report below just FYI, since the results are stable in our tests.
> We don't know enough to tell whether this regression is due to alignment:
>
> +static __cacheline_aligned_in_smp atomic64_t mg_floor;
>
> If this is of low value, please just ignore it. Thanks a lot.
>
I think this is more or less the same regression we measured with the
pipe1 test during the rc phase:

    https://lore.kernel.org/linux-fsdevel/202410091041.6f5d221e-oliver.sang@intel.com/

This test just measures how fast it can do writes into a file in /tmp
without doing anything else in between. I don't think there is much we
can do to mitigate the perf hit here, as there is a basic cost to
fetching and handling the floor and ctime consistently.
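
To illustrate where that basic cost comes from, here is a minimal userspace
sketch of the floor idea. It is not the kernel implementation: the names
floor_ns, coarse_with_floor() and fine_with_floor() are hypothetical, times
are plain nanosecond counts passed in by the caller, and the 64-byte
alignment is an assumed cache line size. Every timestamp read has to load a
shared atomic, and every fine-grained read may have to update it, which is
extra work on a path that previously only read the coarse clock.

```c
#include <stdatomic.h>
#include <stdint.h>

/*
 * Hypothetical userspace model of a timestamp "floor".
 * The floor holds the latest fine-grained time handed out so far.
 * 64 is an assumed cache line size, so the hot variable sits on its own line.
 */
static _Alignas(64) _Atomic int64_t floor_ns;

/*
 * Coarse read: use the cheap tick-based time, unless a fine-grained
 * stamp newer than it has already been handed out.
 */
static int64_t coarse_with_floor(int64_t coarse_ns)
{
	int64_t floor = atomic_load(&floor_ns);

	return floor > coarse_ns ? floor : coarse_ns;
}

/*
 * Fine-grained read: try to advance the floor to now_ns.
 * If another thread already moved the floor to or past now_ns,
 * return that value instead, so results never go backwards.
 */
static int64_t fine_with_floor(int64_t now_ns)
{
	int64_t old = atomic_load(&floor_ns);

	while (old < now_ns) {
		/* On failure, old is refreshed with the current floor. */
		if (atomic_compare_exchange_weak(&floor_ns, &old, now_ns))
			return now_ns;
	}
	return old;
}
```

In this model a later coarse read can never return a value older than a
fine-grained stamp that was issued before it, which is the consistency
property the floor exists to provide.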
>
> Hello,
>
> kernel test robot noticed a 4.8% regression of will-it-scale.per_process_ops on:
>
>
> commit: ee3283c608dfa21251b0821d7bb198c7ae3189f6 ("timekeeping: Add interfaces for handling timestamps with a floor value")
That patch just adds two new interfaces, but the first caller of them
wasn't added until a later patch. Are you sure that bisect landed in
the right place?
> https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master
>
> [test failed on linus/master bc8198dc7ebc492ec3e9fa1617dcdfbe98e73b17]
> [test failed on linux-next/master 5ffa57f6eecefababb8cbe327222ef171943b183]
>
> testcase: will-it-scale
> config: x86_64-rhel-9.4
> compiler: gcc-12
> test machine: 104 threads 2 sockets (Skylake) with 192G memory
> parameters:
>
> nr_task: 100%
> mode: process
> test: pwrite1
> cpufreq_governor: performance
>
>
> If you fix the issue in a separate patch/commit (i.e. not just a new version of
> the same patch/commit), kindly add following tags
> > Reported-by: kernel test robot <oliver.sang@...el.com>
> > Closes: https://lore.kernel.org/oe-lkp/202501261527.c3bf4764-lkp@intel.com
>
>
> Details are as below:
> -------------------------------------------------------------------------------------------------->
>
>
> The kernel config and materials to reproduce are available at:
> https://download.01.org/0day-ci/archive/20250126/202501261527.c3bf4764-lkp@intel.com
>
> =========================================================================================
> compiler/cpufreq_governor/kconfig/mode/nr_task/rootfs/tbox_group/test/testcase:
> gcc-12/performance/x86_64-rhel-9.4/process/100%/debian-12-x86_64-20240206.cgz/lkp-skl-fpga01/pwrite1/will-it-scale
>
> commit:
> v6.12-rc2
> ee3283c608 ("timekeeping: Add interfaces for handling timestamps with a floor value")
>
> v6.12-rc2 ee3283c608dfa21251b0821d7bb
> ---------------- ---------------------------
> %stddev %change %stddev
> \ | \
> 57550068 -4.8% 54794800 will-it-scale.104.processes
> 553365 -4.8% 526872 will-it-scale.per_process_ops
> 57550068 -4.8% 54794800 will-it-scale.workload
> 43.00 ± 27% -60.0% 17.20 ± 27% perf-c2c.DRAM.local
> 251.20 ± 23% -57.5% 106.80 ± 16% perf-c2c.DRAM.remote
> 520.00 ± 33% -70.3% 154.20 ± 13% perf-c2c.HITM.local
> 218.50 ± 25% -55.2% 97.80 ± 18% perf-c2c.HITM.remote
> 0.03 ± 14% +48.4% 0.04 ± 9% perf-sched.sch_delay.avg.ms.worker_thread.kthread.ret_from_fork.ret_from_fork_asm
> 4.18 ± 4% +21.5% 5.08 perf-sched.sch_delay.max.ms.worker_thread.kthread.ret_from_fork.ret_from_fork_asm
> 653.70 ± 5% +50.5% 983.70 ± 7% perf-sched.wait_and_delay.count.__cond_resched.generic_perform_write.shmem_file_write_iter.vfs_write.__x64_sys_pwrite64
> 913.40 ± 6% -24.8% 686.80 ± 7% perf-sched.wait_and_delay.count.__cond_resched.shmem_get_folio_gfp.shmem_write_begin.generic_perform_write.shmem_file_write_iter
> 1.29 ± 81% +42618.3% 552.09 ± 74% perf-sched.wait_time.avg.ms.devkmsg_read.vfs_read.ksys_read.do_syscall_64
> 2.58 ± 81% +65403.1% 1692 ± 72% perf-sched.wait_time.max.ms.devkmsg_read.vfs_read.ksys_read.do_syscall_64
> 1.721e+10 -4.8% 1.639e+10 perf-stat.i.branch-instructions
> 1.66 +0.1 1.72 perf-stat.i.branch-miss-rate%
> 2.852e+08 -1.2% 2.818e+08 perf-stat.i.branch-misses
> 3.29 +4.9% 3.45 perf-stat.i.cpi
> 8.743e+10 -4.8% 8.327e+10 perf-stat.i.instructions
> 0.30 -4.7% 0.29 perf-stat.i.ipc
> 1.66 +0.1 1.72 perf-stat.overall.branch-miss-rate%
> 3.29 +4.9% 3.45 perf-stat.overall.cpi
> 0.30 -4.7% 0.29 perf-stat.overall.ipc
> 1.715e+10 -4.8% 1.634e+10 perf-stat.ps.branch-instructions
> 2.842e+08 -1.2% 2.809e+08 perf-stat.ps.branch-misses
> 8.714e+10 -4.8% 8.3e+10 perf-stat.ps.instructions
> 2.632e+13 -4.7% 2.508e+13 perf-stat.total.instructions
> 10.62 -4.8 5.81 perf-profile.calltrace.cycles-pp.shmem_write_begin.generic_perform_write.shmem_file_write_iter.vfs_write.__x64_sys_pwrite64
> 8.89 ± 2% -4.6 4.25 perf-profile.calltrace.cycles-pp.shmem_get_folio_gfp.shmem_write_begin.generic_perform_write.shmem_file_write_iter.vfs_write
> 5.98 ± 3% -4.2 1.79 ± 2% perf-profile.calltrace.cycles-pp.filemap_get_entry.shmem_get_folio_gfp.shmem_write_begin.generic_perform_write.shmem_file_write_iter
> 13.24 -1.4 11.88 perf-profile.calltrace.cycles-pp.entry_SYSRETQ_unsafe_stack.__libc_pwrite
> 16.62 -1.2 15.42 perf-profile.calltrace.cycles-pp.syscall_return_via_sysret.__libc_pwrite
> 2.90 -1.2 1.74 perf-profile.calltrace.cycles-pp.file_update_time.shmem_file_write_iter.vfs_write.__x64_sys_pwrite64.do_syscall_64
> 2.38 ± 2% -0.9 1.44 perf-profile.calltrace.cycles-pp.inode_needs_update_time.file_update_time.shmem_file_write_iter.vfs_write.__x64_sys_pwrite64
> 1.68 ± 2% -0.9 0.79 perf-profile.calltrace.cycles-pp.folio_unlock.shmem_write_end.generic_perform_write.shmem_file_write_iter.vfs_write
> 1.42 ± 13% -0.8 0.64 ± 3% perf-profile.calltrace.cycles-pp.file_remove_privs_flags.shmem_file_write_iter.vfs_write.__x64_sys_pwrite64.do_syscall_64
> 5.69 -0.7 4.99 ± 2% perf-profile.calltrace.cycles-pp.shmem_write_end.generic_perform_write.shmem_file_write_iter.vfs_write.__x64_sys_pwrite64
> 6.91 -0.4 6.53 perf-profile.calltrace.cycles-pp.entry_SYSCALL_64.__libc_pwrite
> 1.23 ± 2% -0.2 1.01 perf-profile.calltrace.cycles-pp.fdget.__x64_sys_pwrite64.do_syscall_64.entry_SYSCALL_64_after_hwframe.__libc_pwrite
> 1.41 -0.2 1.26 perf-profile.calltrace.cycles-pp.syscall_exit_to_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe.__libc_pwrite
> 0.87 -0.1 0.79 ± 2% perf-profile.calltrace.cycles-pp.up_write.shmem_file_write_iter.vfs_write.__x64_sys_pwrite64.do_syscall_64
> 0.79 ± 2% -0.1 0.74 perf-profile.calltrace.cycles-pp.noop_dirty_folio.shmem_write_end.generic_perform_write.shmem_file_write_iter.vfs_write
> 1.15 ± 2% +0.1 1.26 ± 2% perf-profile.calltrace.cycles-pp.down_write.shmem_file_write_iter.vfs_write.__x64_sys_pwrite64.do_syscall_64
> 0.54 +0.2 0.73 perf-profile.calltrace.cycles-pp.folio_mark_accessed.shmem_get_folio_gfp.shmem_write_begin.generic_perform_write.shmem_file_write_iter
> 0.82 ± 2% +0.4 1.26 ± 5% perf-profile.calltrace.cycles-pp.folio_mark_dirty.shmem_write_end.generic_perform_write.shmem_file_write_iter.vfs_write
> 0.00 +0.7 0.67 perf-profile.calltrace.cycles-pp.balance_dirty_pages_ratelimited_flags.generic_perform_write.shmem_file_write_iter.vfs_write.__x64_sys_pwrite64
> 2.10 +1.2 3.35 perf-profile.calltrace.cycles-pp.fault_in_readable.fault_in_iov_iter_readable.generic_perform_write.shmem_file_write_iter.vfs_write
> 2.36 +1.3 3.69 perf-profile.calltrace.cycles-pp.fault_in_iov_iter_readable.generic_perform_write.shmem_file_write_iter.vfs_write.__x64_sys_pwrite64
> 46.08 +2.8 48.91 perf-profile.calltrace.cycles-pp.__x64_sys_pwrite64.do_syscall_64.entry_SYSCALL_64_after_hwframe.__libc_pwrite
> 43.76 +3.3 47.02 perf-profile.calltrace.cycles-pp.vfs_write.__x64_sys_pwrite64.do_syscall_64.entry_SYSCALL_64_after_hwframe.__libc_pwrite
> 58.89 +3.4 62.32 perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.__libc_pwrite
> 38.55 +3.5 42.07 perf-profile.calltrace.cycles-pp.shmem_file_write_iter.vfs_write.__x64_sys_pwrite64.do_syscall_64.entry_SYSCALL_64_after_hwframe
> 49.37 +3.7 53.09 perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.__libc_pwrite
> 29.41 +5.6 34.99 perf-profile.calltrace.cycles-pp.generic_perform_write.shmem_file_write_iter.vfs_write.__x64_sys_pwrite64.do_syscall_64
> 4.60 +7.7 12.30 perf-profile.calltrace.cycles-pp.rep_movs_alternative.copy_page_from_iter_atomic.generic_perform_write.shmem_file_write_iter.vfs_write
> 6.68 +10.3 16.96 perf-profile.calltrace.cycles-pp.copy_page_from_iter_atomic.generic_perform_write.shmem_file_write_iter.vfs_write.__x64_sys_pwrite64
> 10.69 -4.8 5.86 perf-profile.children.cycles-pp.shmem_write_begin
> 8.99 ± 2% -4.6 4.35 perf-profile.children.cycles-pp.shmem_get_folio_gfp
> 6.00 ± 3% -4.2 1.81 ± 2% perf-profile.children.cycles-pp.filemap_get_entry
> 14.20 -1.4 12.77 perf-profile.children.cycles-pp.entry_SYSRETQ_unsafe_stack
> 1.62 ± 9% -1.3 0.37 ± 5% perf-profile.children.cycles-pp.xas_load
> 16.76 -1.2 15.54 perf-profile.children.cycles-pp.syscall_return_via_sysret
> 2.96 -1.2 1.79 perf-profile.children.cycles-pp.file_update_time
> 2.47 ± 2% -1.0 1.51 perf-profile.children.cycles-pp.inode_needs_update_time
> 1.69 ± 2% -0.9 0.79 perf-profile.children.cycles-pp.folio_unlock
> 1.44 ± 13% -0.8 0.65 ± 3% perf-profile.children.cycles-pp.file_remove_privs_flags
> 5.94 -0.7 5.24 ± 2% perf-profile.children.cycles-pp.shmem_write_end
> 7.17 -0.5 6.67 perf-profile.children.cycles-pp.entry_SYSCALL_64
> 1.77 -0.4 1.42 perf-profile.children.cycles-pp.__cond_resched
> 0.67 ± 3% -0.3 0.41 perf-profile.children.cycles-pp.ktime_get_coarse_real_ts64
> 1.68 ± 9% -0.2 1.42 ± 4% perf-profile.children.cycles-pp.generic_write_checks
> 1.25 -0.2 1.03 perf-profile.children.cycles-pp.fdget
> 1.44 -0.2 1.28 perf-profile.children.cycles-pp.syscall_exit_to_user_mode
> 0.38 ± 3% -0.1 0.27 ± 2% perf-profile.children.cycles-pp.timestamp_truncate
> 0.37 ± 4% -0.1 0.26 perf-profile.children.cycles-pp.rw_verify_area
> 0.69 ± 3% -0.1 0.60 perf-profile.children.cycles-pp.rcu_all_qs
> 0.90 -0.1 0.82 ± 2% perf-profile.children.cycles-pp.up_write
> 0.23 ± 5% -0.1 0.16 ± 2% perf-profile.children.cycles-pp.xas_start
> 0.85 -0.1 0.80 perf-profile.children.cycles-pp.noop_dirty_folio
> 0.23 ± 4% -0.0 0.20 ± 3% perf-profile.children.cycles-pp.x64_sys_call
> 0.15 ± 5% -0.0 0.11 ± 4% perf-profile.children.cycles-pp.security_file_permission
> 0.28 ± 2% -0.0 0.26 perf-profile.children.cycles-pp.syscall_exit_to_user_mode_prepare
> 0.17 ± 5% +0.0 0.19 ± 3% perf-profile.children.cycles-pp.sched_tick
> 1.18 +0.1 1.28 ± 2% perf-profile.children.cycles-pp.down_write
> 0.35 ± 3% +0.1 0.48 ± 6% perf-profile.children.cycles-pp.folio_mapping
> 0.50 ± 2% +0.2 0.69 perf-profile.children.cycles-pp.balance_dirty_pages_ratelimited_flags
> 0.55 ± 2% +0.2 0.75 perf-profile.children.cycles-pp.folio_mark_accessed
> 1.75 ± 2% +0.4 2.10 ± 2% perf-profile.children.cycles-pp.entry_SYSCALL_64_safe_stack
> 0.90 +0.5 1.36 ± 5% perf-profile.children.cycles-pp.folio_mark_dirty
> 2.17 +1.2 3.41 perf-profile.children.cycles-pp.fault_in_readable
> 2.40 +1.4 3.75 perf-profile.children.cycles-pp.fault_in_iov_iter_readable
> 46.10 +2.8 48.93 perf-profile.children.cycles-pp.__x64_sys_pwrite64
> 43.86 +3.2 47.10 perf-profile.children.cycles-pp.vfs_write
> 39.00 +3.4 42.41 perf-profile.children.cycles-pp.shmem_file_write_iter
> 59.15 +3.4 62.56 perf-profile.children.cycles-pp.entry_SYSCALL_64_after_hwframe
> 49.50 +3.7 53.21 perf-profile.children.cycles-pp.do_syscall_64
> 29.56 +5.6 35.14 perf-profile.children.cycles-pp.generic_perform_write
> 4.74 +8.3 13.02 perf-profile.children.cycles-pp.rep_movs_alternative
> 6.85 +9.6 16.44 perf-profile.children.cycles-pp.copy_page_from_iter_atomic
> 4.34 ± 2% -2.9 1.43 ± 2% perf-profile.self.cycles-pp.filemap_get_entry
> 14.06 -1.4 12.65 perf-profile.self.cycles-pp.entry_SYSRETQ_unsafe_stack
> 16.74 -1.2 15.53 perf-profile.self.cycles-pp.syscall_return_via_sysret
> 1.39 ± 10% -1.2 0.21 ± 8% perf-profile.self.cycles-pp.xas_load
> 1.49 ± 3% -0.9 0.58 perf-profile.self.cycles-pp.folio_unlock
> 2.72 ± 2% -0.9 1.83 perf-profile.self.cycles-pp.__libc_pwrite
> 1.42 ± 13% -0.8 0.61 ± 3% perf-profile.self.cycles-pp.file_remove_privs_flags
> 1.42 -0.6 0.83 perf-profile.self.cycles-pp.inode_needs_update_time
> 1.92 ± 5% -0.5 1.44 perf-profile.self.cycles-pp.shmem_get_folio_gfp
> 6.24 -0.4 5.81 perf-profile.self.cycles-pp.entry_SYSCALL_64
> 9.82 -0.3 9.50 perf-profile.self.cycles-pp.entry_SYSCALL_64_after_hwframe
> 0.64 ± 3% -0.3 0.38 perf-profile.self.cycles-pp.ktime_get_coarse_real_ts64
> 1.06 ± 2% -0.3 0.79 perf-profile.self.cycles-pp.__cond_resched
> 1.74 ± 5% -0.2 1.52 ± 2% perf-profile.self.cycles-pp.shmem_write_begin
> 1.24 ± 2% -0.2 1.03 perf-profile.self.cycles-pp.fdget
> 0.45 ± 3% -0.2 0.25 perf-profile.self.cycles-pp.file_update_time
> 0.98 ± 2% -0.2 0.79 ± 2% perf-profile.self.cycles-pp.__x64_sys_pwrite64
> 2.73 ± 2% -0.2 2.54 ± 2% perf-profile.self.cycles-pp.shmem_write_end
> 0.72 ± 5% -0.1 0.58 ± 4% perf-profile.self.cycles-pp.generic_write_checks
> 1.14 -0.1 1.02 perf-profile.self.cycles-pp.syscall_exit_to_user_mode
> 0.36 ± 3% -0.1 0.25 ± 2% perf-profile.self.cycles-pp.timestamp_truncate
> 0.23 ± 4% -0.1 0.15 ± 2% perf-profile.self.cycles-pp.rw_verify_area
> 0.60 ± 3% -0.1 0.53 perf-profile.self.cycles-pp.rcu_all_qs
> 0.81 -0.1 0.74 perf-profile.self.cycles-pp.noop_dirty_folio
> 0.20 ± 4% -0.1 0.14 ± 2% perf-profile.self.cycles-pp.xas_start
> 0.81 -0.1 0.75 ± 2% perf-profile.self.cycles-pp.up_write
> 0.21 ± 3% -0.0 0.18 ± 3% perf-profile.self.cycles-pp.x64_sys_call
> 0.26 ± 2% -0.0 0.23 ± 2% perf-profile.self.cycles-pp.syscall_exit_to_user_mode_prepare
> 0.12 ± 6% -0.0 0.09 ± 4% perf-profile.self.cycles-pp.security_file_permission
> 0.21 ± 4% +0.0 0.24 perf-profile.self.cycles-pp.testcase
> 0.77 ± 2% +0.0 0.82 ± 3% perf-profile.self.cycles-pp.down_write
> 0.24 ± 3% +0.1 0.36 perf-profile.self.cycles-pp.fault_in_iov_iter_readable
> 0.30 ± 3% +0.1 0.43 ± 6% perf-profile.self.cycles-pp.folio_mapping
> 0.35 ± 2% +0.2 0.54 perf-profile.self.cycles-pp.balance_dirty_pages_ratelimited_flags
> 2.74 +0.2 2.93 ± 2% perf-profile.self.cycles-pp.generic_perform_write
> 0.52 +0.2 0.72 perf-profile.self.cycles-pp.folio_mark_accessed
> 0.55 ± 2% +0.3 0.87 ± 5% perf-profile.self.cycles-pp.folio_mark_dirty
> 0.56 +0.5 1.10 ± 4% perf-profile.self.cycles-pp.entry_SYSCALL_64_safe_stack
> 1.48 ± 2% +1.1 2.55 ± 4% perf-profile.self.cycles-pp.do_syscall_64
> 2.14 +1.2 3.35 perf-profile.self.cycles-pp.fault_in_readable
> 2.20 +1.3 3.51 ± 2% perf-profile.self.cycles-pp.copy_page_from_iter_atomic
> 4.59 +8.2 12.80 perf-profile.self.cycles-pp.rep_movs_alternative
>
>
>
>
> Disclaimer:
> Results have been estimated based on internal Intel analysis and are provided
> for informational purposes only. Any difference in system hardware or software
> design or configuration may affect actual performance.
>
>
--
Jeff Layton <jlayton@...nel.org>