Message-ID: <202508150803.d5387224-lkp@intel.com>
Date: Fri, 15 Aug 2025 15:36:00 +0800
From: kernel test robot <oliver.sang@...el.com>
To: Jens Axboe <axboe@...nel.dk>
CC: <oe-lkp@...ts.linux.dev>, <lkp@...el.com>, <linux-kernel@...r.kernel.org>,
Andrew Morton <akpm@...ux-foundation.org>, <oliver.sang@...el.com>
Subject: [linus:master] [llist] 375700bab5: will-it-scale.per_thread_ops
2.6% regression
Hello,
kernel test robot noticed a 2.6% regression of will-it-scale.per_thread_ops on:
commit: 375700bab5b150e876e42d894a9a7470881f8a61 ("llist: make llist_add_batch() a static inline")
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master
[still regression on linus/master 8742b2d8935f476449ef37e263bc4da3295c7b58]
[still regression on linux-next/master 2674d1eadaa2fd3a918dfcdb6d0bb49efe8a8bb9]
testcase: will-it-scale
config: x86_64-rhel-9.4
compiler: gcc-12
test machine: 192 threads 4 sockets Intel(R) Xeon(R) Platinum 9242 CPU @ 2.30GHz (Cascade Lake) with 176G memory
parameters:
nr_task: 100%
mode: thread
test: tlb_flush3
cpufreq_governor: performance
If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add the following tags:
| Reported-by: kernel test robot <oliver.sang@...el.com>
| Closes: https://lore.kernel.org/oe-lkp/202508150803.d5387224-lkp@intel.com
Details are as below:
-------------------------------------------------------------------------------------------------->
The kernel config and materials to reproduce are available at:
https://download.01.org/0day-ci/archive/20250815/202508150803.d5387224-lkp@intel.com
=========================================================================================
compiler/cpufreq_governor/kconfig/mode/nr_task/rootfs/tbox_group/test/testcase:
gcc-12/performance/x86_64-rhel-9.4/thread/100%/debian-12-x86_64-20240206.cgz/lkp-csl-2sp10/tlb_flush3/will-it-scale
commit:
5ef2dccfcc ("delayacct: remove redundant code and adjust indentation")
375700bab5 ("llist: make llist_add_batch() a static inline")
5ef2dccfcca8d864 375700bab5b150e876e42d894a9
---------------- ---------------------------
%stddev %change %stddev
\ | \
118225 ± 2% -6.0% 111161 perf-c2c.HITM.total
1.926e+08 -2.5% 1.878e+08 proc-vmstat.pgfault
14579 -2.2% 14264 vmstat.system.cs
579287 -2.6% 564220 will-it-scale.192.threads
1.98 -2.9% 1.92 will-it-scale.192.threads_idle
3016 -2.6% 2938 will-it-scale.per_thread_ops
579287 -2.6% 564220 will-it-scale.workload
0.33 ± 19% +34.2% 0.44 ± 6% perf-sched.sch_delay.avg.ms.syscall_exit_to_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe.[unknown]
4.79 ± 9% -44.9% 2.64 ± 67% perf-sched.sch_delay.max.ms.__cond_resched.generic_perform_write.shmem_file_write_iter.vfs_write.ksys_write
28.30 ± 3% +9.9% 31.10 ± 4% perf-sched.total_wait_and_delay.average.ms
71544 ± 2% -12.6% 62531 ± 3% perf-sched.total_wait_and_delay.count.ms
28.21 ± 3% +9.9% 31.00 ± 4% perf-sched.total_wait_time.average.ms
47.56 ±115% +220.4% 152.39 ± 11% perf-sched.wait_and_delay.avg.ms.anon_pipe_read.vfs_read.ksys_read.do_syscall_64
3197 ± 5% -13.6% 2761 ± 5% perf-sched.wait_and_delay.count.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
4324 ± 16% -28.8% 3079 ± 2% perf-sched.wait_and_delay.count.syscall_exit_to_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe.[unknown]
0.30 ± 73% -73.6% 0.08 ±109% perf-sched.wait_time.avg.ms.__cond_resched.unmap_vmas.vms_clear_ptes.part.0
47.48 ±115% +220.3% 152.08 ± 11% perf-sched.wait_time.avg.ms.anon_pipe_read.vfs_read.ksys_read.do_syscall_64
9.36 +4.5% 9.77 perf-stat.i.MPKI
1.427e+10 -4.5% 1.362e+10 perf-stat.i.branch-instructions
0.97 +0.0 1.02 perf-stat.i.branch-miss-rate%
34.20 +0.7 34.87 perf-stat.i.cache-miss-rate%
1.753e+09 -1.5% 1.727e+09 perf-stat.i.cache-references
14678 -2.6% 14293 perf-stat.i.context-switches
9.07 +3.8% 9.42 perf-stat.i.cpi
556.91 ± 2% -4.6% 531.43 perf-stat.i.cpu-migrations
6.398e+10 -4.0% 6.145e+10 perf-stat.i.instructions
6.62 -2.8% 6.44 perf-stat.i.metric.K/sec
635521 -2.7% 618322 perf-stat.i.minor-faults
635521 -2.7% 618322 perf-stat.i.page-faults
27.27 -27.3 0.00 perf-profile.calltrace.cycles-pp.llist_add_batch.smp_call_function_many_cond.on_each_cpu_cond_mask.flush_tlb_mm_range.tlb_finish_mmu
26.31 -26.3 0.00 perf-profile.calltrace.cycles-pp.llist_add_batch.smp_call_function_many_cond.on_each_cpu_cond_mask.flush_tlb_mm_range.zap_pte_range
12.12 -12.1 0.00 perf-profile.calltrace.cycles-pp.asm_sysvec_call_function.llist_add_batch.smp_call_function_many_cond.on_each_cpu_cond_mask.flush_tlb_mm_range
11.53 -11.5 0.00 perf-profile.calltrace.cycles-pp.sysvec_call_function.asm_sysvec_call_function.llist_add_batch.smp_call_function_many_cond.on_each_cpu_cond_mask
11.39 -11.4 0.00 perf-profile.calltrace.cycles-pp.__sysvec_call_function.sysvec_call_function.asm_sysvec_call_function.llist_add_batch.smp_call_function_many_cond
11.36 -11.4 0.00 perf-profile.calltrace.cycles-pp.__flush_smp_call_function_queue.__sysvec_call_function.sysvec_call_function.asm_sysvec_call_function.llist_add_batch
13.84 -0.3 13.54 perf-profile.calltrace.cycles-pp.llist_reverse_order.__flush_smp_call_function_queue.__sysvec_call_function.sysvec_call_function.asm_sysvec_call_function
48.02 +0.2 48.21 perf-profile.calltrace.cycles-pp.unmap_page_range.zap_page_range_single.madvise_vma_behavior.madvise_do_behavior.do_madvise
47.88 +0.2 48.07 perf-profile.calltrace.cycles-pp.zap_pte_range.zap_pmd_range.unmap_page_range.zap_page_range_single.madvise_vma_behavior
47.89 +0.2 48.08 perf-profile.calltrace.cycles-pp.zap_pmd_range.unmap_page_range.zap_page_range_single.madvise_vma_behavior.madvise_do_behavior
4.21 +5.9 10.09 perf-profile.calltrace.cycles-pp.asm_sysvec_call_function.smp_call_function_many_cond.on_each_cpu_cond_mask.flush_tlb_mm_range.zap_pte_range
4.19 +5.9 10.08 perf-profile.calltrace.cycles-pp.asm_sysvec_call_function.smp_call_function_many_cond.on_each_cpu_cond_mask.flush_tlb_mm_range.tlb_finish_mmu
8.00 +11.0 18.97 perf-profile.calltrace.cycles-pp.__flush_smp_call_function_queue.__sysvec_call_function.sysvec_call_function.asm_sysvec_call_function.smp_call_function_many_cond
8.02 +11.0 19.03 perf-profile.calltrace.cycles-pp.__sysvec_call_function.sysvec_call_function.asm_sysvec_call_function.smp_call_function_many_cond.on_each_cpu_cond_mask
8.11 +11.1 19.25 perf-profile.calltrace.cycles-pp.sysvec_call_function.asm_sysvec_call_function.smp_call_function_many_cond.on_each_cpu_cond_mask.flush_tlb_mm_range
54.16 -54.2 0.00 perf-profile.children.cycles-pp.llist_add_batch
21.03 -0.5 20.54 perf-profile.children.cycles-pp.__flush_smp_call_function_queue
20.82 -0.5 20.37 perf-profile.children.cycles-pp.__sysvec_call_function
21.06 -0.4 20.62 perf-profile.children.cycles-pp.sysvec_call_function
22.05 -0.4 21.64 perf-profile.children.cycles-pp.asm_sysvec_call_function
14.88 -0.4 14.52 perf-profile.children.cycles-pp.llist_reverse_order
0.49 ± 3% -0.1 0.41 ± 8% perf-profile.children.cycles-pp.common_startup_64
0.49 ± 3% -0.1 0.41 ± 8% perf-profile.children.cycles-pp.cpu_startup_entry
0.49 ± 3% -0.1 0.41 ± 8% perf-profile.children.cycles-pp.do_idle
0.49 ± 4% -0.1 0.41 ± 8% perf-profile.children.cycles-pp.start_secondary
0.42 ± 3% -0.1 0.35 ± 8% perf-profile.children.cycles-pp.cpuidle_idle_call
0.40 ± 3% -0.1 0.34 ± 7% perf-profile.children.cycles-pp.cpuidle_enter
0.40 ± 3% -0.1 0.34 ± 7% perf-profile.children.cycles-pp.cpuidle_enter_state
0.23 ± 4% -0.0 0.18 ± 6% perf-profile.children.cycles-pp.intel_idle
0.48 ± 2% -0.0 0.44 ± 2% perf-profile.children.cycles-pp.sysvec_apic_timer_interrupt
0.21 -0.0 0.17 ± 2% perf-profile.children.cycles-pp.__sysvec_call_function_single
0.22 ± 2% -0.0 0.19 ± 2% perf-profile.children.cycles-pp.asm_sysvec_call_function_single
0.40 ± 2% -0.0 0.36 ± 3% perf-profile.children.cycles-pp.__hrtimer_run_queues
0.29 ± 5% -0.0 0.26 ± 5% perf-profile.children.cycles-pp.madvise_lock
0.22 ± 2% -0.0 0.18 perf-profile.children.cycles-pp.sysvec_call_function_single
0.52 ± 2% -0.0 0.48 ± 2% perf-profile.children.cycles-pp.asm_sysvec_apic_timer_interrupt
0.44 ± 3% -0.0 0.41 ± 3% perf-profile.children.cycles-pp.__sysvec_apic_timer_interrupt
0.32 ± 2% -0.0 0.29 ± 2% perf-profile.children.cycles-pp.update_process_times
0.44 ± 2% -0.0 0.41 ± 3% perf-profile.children.cycles-pp.hrtimer_interrupt
0.12 ± 3% -0.0 0.10 ± 8% perf-profile.children.cycles-pp.rwsem_down_read_slowpath
0.24 +0.0 0.26 perf-profile.children.cycles-pp.next_uptodate_folio
0.49 +0.0 0.53 ± 2% perf-profile.children.cycles-pp.should_flush_tlb
48.07 +0.2 48.25 perf-profile.children.cycles-pp.unmap_page_range
47.94 +0.2 48.12 perf-profile.children.cycles-pp.zap_pmd_range
47.93 +0.2 48.12 perf-profile.children.cycles-pp.zap_pte_range
41.92 -41.9 0.00 perf-profile.self.cycles-pp.llist_add_batch
14.87 -0.4 14.51 perf-profile.self.cycles-pp.llist_reverse_order
0.23 ± 4% -0.0 0.18 ± 6% perf-profile.self.cycles-pp.intel_idle
0.18 ± 2% +0.0 0.19 perf-profile.self.cycles-pp.next_uptodate_folio
0.14 ± 2% +0.0 0.16 perf-profile.self.cycles-pp.filemap_map_pages
0.36 ± 2% +0.0 0.40 ± 3% perf-profile.self.cycles-pp.should_flush_tlb
29.83 +42.5 72.37 perf-profile.self.cycles-pp.smp_call_function_many_cond
Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.
--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki