Message-ID: <202508150803.d5387224-lkp@intel.com>
Date: Fri, 15 Aug 2025 15:36:00 +0800
From: kernel test robot <oliver.sang@...el.com>
To: Jens Axboe <axboe@...nel.dk>
CC: <oe-lkp@...ts.linux.dev>, <lkp@...el.com>, <linux-kernel@...r.kernel.org>,
	Andrew Morton <akpm@...ux-foundation.org>, <oliver.sang@...el.com>
Subject: [linus:master] [llist] 375700bab5: will-it-scale.per_thread_ops 2.6% regression

Hello,

The kernel test robot noticed a 2.6% regression of will-it-scale.per_thread_ops on:

commit: 375700bab5b150e876e42d894a9a7470881f8a61 ("llist: make llist_add_batch() a static inline")
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master

[regression still present on      linus/master 8742b2d8935f476449ef37e263bc4da3295c7b58]
[regression still present on linux-next/master 2674d1eadaa2fd3a918dfcdb6d0bb49efe8a8bb9]
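
For context, the offending commit moves llist_add_batch() out of lib/llist.c
and into include/linux/llist.h as a static inline, so the lock-free cmpxchg
loop is now emitted into every caller. A rough sketch of the function body
(paraphrased from the kernel sources; details may differ between versions):

    static inline bool llist_add_batch(struct llist_node *new_first,
                                       struct llist_node *new_last,
                                       struct llist_head *head)
    {
            struct llist_node *first = READ_ONCE(head->first);

            /* lock-free push: splice [new_first, new_last] onto the head */
            do {
                    new_last->next = first;
            } while (!try_cmpxchg(&head->first, &first, new_first));

            return true;
    }

One side effect shows up in the perf-profile data below: once inlined, the
llist_add_batch symbol disappears, and the cycles formerly attributed to it
are charged to callers such as smp_call_function_many_cond.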

testcase: will-it-scale
config: x86_64-rhel-9.4
compiler: gcc-12
test machine: 192 threads 4 sockets Intel(R) Xeon(R) Platinum 9242 CPU @ 2.30GHz (Cascade Lake) with 176G memory
parameters:

	nr_task: 100%
	mode: thread
	test: tlb_flush3
	cpufreq_governor: performance
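
The hot call paths in the profile below (do_madvise -> zap_page_range_single
-> flush_tlb_mm_range -> on_each_cpu_cond_mask) indicate that tlb_flush3
repeatedly populates and then discards shared memory, forcing cross-CPU TLB
shootdowns. A hypothetical per-thread loop in the same spirit (not the actual
will-it-scale source):

    #include <string.h>
    #include <sys/mman.h>

    #define LEN (128 * 1024)

    int main(void)
    {
            /* shared anonymous (shmem-backed) mapping: MADV_DONTNEED must
             * zap the PTEs and flush the TLB on every CPU using this mm */
            char *buf = mmap(NULL, LEN, PROT_READ | PROT_WRITE,
                             MAP_SHARED | MAP_ANONYMOUS, -1, 0);

            if (buf == MAP_FAILED)
                    return 1;

            for (;;) {
                    memset(buf, 1, LEN);              /* fault pages in */
                    madvise(buf, LEN, MADV_DONTNEED); /* zap + TLB flush */
            }
    }

will-it-scale runs one such loop per task; at nr_task=100% that is 192
threads sharing one mm, so every madvise() broadcasts IPIs through
smp_call_function_many_cond, which is exactly where the extra cycles land.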




If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add the following tags:
| Reported-by: kernel test robot <oliver.sang@...el.com>
| Closes: https://lore.kernel.org/oe-lkp/202508150803.d5387224-lkp@intel.com


Details are as follows:
-------------------------------------------------------------------------------------------------->


The kernel config and materials to reproduce are available at:
https://download.01.org/0day-ci/archive/20250815/202508150803.d5387224-lkp@intel.com
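
For reference, the usual 0-Day reproduction flow with lkp-tests looks roughly
like the following (a sketch based on the lkp-tests documentation; job.yaml
is the job file from the archive above):

    git clone https://github.com/intel/lkp-tests.git
    cd lkp-tests
    sudo bin/lkp install job.yaml                 # install job dependencies
    sudo bin/lkp split-job --compatible job.yaml  # generate a runnable yaml
    sudo bin/lkp run <generated-yaml-file>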

=========================================================================================
compiler/cpufreq_governor/kconfig/mode/nr_task/rootfs/tbox_group/test/testcase:
  gcc-12/performance/x86_64-rhel-9.4/thread/100%/debian-12-x86_64-20240206.cgz/lkp-csl-2sp10/tlb_flush3/will-it-scale

commit: 
  5ef2dccfcc ("delayacct: remove redundant code and adjust indentation")
  375700bab5 ("llist: make llist_add_batch() a static inline")

5ef2dccfcca8d864 375700bab5b150e876e42d894a9 
---------------- --------------------------- 
         %stddev     %change         %stddev
             \          |                \  
    118225 ±  2%      -6.0%     111161        perf-c2c.HITM.total
 1.926e+08            -2.5%  1.878e+08        proc-vmstat.pgfault
     14579            -2.2%      14264        vmstat.system.cs
    579287            -2.6%     564220        will-it-scale.192.threads
      1.98            -2.9%       1.92        will-it-scale.192.threads_idle
      3016            -2.6%       2938        will-it-scale.per_thread_ops
    579287            -2.6%     564220        will-it-scale.workload
      0.33 ± 19%     +34.2%       0.44 ±  6%  perf-sched.sch_delay.avg.ms.syscall_exit_to_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe.[unknown]
      4.79 ±  9%     -44.9%       2.64 ± 67%  perf-sched.sch_delay.max.ms.__cond_resched.generic_perform_write.shmem_file_write_iter.vfs_write.ksys_write
     28.30 ±  3%      +9.9%      31.10 ±  4%  perf-sched.total_wait_and_delay.average.ms
     71544 ±  2%     -12.6%      62531 ±  3%  perf-sched.total_wait_and_delay.count.ms
     28.21 ±  3%      +9.9%      31.00 ±  4%  perf-sched.total_wait_time.average.ms
     47.56 ±115%    +220.4%     152.39 ± 11%  perf-sched.wait_and_delay.avg.ms.anon_pipe_read.vfs_read.ksys_read.do_syscall_64
      3197 ±  5%     -13.6%       2761 ±  5%  perf-sched.wait_and_delay.count.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
      4324 ± 16%     -28.8%       3079 ±  2%  perf-sched.wait_and_delay.count.syscall_exit_to_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe.[unknown]
      0.30 ± 73%     -73.6%       0.08 ±109%  perf-sched.wait_time.avg.ms.__cond_resched.unmap_vmas.vms_clear_ptes.part.0
     47.48 ±115%    +220.3%     152.08 ± 11%  perf-sched.wait_time.avg.ms.anon_pipe_read.vfs_read.ksys_read.do_syscall_64
      9.36            +4.5%       9.77        perf-stat.i.MPKI
 1.427e+10            -4.5%  1.362e+10        perf-stat.i.branch-instructions
      0.97            +0.0        1.02        perf-stat.i.branch-miss-rate%
     34.20            +0.7       34.87        perf-stat.i.cache-miss-rate%
 1.753e+09            -1.5%  1.727e+09        perf-stat.i.cache-references
     14678            -2.6%      14293        perf-stat.i.context-switches
      9.07            +3.8%       9.42        perf-stat.i.cpi
    556.91 ±  2%      -4.6%     531.43        perf-stat.i.cpu-migrations
 6.398e+10            -4.0%  6.145e+10        perf-stat.i.instructions
      6.62            -2.8%       6.44        perf-stat.i.metric.K/sec
    635521            -2.7%     618322        perf-stat.i.minor-faults
    635521            -2.7%     618322        perf-stat.i.page-faults
     27.27           -27.3        0.00        perf-profile.calltrace.cycles-pp.llist_add_batch.smp_call_function_many_cond.on_each_cpu_cond_mask.flush_tlb_mm_range.tlb_finish_mmu
     26.31           -26.3        0.00        perf-profile.calltrace.cycles-pp.llist_add_batch.smp_call_function_many_cond.on_each_cpu_cond_mask.flush_tlb_mm_range.zap_pte_range
     12.12           -12.1        0.00        perf-profile.calltrace.cycles-pp.asm_sysvec_call_function.llist_add_batch.smp_call_function_many_cond.on_each_cpu_cond_mask.flush_tlb_mm_range
     11.53           -11.5        0.00        perf-profile.calltrace.cycles-pp.sysvec_call_function.asm_sysvec_call_function.llist_add_batch.smp_call_function_many_cond.on_each_cpu_cond_mask
     11.39           -11.4        0.00        perf-profile.calltrace.cycles-pp.__sysvec_call_function.sysvec_call_function.asm_sysvec_call_function.llist_add_batch.smp_call_function_many_cond
     11.36           -11.4        0.00        perf-profile.calltrace.cycles-pp.__flush_smp_call_function_queue.__sysvec_call_function.sysvec_call_function.asm_sysvec_call_function.llist_add_batch
     13.84            -0.3       13.54        perf-profile.calltrace.cycles-pp.llist_reverse_order.__flush_smp_call_function_queue.__sysvec_call_function.sysvec_call_function.asm_sysvec_call_function
     48.02            +0.2       48.21        perf-profile.calltrace.cycles-pp.unmap_page_range.zap_page_range_single.madvise_vma_behavior.madvise_do_behavior.do_madvise
     47.88            +0.2       48.07        perf-profile.calltrace.cycles-pp.zap_pte_range.zap_pmd_range.unmap_page_range.zap_page_range_single.madvise_vma_behavior
     47.89            +0.2       48.08        perf-profile.calltrace.cycles-pp.zap_pmd_range.unmap_page_range.zap_page_range_single.madvise_vma_behavior.madvise_do_behavior
      4.21            +5.9       10.09        perf-profile.calltrace.cycles-pp.asm_sysvec_call_function.smp_call_function_many_cond.on_each_cpu_cond_mask.flush_tlb_mm_range.zap_pte_range
      4.19            +5.9       10.08        perf-profile.calltrace.cycles-pp.asm_sysvec_call_function.smp_call_function_many_cond.on_each_cpu_cond_mask.flush_tlb_mm_range.tlb_finish_mmu
      8.00           +11.0       18.97        perf-profile.calltrace.cycles-pp.__flush_smp_call_function_queue.__sysvec_call_function.sysvec_call_function.asm_sysvec_call_function.smp_call_function_many_cond
      8.02           +11.0       19.03        perf-profile.calltrace.cycles-pp.__sysvec_call_function.sysvec_call_function.asm_sysvec_call_function.smp_call_function_many_cond.on_each_cpu_cond_mask
      8.11           +11.1       19.25        perf-profile.calltrace.cycles-pp.sysvec_call_function.asm_sysvec_call_function.smp_call_function_many_cond.on_each_cpu_cond_mask.flush_tlb_mm_range
     54.16           -54.2        0.00        perf-profile.children.cycles-pp.llist_add_batch
     21.03            -0.5       20.54        perf-profile.children.cycles-pp.__flush_smp_call_function_queue
     20.82            -0.5       20.37        perf-profile.children.cycles-pp.__sysvec_call_function
     21.06            -0.4       20.62        perf-profile.children.cycles-pp.sysvec_call_function
     22.05            -0.4       21.64        perf-profile.children.cycles-pp.asm_sysvec_call_function
     14.88            -0.4       14.52        perf-profile.children.cycles-pp.llist_reverse_order
      0.49 ±  3%      -0.1        0.41 ±  8%  perf-profile.children.cycles-pp.common_startup_64
      0.49 ±  3%      -0.1        0.41 ±  8%  perf-profile.children.cycles-pp.cpu_startup_entry
      0.49 ±  3%      -0.1        0.41 ±  8%  perf-profile.children.cycles-pp.do_idle
      0.49 ±  4%      -0.1        0.41 ±  8%  perf-profile.children.cycles-pp.start_secondary
      0.42 ±  3%      -0.1        0.35 ±  8%  perf-profile.children.cycles-pp.cpuidle_idle_call
      0.40 ±  3%      -0.1        0.34 ±  7%  perf-profile.children.cycles-pp.cpuidle_enter
      0.40 ±  3%      -0.1        0.34 ±  7%  perf-profile.children.cycles-pp.cpuidle_enter_state
      0.23 ±  4%      -0.0        0.18 ±  6%  perf-profile.children.cycles-pp.intel_idle
      0.48 ±  2%      -0.0        0.44 ±  2%  perf-profile.children.cycles-pp.sysvec_apic_timer_interrupt
      0.21            -0.0        0.17 ±  2%  perf-profile.children.cycles-pp.__sysvec_call_function_single
      0.22 ±  2%      -0.0        0.19 ±  2%  perf-profile.children.cycles-pp.asm_sysvec_call_function_single
      0.40 ±  2%      -0.0        0.36 ±  3%  perf-profile.children.cycles-pp.__hrtimer_run_queues
      0.29 ±  5%      -0.0        0.26 ±  5%  perf-profile.children.cycles-pp.madvise_lock
      0.22 ±  2%      -0.0        0.18        perf-profile.children.cycles-pp.sysvec_call_function_single
      0.52 ±  2%      -0.0        0.48 ±  2%  perf-profile.children.cycles-pp.asm_sysvec_apic_timer_interrupt
      0.44 ±  3%      -0.0        0.41 ±  3%  perf-profile.children.cycles-pp.__sysvec_apic_timer_interrupt
      0.32 ±  2%      -0.0        0.29 ±  2%  perf-profile.children.cycles-pp.update_process_times
      0.44 ±  2%      -0.0        0.41 ±  3%  perf-profile.children.cycles-pp.hrtimer_interrupt
      0.12 ±  3%      -0.0        0.10 ±  8%  perf-profile.children.cycles-pp.rwsem_down_read_slowpath
      0.24            +0.0        0.26        perf-profile.children.cycles-pp.next_uptodate_folio
      0.49            +0.0        0.53 ±  2%  perf-profile.children.cycles-pp.should_flush_tlb
     48.07            +0.2       48.25        perf-profile.children.cycles-pp.unmap_page_range
     47.94            +0.2       48.12        perf-profile.children.cycles-pp.zap_pmd_range
     47.93            +0.2       48.12        perf-profile.children.cycles-pp.zap_pte_range
     41.92           -41.9        0.00        perf-profile.self.cycles-pp.llist_add_batch
     14.87            -0.4       14.51        perf-profile.self.cycles-pp.llist_reverse_order
      0.23 ±  4%      -0.0        0.18 ±  6%  perf-profile.self.cycles-pp.intel_idle
      0.18 ±  2%      +0.0        0.19        perf-profile.self.cycles-pp.next_uptodate_folio
      0.14 ±  2%      +0.0        0.16        perf-profile.self.cycles-pp.filemap_map_pages
      0.36 ±  2%      +0.0        0.40 ±  3%  perf-profile.self.cycles-pp.should_flush_tlb
     29.83           +42.5       72.37        perf-profile.self.cycles-pp.smp_call_function_many_cond
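
Note the bookkeeping effect in the profile above: the 41.92% of self cycles
formerly attributed to llist_add_batch drops to zero, while
smp_call_function_many_cond gains 42.5 points; these are largely the same
cycles, re-attributed after inlining removed the symbol boundary. The real
regression is the throughput drop, and the per-thread figures are consistent
with the aggregate ones: 564220 / 192 and 579287 / 192 reproduce the reported
2938 and 3016 per_thread_ops values to within a count.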




Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.


-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki

