Message-ID: <202601071255.645ef92e-lkp@intel.com>
Date: Wed, 7 Jan 2026 13:32:07 +0800
From: kernel test robot <oliver.sang@...el.com>
To: "Paul E. McKenney" <paulmck@...nel.org>
CC: <oe-lkp@...ts.linux.dev>, <lkp@...el.com>, Andrii Nakryiko
	<andrii@...nel.org>, Alexei Starovoitov <ast@...nel.org>, Peter Zijlstra
	<peterz@...radead.org>, <linux-doc@...r.kernel.org>, <rcu@...r.kernel.org>,
	<linux-kernel@...r.kernel.org>, <oliver.sang@...el.com>
Subject: [paulmckrcu:dev.2025.12.16a] [rcu]  b41f5a411f: unixbench.throughput
 2.0% improvement


Hi Paul E. McKenney,

We do not know enough to explain the performance impact of this commit.
Because the data is stable, we are still reporting what we see in our tests,
FYI. Please let us know if this report is not useful. Thanks.


Hello,

kernel test robot noticed a 2.0% improvement of unixbench.throughput on:


commit: b41f5a411fb5f8c76c1d945ab391873414d01647 ("rcu: Clean up after the SRCU-fastification of RCU Tasks Trace")
https://github.com/paulmckrcu/linux dev.2025.12.16a

testcase: unixbench
config: x86_64-rhel-9.4
compiler: gcc-14
test machine: 64 threads 2 sockets Intel(R) Xeon(R) Gold 6346 CPU @ 3.10GHz (Ice Lake) with 256G memory
parameters:

	runtime: 300s
	nr_task: 100%
	test: double
	cpufreq_governor: performance


In addition, the commit has a significant impact on the following test:

+------------------+-------------------------------------------------------------------------------------------+
| testcase: change | unixbench: unixbench.throughput 2.1% improvement                                          |
| test machine     | 64 threads 2 sockets Intel(R) Xeon(R) Gold 6346 CPU @ 3.10GHz (Ice Lake) with 256G memory |
| test parameters  | cpufreq_governor=performance                                                              |
|                  | nr_task=100%                                                                              |
|                  | runtime=300s                                                                              |
|                  | test=long                                                                                 |
+------------------+-------------------------------------------------------------------------------------------+




Details are as below:
-------------------------------------------------------------------------------------------------->


The kernel config and materials to reproduce are available at:
https://download.01.org/0day-ci/archive/20260107/202601071255.645ef92e-lkp@intel.com

=========================================================================================
compiler/cpufreq_governor/kconfig/nr_task/rootfs/runtime/tbox_group/test/testcase:
  gcc-14/performance/x86_64-rhel-9.4/100%/debian-13-x86_64-20250902.cgz/300s/lkp-icl-2sp9/double/unixbench
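
The two header lines above pair up position by position: each slash-separated name on the first line labels the value in the same position on the second. A minimal Python sketch of that pairing follows. The function name `parse_job_header` is hypothetical and is not part of lkp-tests; the sketch also assumes no value contains a slash, which holds for this report.

```python
# Hypothetical helper: pairs the slash-separated keys of a report header
# with the slash-separated values on the following line.
# Assumes no individual value contains a "/" (true for this report).

def parse_job_header(keys_line: str, values_line: str) -> dict[str, str]:
    keys = keys_line.strip().rstrip(":").split("/")
    values = values_line.strip().split("/")
    if len(keys) != len(values):
        raise ValueError(
            f"expected {len(keys)} values, found {len(values)}"
        )
    return dict(zip(keys, values))


# The two lines are copied verbatim from the report.
KEYS = "compiler/cpufreq_governor/kconfig/nr_task/rootfs/runtime/tbox_group/test/testcase:"
VALUES = "  gcc-14/performance/x86_64-rhel-9.4/100%/debian-13-x86_64-20250902.cgz/300s/lkp-icl-2sp9/double/unixbench"

params = parse_job_header(KEYS, VALUES)
for name, value in params.items():
    print(f"{name:20s} {value}")
```

Read this way, the run uses `test=double` on `tbox_group=lkp-icl-2sp9`, matching the parameter list earlier in the report.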

commit: 
  14c7fd5dbf ("context_tracking: Remove rcu_task_trace_heavyweight_{enter,exit}()")
  b41f5a411f ("rcu: Clean up after the SRCU-fastification of RCU Tasks Trace")

14c7fd5dbfa07e79 b41f5a411fb5f8c76c1d945ab39 
---------------- --------------------------- 
         %stddev     %change         %stddev
             \          |                \  
  13742762 ± 15%     +31.8%   18115584 ± 15%  meminfo.DirectMap2M
     43251            -2.0%      42383        proc-vmstat.nr_slab_unreclaimable
 2.114e+09            +2.0%  2.156e+09        unixbench.throughput
 2.748e+11            +2.0%  2.803e+11        unixbench.workload
     13.18 ± 51%      -7.8        5.39 ±142%  perf-profile.calltrace.cycles-pp.drm_atomic_helper_commit_planes.drm_atomic_helper_commit_tail.ast_mode_config_helper_atomic_commit_tail.commit_tail.drm_atomic_helper_commit
     13.12 ± 51%      -7.7        5.39 ±142%  perf-profile.calltrace.cycles-pp.ast_primary_plane_helper_atomic_update.drm_atomic_helper_commit_planes.drm_atomic_helper_commit_tail.ast_mode_config_helper_atomic_commit_tail.commit_tail
     13.12 ± 51%      -7.7        5.39 ±142%  perf-profile.calltrace.cycles-pp.drm_fb_memcpy.ast_primary_plane_helper_atomic_update.drm_atomic_helper_commit_planes.drm_atomic_helper_commit_tail.ast_mode_config_helper_atomic_commit_tail
     12.91 ± 52%      -7.6        5.29 ±141%  perf-profile.calltrace.cycles-pp.memcpy_toio.drm_fb_memcpy.ast_primary_plane_helper_atomic_update.drm_atomic_helper_commit_planes.drm_atomic_helper_commit_tail
      1.80 ± 26%      -1.3        0.48 ±110%  perf-profile.calltrace.cycles-pp.setlocale
     13.18 ± 51%      -7.8        5.39 ±142%  perf-profile.children.cycles-pp.drm_atomic_helper_commit_planes
      1.80 ± 26%      -1.1        0.69 ± 52%  perf-profile.children.cycles-pp.setlocale
      0.45 ± 86%      +0.8        1.29 ± 37%  perf-profile.children.cycles-pp.folio_remove_rmap_ptes
      0.34 ±121%      +0.9        1.22 ± 32%  perf-profile.self.cycles-pp.folio_remove_rmap_ptes
 8.198e+10            +2.0%  8.361e+10        perf-stat.i.branch-instructions
      0.89            -0.1        0.75 ±  3%  perf-stat.i.branch-miss-rate%
 3.715e+08           -45.1%   2.04e+08 ±  2%  perf-stat.i.branch-misses
      1.09            -1.8%       1.07        perf-stat.i.cpi
 1.713e+11            +2.0%  1.746e+11        perf-stat.i.instructions
      0.97            +1.6%       0.98        perf-stat.i.ipc
      0.45            -0.2        0.24 ±  2%  perf-stat.overall.branch-miss-rate%
      1.02            -1.9%       1.00        perf-stat.overall.cpi
      0.98            +2.0%       1.00        perf-stat.overall.ipc
 8.136e+10            +2.0%  8.299e+10        perf-stat.ps.branch-instructions
 3.687e+08           -45.1%  2.024e+08 ±  2%  perf-stat.ps.branch-misses
   1.7e+11            +2.0%  1.733e+11        perf-stat.ps.instructions
 2.241e+13            +1.9%  2.283e+13        perf-stat.total.instructions
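
As a sanity check on the %change column, the figures above can be recomputed from the two value columns. The report does not state its formula, so this Python sketch rests on two assumptions: entries with a trailing `%` are relative changes against the base commit, and entries without one (such as the branch-miss-rate rows) are plain differences in percentage points. The function names are hypothetical.

```python
# Assumption: "%change" entries ending in "%" are relative to the base commit;
# entries without "%" are absolute differences (percentage points).

def pct_change(base: float, new: float) -> float:
    """Relative change of `new` against `base`, in percent."""
    return (new - base) / base * 100.0


def point_change(base: float, new: float) -> float:
    """Absolute difference, for metrics that are already percentages."""
    return new - base


# Values copied from the table above (14c7fd5dbf vs. b41f5a411f).
throughput = pct_change(2.114e9, 2.156e9)      # unixbench.throughput
branch_misses = pct_change(3.715e8, 2.04e8)    # perf-stat.i.branch-misses
miss_rate = point_change(0.45, 0.24)           # perf-stat.overall.branch-miss-rate%

print(f"unixbench.throughput      {throughput:+.1f}%")
print(f"perf-stat.i.branch-misses {branch_misses:+.1f}%")
print(f"overall.branch-miss-rate% {miss_rate:+.1f}")
```

Under those assumptions the recomputed values round to the figures in the table: +2.0%, -45.1%, and -0.2.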


***************************************************************************************************
lkp-icl-2sp9: 64 threads 2 sockets Intel(R) Xeon(R) Gold 6346 CPU @ 3.10GHz (Ice Lake) with 256G memory
=========================================================================================
compiler/cpufreq_governor/kconfig/nr_task/rootfs/runtime/tbox_group/test/testcase:
  gcc-14/performance/x86_64-rhel-9.4/100%/debian-13-x86_64-20250902.cgz/300s/lkp-icl-2sp9/long/unixbench

commit: 
  14c7fd5dbf ("context_tracking: Remove rcu_task_trace_heavyweight_{enter,exit}()")
  b41f5a411f ("rcu: Clean up after the SRCU-fastification of RCU Tasks Trace")

14c7fd5dbfa07e79 b41f5a411fb5f8c76c1d945ab39 
---------------- --------------------------- 
         %stddev     %change         %stddev
             \          |                \  
     43238            -1.9%      42435        proc-vmstat.nr_slab_unreclaimable
 2.113e+09            +2.1%  2.156e+09        unixbench.throughput
 2.746e+11            +2.1%  2.803e+11        unixbench.workload
 8.191e+10            +2.1%   8.36e+10        perf-stat.i.branch-instructions
      0.91 ±  2%      -0.2        0.72        perf-stat.i.branch-miss-rate%
 3.799e+08           -46.3%  2.041e+08 ±  2%  perf-stat.i.branch-misses
    644407 ±  2%      -3.0%     624964        perf-stat.i.cycles-between-cache-misses
 1.711e+11            +2.1%  1.746e+11        perf-stat.i.instructions
      0.97            +1.7%       0.98        perf-stat.i.ipc
      0.46            -0.2        0.24 ±  2%  perf-stat.overall.branch-miss-rate%
      1.03            -2.0%       1.00        perf-stat.overall.cpi
      0.98            +2.1%       1.00        perf-stat.overall.ipc
  8.13e+10            +2.1%  8.297e+10        perf-stat.ps.branch-instructions
 3.771e+08           -46.3%  2.025e+08 ±  2%  perf-stat.ps.branch-misses
 1.698e+11            +2.1%  1.733e+11        perf-stat.ps.instructions
  2.24e+13            +2.0%  2.286e+13        perf-stat.total.instructions
     16.25 ± 81%      -8.4        7.84 ±141%  perf-profile.calltrace.cycles-pp.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe
     16.25 ± 81%      -8.4        7.84 ±141%  perf-profile.calltrace.cycles-pp.vfs_write.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe
     16.18 ± 80%      -8.3        7.84 ±141%  perf-profile.calltrace.cycles-pp.console_flush_one_record.console_unlock.vprintk_emit.devkmsg_emit.devkmsg_write.cold
     16.18 ± 80%      -8.3        7.84 ±141%  perf-profile.calltrace.cycles-pp.console_unlock.vprintk_emit.devkmsg_emit.devkmsg_write.cold.vfs_write
     16.18 ± 80%      -8.3        7.84 ±141%  perf-profile.calltrace.cycles-pp.devkmsg_emit.devkmsg_write.cold.vfs_write.ksys_write.do_syscall_64
     16.18 ± 80%      -8.3        7.84 ±141%  perf-profile.calltrace.cycles-pp.devkmsg_write.cold.vfs_write.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe
     16.18 ± 80%      -8.3        7.84 ±141%  perf-profile.calltrace.cycles-pp.vprintk_emit.devkmsg_emit.devkmsg_write.cold.vfs_write.ksys_write
     15.38 ± 80%      -7.7        7.63 ±141%  perf-profile.calltrace.cycles-pp.serial8250_console_write.console_flush_one_record.console_unlock.vprintk_emit.devkmsg_emit
     13.33 ± 82%      -6.4        6.90 ±141%  perf-profile.calltrace.cycles-pp.wait_for_lsr.serial8250_console_write.console_flush_one_record.console_unlock.vprintk_emit
     10.50 ± 77%      -4.9        5.65 ±141%  perf-profile.calltrace.cycles-pp.io_serial_in.wait_for_lsr.serial8250_console_write.console_flush_one_record.console_unlock
      1.58 ± 22%      +0.9        2.46 ± 20%  perf-profile.calltrace.cycles-pp.perf_mmap__push.record__mmap_read_evlist.cmd_record.run_builtin.handle_internal_command
      1.58 ± 22%      +0.9        2.46 ± 20%  perf-profile.calltrace.cycles-pp.record__mmap_read_evlist.cmd_record.run_builtin.handle_internal_command.main
      1.52 ± 25%      +0.9        2.46 ± 20%  perf-profile.calltrace.cycles-pp.record__pushfn.perf_mmap__push.record__mmap_read_evlist.cmd_record.run_builtin
     16.18 ± 80%      -8.3        7.84 ±141%  perf-profile.children.cycles-pp.console_flush_one_record
     16.18 ± 80%      -8.3        7.84 ±141%  perf-profile.children.cycles-pp.console_unlock
     16.18 ± 80%      -8.3        7.84 ±141%  perf-profile.children.cycles-pp.devkmsg_emit
     16.18 ± 80%      -8.3        7.84 ±141%  perf-profile.children.cycles-pp.devkmsg_write.cold
     16.18 ± 80%      -8.3        7.84 ±141%  perf-profile.children.cycles-pp.vprintk_emit
     15.38 ± 80%      -7.9        7.52 ±141%  perf-profile.children.cycles-pp.serial8250_console_write
     13.92 ± 81%      -6.8        7.11 ±141%  perf-profile.children.cycles-pp.wait_for_lsr
     11.03 ± 77%      -5.3        5.74 ±141%  perf-profile.children.cycles-pp.io_serial_in
      1.58 ± 22%      +0.9        2.46 ± 20%  perf-profile.children.cycles-pp.perf_mmap__push
      1.58 ± 22%      +0.9        2.46 ± 20%  perf-profile.children.cycles-pp.record__mmap_read_evlist
      1.52 ± 25%      +0.9        2.46 ± 20%  perf-profile.children.cycles-pp.record__pushfn
     11.03 ± 77%      -5.3        5.74 ±141%  perf-profile.self.cycles-pp.io_serial_in
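
For anyone post-processing these tables, each data row can be split into a base value, an optional %stddev, a change, a new value with optional %stddev, and a metric name. The sketch below is one way to do that in Python. The regular expression is inferred from the rows shown in this report only and is not taken from lkp-tests; the names `ROW` and `parse_row` are hypothetical.

```python
import re

# Row layout inferred from this report (an assumption, not an lkp-tests spec):
#   <base> [± <sd>%]   <change>[%]   <new> [± <sd>%]   <metric>
ROW = re.compile(
    r"""
    ^\s*(?P<base>[\d.]+(?:e[+-]?\d+)?)      # base-commit value
    (?:\s*±\s*(?P<base_sd>\d+)%)?           # optional %stddev
    \s+(?P<change>[+-][\d.]+)(?P<rel>%)?    # change; "%" marks a relative change
    \s+(?P<new>[\d.]+(?:e[+-]?\d+)?)        # tested-commit value
    (?:\s*±\s*(?P<new_sd>\d+)%)?            # optional %stddev
    \s+(?P<metric>\S+)\s*$                  # metric name
    """,
    re.VERBOSE,
)


def parse_row(line: str):
    """Return a dict for a data row, or None for headers and separators."""
    m = ROW.match(line)
    if m is None:
        return None
    return {
        "metric": m["metric"],
        "base": float(m["base"]),
        "base_sd": int(m["base_sd"]) if m["base_sd"] else None,
        "change": float(m["change"]),
        "relative": m["rel"] is not None,
        "new": float(m["new"]),
        "new_sd": int(m["new_sd"]) if m["new_sd"] else None,
    }


# Rows copied from the tables in this report.
rows = [
    "  13742762 ± 15%     +31.8%   18115584 ± 15%  meminfo.DirectMap2M",
    "      0.91 ±  2%      -0.2        0.72        perf-stat.i.branch-miss-rate%",
    "     11.03 ± 77%      -5.3        5.74 ±141%  perf-profile.self.cycles-pp.io_serial_in",
]
parsed = [parse_row(r) for r in rows]
for p in parsed:
    print(p)
```

Parsing the rows makes it straightforward to filter on %stddev. The perf-profile rows above carry a stddev of 77% to 142%, so their changes are much less certain than the throughput and perf-stat rows, which show little or no reported deviation.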





Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.


-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki

