lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Date: Mon, 22 Apr 2024 15:41:27 +0800
From: Yujie Liu <yujie.liu@...el.com>
To: Josh Poimboeuf <jpoimboe@...nel.org>
CC: Linus Torvalds <torvalds@...ux-foundation.org>, <oe-lkp@...ts.linux.dev>,
	<lkp@...el.com>, <linux-kernel@...r.kernel.org>, Thomas Gleixner
	<tglx@...utronix.de>, Daniel Sneddon <daniel.sneddon@...ux.intel.com>,
	<ying.huang@...el.com>, <feng.tang@...el.com>, <fengwei.yin@...el.com>
Subject: Re: [linus:master] [x86/syscall] 1e3ad78334:
 will-it-scale.per_process_ops 1.4% improvement

Hi Josh,

On Fri, Apr 19, 2024 at 12:33:46AM -0700, Josh Poimboeuf wrote:
> On Fri, Apr 19, 2024 at 01:49:26PM +0800, kernel test robot wrote:
> > Hi Linus,
> > 
> > We noticed that commit 1e3ad78334a6 caused performance fluctuations in
> > various micro benchmarks. The perf stat metrics related with branch
> > instructions do have noticeable changes, which may be an expected
> > result of this commit. We are sending this report to provide these data
> > and hope it can be helpful for the awareness of overall impact or any
> > further investigation. Thanks.
> > 
> > kernel test robot noticed a 1.4% improvement of will-it-scale.per_process_ops on:
> > 
> > commit: 1e3ad78334a69b36e107232e337f9d693dcc9df2 ("x86/syscall: Don't force use of indirect calls for system calls")
> > https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master
> 
> Thanks, these are significant regressions.

First we need to clarify that by running this specific will-it-scale
futex4 benchmark on a Skylake machine, we observed a +1.4% performance
improvement, not a regression.

> Since this is on Skylake (with IBRS enabled, presumably) I'd expect that
> these regressions are fixed by my "Only harden syscalls when needed"
> patch.  I'm planning on posting a new version of that tomorrow, but v3
> [*] should be good enough to fix it.  Could you run these tests on the
> same Skylake system with my patch added?

The v3 patch [*] cannot be applied on commit 1e3ad78334a6. Seems the
code base has changed a lot, so we are not able to directly compare
1e3ad78334a6 and 1e3ad78334a6+v3_patch.

The patch is good to apply on v6.9-rc4, so we tested v6.9-rc4 and
v6.9-rc4+v3_patch. Here are the test results for your reference:

Skylake
=========================================================================================
tbox_group/testcase/rootfs/kconfig/compiler/nr_task/mode/test/cpufreq_governor:
  lkp-skl-fpga01/will-it-scale/debian-12-x86_64-20240206.cgz/x86_64-rhel-8.3/gcc-13/16/process/futex4/performance

commit:
  0cd01ac5dcb1 ("x86/bugs: Change commas to semicolons in 'spectre_v2' sysfs file")
  1e3ad78334a6 ("x86/syscall: Don't force use of indirect calls for system calls")
  v6.9-rc4
  v6.9-rc4+v3_patch

    0cd01ac5dcb1                1e3ad78334a6                    v6.9-rc4           v6.9-rc4+v3_patch
---------------- --------------------------- --------------------------- ---------------------------
         %stddev     %change         %stddev     %change         %stddev     %change         %stddev
             \          |                \          |                \          |                \
   1362315            +1.4%    1381406            +1.5%    1382652            +0.5%    1369778        will-it-scale.per_process_ops
  21797058            +1.4%   22102512            +1.5%   22122442            +0.5%   21916453        will-it-scale.workload
      0.04 ±  7%      -7.4%       0.04            -6.1%       0.04 ±  2%      -4.0%       0.04        perf-stat.i.MPKI
  1.98e+09           +19.2%   2.36e+09           +19.2%  2.359e+09            +1.7%  2.014e+09        perf-stat.i.branch-instructions
      1.47            -1.2        0.30            -1.2        0.30 ±  3%      -0.0        1.45        perf-stat.i.branch-miss-rate%
  30820475           -70.4%    9118612           -71.0%    8945551            +0.5%   30985854        perf-stat.i.branch-misses
   7767463            -1.2%    7676829            -1.0%    7686158            -1.3%    7664542        perf-stat.i.cache-references
      3.45            -4.4%       3.30            -4.4%       3.30            -0.4%       3.43        perf-stat.i.cpi
 1.504e+10            +5.1%   1.58e+10            +5.2%  1.582e+10            +1.2%  1.522e+10        perf-stat.i.instructions
      0.29            +4.5%       0.31            +4.6%       0.31            +0.4%       0.29        perf-stat.i.ipc
      1.01 ±100%      -0.6%       1.00 ±100%    +104.1%       2.06            +0.3%       1.01 ±100%  perf-stat.i.metric.K/sec
      0.05 ±  2%      -4.2%       0.04            -3.9%       0.04 ±  2%      +0.4%       0.05        perf-stat.overall.MPKI
      1.56            -1.2        0.39            -1.2        0.38            -0.0        1.54        perf-stat.overall.branch-miss-rate%
      3.43            -4.3%       3.28            -4.4%       3.28            -0.5%       3.41        perf-stat.overall.cpi
      0.29            +4.5%       0.30            +4.6%       0.30            +0.5%       0.29        perf-stat.overall.ipc
    208138            +3.4%     215312            +3.5%     215474            +0.5%     209279        perf-stat.overall.path-length
 1.973e+09           +19.2%  2.353e+09           +19.1%  2.351e+09            +1.8%  2.008e+09        perf-stat.ps.branch-instructions
  30729762           -70.4%    9109071           -71.0%    8918595            +0.6%   30911752        perf-stat.ps.branch-misses
   7745419            -1.1%    7663567            -1.1%    7663740            -1.3%    7647834        perf-stat.ps.cache-references
 1.499e+10            +5.1%  1.575e+10            +5.2%  1.577e+10            +1.2%  1.517e+10        perf-stat.ps.instructions
 4.537e+12            +4.9%  4.759e+12            +5.1%  4.767e+12            +1.1%  4.587e+12        perf-stat.total.instructions
     12.23            -0.6       11.60            -0.6       11.64            -0.0       12.21        perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.syscall
     10.09            -0.6        9.51            -0.5        9.56            -0.1       10.01        perf-profile.calltrace.cycles-pp.__x64_sys_futex.do_syscall_64.entry_SYSCALL_64_after_hwframe.syscall
     22.31            -0.4       21.88            -0.4       21.94            +0.0       22.36        perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.syscall
     19.15            +0.2       19.30            +0.2       19.38            -0.1       19.04        perf-profile.calltrace.cycles-pp.syscall_return_via_sysret.syscall
      9.25            +0.2        9.43            +0.0        9.25            -0.0        9.23        perf-profile.calltrace.cycles-pp.entry_SYSCALL_64.syscall
      8.79            +0.2        9.02            +0.3        9.07            -0.1        8.72        perf-profile.calltrace.cycles-pp.do_futex.__x64_sys_futex.do_syscall_64.entry_SYSCALL_64_after_hwframe.syscall
      7.13            +0.2        7.36            +0.3        7.41            -0.1        7.07        perf-profile.calltrace.cycles-pp.__futex_wait.futex_wait.do_futex.__x64_sys_futex.do_syscall_64
      8.37            +0.3        8.63            +0.3        8.68            -0.1        8.28        perf-profile.calltrace.cycles-pp.futex_wait.do_futex.__x64_sys_futex.do_syscall_64.entry_SYSCALL_64_after_hwframe
     12.38            -0.6       11.78            -0.5       11.84            -0.0       12.38        perf-profile.children.cycles-pp.do_syscall_64
     10.12            -0.5        9.57            -0.5        9.63            -0.1       10.04        perf-profile.children.cycles-pp.__x64_sys_futex
     22.63            -0.4       22.20            -0.4       22.24            +0.0       22.65        perf-profile.children.cycles-pp.entry_SYSCALL_64_after_hwframe
      0.48 ±  2%      -0.0        0.46            -0.0        0.47 ±  2%      -0.0        0.46        perf-profile.children.cycles-pp.get_futex_key
     19.34            +0.1       19.49            +0.2       19.57            -0.1       19.24        perf-profile.children.cycles-pp.syscall_return_via_sysret
      0.00            +0.2        0.18 ±  2%      +0.2        0.18 ±  3%      +0.0        0.00        perf-profile.children.cycles-pp.x64_sys_call
      9.11            +0.2        9.29            +0.0        9.12            +0.0        9.12        perf-profile.children.cycles-pp.entry_SYSCALL_64
      8.88            +0.2        9.11            +0.3        9.16            -0.1        8.81        perf-profile.children.cycles-pp.do_futex
      7.13            +0.2        7.36            +0.3        7.41            -0.1        7.07        perf-profile.children.cycles-pp.__futex_wait
      8.43            +0.3        8.70            +0.3        8.75            -0.1        8.34        perf-profile.children.cycles-pp.futex_wait
      1.20            -0.7        0.47            -0.7        0.46 ±  3%      -0.0        1.20 ±  2%  perf-profile.self.cycles-pp.__x64_sys_futex
      1.46            -0.2        1.27            -0.2        1.26 ±  2%      +0.0        1.48 ±  2%  perf-profile.self.cycles-pp.do_syscall_64
      0.51            -0.1        0.44            -0.1        0.45 ±  2%      +0.0        0.52        perf-profile.self.cycles-pp.do_futex
      0.38 ±  5%      -0.1        0.32 ±  4%      -0.1        0.32 ±  5%      +0.0        0.39 ±  7%  perf-profile.self.cycles-pp.syscall_exit_to_user_mode
      0.48 ±  2%      -0.0        0.45            -0.0        0.45 ±  2%      -0.0        0.46 ±  2%  perf-profile.self.cycles-pp.get_futex_key
      1.21            +0.0        1.24 ±  2%      +0.0        1.23 ±  3%      -0.0        1.18        perf-profile.self.cycles-pp.futex_wait
      0.09 ± 14%      +0.0        0.12 ±  8%      +0.0        0.13 ±  6%      +0.0        0.12 ±  5%  perf-profile.self.cycles-pp.syscall_exit_to_user_mode_prepare
      0.00            +0.1        0.15 ±  2%      +0.2        0.15 ±  3%      +0.0        0.00        perf-profile.self.cycles-pp.x64_sys_call
      7.97            +0.1        8.12            -0.0        7.95            -0.0        7.96        perf-profile.self.cycles-pp.entry_SYSCALL_64
     19.28            +0.2       19.44            +0.2       19.53            -0.1       19.21        perf-profile.self.cycles-pp.syscall_return_via_sysret
     10.43            +0.2       10.60            +0.2       10.59            +0.0       10.46        perf-profile.self.cycles-pp.entry_SYSCALL_64_after_hwframe
      0.72 ±  3%      +0.2        0.94 ±  3%      +0.2        0.93 ±  4%      +0.0        0.74        perf-profile.self.cycles-pp.__futex_wait

> Also it would be helpful to see the same tests on Cascade/Ice Lake, or
> some other system for which the 'spectre_v2' sysfs vulnerabilities file
> shows "BHI: SW loop".  On such a system it shouldn't matter whether my
> patch is added as it won't disable Linus' syscall change.  But it would
> be very helpful to see the performance impact of that combination.

The test results on Cascade/Ice Lake are as follows:

Intel Xeon Platinum 8260L (Cascade Lake)
=========================================================================================
tbox_group/testcase/rootfs/kconfig/compiler/nr_task/mode/test/cpufreq_governor:
  lkp-csl-2sp3/will-it-scale/debian-12-x86_64-20240206.cgz/x86_64-rhel-8.3/gcc-13/16/process/futex4/performance

commit:
  0cd01ac5dcb1 ("x86/bugs: Change commas to semicolons in 'spectre_v2' sysfs file")
  1e3ad78334a6 ("x86/syscall: Don't force use of indirect calls for system calls")
  v6.9-rc4
  v6.9-rc4+v3_patch

    0cd01ac5dcb1                1e3ad78334a6                    v6.9-rc4           v6.9-rc4+v3_patch
---------------- --------------------------- --------------------------- ---------------------------
         %stddev     %change         %stddev     %change         %stddev     %change         %stddev
             \          |                \          |                \          |                \
   3237910            -0.3%    3229309           -10.1%    2911031           -11.0%    2882769        will-it-scale.per_process_ops
  51806565            -0.3%   51668961           -10.1%   46576504           -11.0%   46124311        will-it-scale.workload
      0.02 ±  7%      -6.4%       0.02 ±  3%      -9.5%       0.02 ±  2%      -2.4%       0.02 ± 12%  perf-stat.i.MPKI
 4.649e+09           +17.4%  5.459e+09           +76.1%  8.186e+09           +75.4%  8.156e+09        perf-stat.i.branch-instructions
      0.72            -0.6        0.15 ±  4%      -0.6        0.12            -0.6        0.12 ±  2%  perf-stat.i.branch-miss-rate%
  34188248           -74.0%    8872232 ±  3%     -69.9%   10285664           -70.0%   10244122        perf-stat.i.branch-misses
      1.70            -4.2%       1.63            -8.0%       1.56            -8.3%       1.56        perf-stat.i.cpi
 3.326e+10            +3.6%  3.444e+10            +9.1%  3.628e+10            +8.2%  3.599e+10        perf-stat.i.instructions
      0.59            +4.3%       0.61            +8.7%       0.64            +9.0%       0.64        perf-stat.i.ipc
      0.18 ± 16%     -11.5%       0.16 ± 22%     -33.9%       0.12 ± 46%     -58.6%       0.08 ± 49%  perf-stat.i.major-faults
      0.02 ±  7%      -6.3%       0.02 ±  4%     -11.0%       0.02 ±  3%      -2.3%       0.02 ± 13%  perf-stat.overall.MPKI
      0.74            -0.6        0.16 ±  3%      -0.6        0.13            -0.6        0.13        perf-stat.overall.branch-miss-rate%
      1.70            -4.1%       1.63            -8.0%       1.56            -8.3%       1.56        perf-stat.overall.cpi
      0.59            +4.3%       0.61            +8.7%       0.64            +9.0%       0.64        perf-stat.overall.ipc
    193210            +3.9%     200708           +21.4%     234594           +21.5%     234812        perf-stat.overall.path-length
 4.633e+09           +17.4%  5.441e+09           +76.1%  8.159e+09           +75.4%  8.129e+09        perf-stat.ps.branch-instructions
  34084869           -74.0%    8860998 ±  2%     -69.9%   10274305           -70.0%   10220106        perf-stat.ps.branch-misses
 3.315e+10            +3.6%  3.433e+10            +9.1%  3.616e+10            +8.2%  3.587e+10        perf-stat.ps.instructions
      0.18 ± 16%     -11.5%       0.16 ± 22%     -33.8%       0.12 ± 46%     -58.6%       0.08 ± 49%  perf-stat.ps.major-faults
 1.001e+13            +3.6%  1.037e+13            +9.2%  1.093e+13            +8.2%  1.083e+13        perf-stat.total.instructions
     18.55            -0.3       18.23            -1.1       17.45            -1.1       17.46        perf-profile.calltrace.cycles-pp.__x64_sys_futex.do_syscall_64.entry_SYSCALL_64_after_hwframe.syscall
      1.82            -0.1        1.74            -0.2        1.60            -0.2        1.57        perf-profile.calltrace.cycles-pp.futex_q_unlock.futex_wait_setup.__futex_wait.futex_wait.do_futex
      3.58            -0.1        3.51            -0.5        3.11            -0.5        3.11        perf-profile.calltrace.cycles-pp.futex_get_value_locked.futex_wait_setup.__futex_wait.futex_wait.do_futex
     17.39            -0.1       17.32            -1.1       16.32            -1.1       16.30        perf-profile.calltrace.cycles-pp.do_futex.__x64_sys_futex.do_syscall_64.entry_SYSCALL_64_after_hwframe.syscall
      0.68 ±  2%      -0.0        0.66            -0.1        0.60 ±  2%      -0.1        0.60 ±  2%  perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_safe_stack.syscall
      2.73            -0.0        2.72            -0.3        2.40            -0.3        2.40        perf-profile.calltrace.cycles-pp.__get_user_nocheck_4.futex_get_value_locked.futex_wait_setup.__futex_wait.futex_wait
      0.60 ±  2%      +0.0        0.60 ±  2%      -0.0        0.57            -0.0        0.59 ±  2%  perf-profile.calltrace.cycles-pp.hrtimer_interrupt.__sysvec_apic_timer_interrupt.sysvec_apic_timer_interrupt.asm_sysvec_apic_timer_interrupt.cpuidle_enter_state
      0.00            +0.0        0.00            +6.3        6.26            +6.2        6.22        perf-profile.calltrace.cycles-pp.clear_bhb_loop.syscall
      0.61 ±  2%      +0.0        0.61 ±  2%      -0.0        0.58            -0.0        0.60 ±  2%  perf-profile.calltrace.cycles-pp.__sysvec_apic_timer_interrupt.sysvec_apic_timer_interrupt.asm_sysvec_apic_timer_interrupt.cpuidle_enter_state.cpuidle_enter
     72.70            +0.0       72.72            +0.1       72.78            +0.1       72.80        perf-profile.calltrace.cycles-pp.syscall
      1.78            +0.0        1.80            -0.2        1.59            -0.2        1.58        perf-profile.calltrace.cycles-pp._raw_spin_lock.futex_q_lock.futex_wait_setup.__futex_wait.futex_wait
     16.67            +0.0       16.71            -1.1       15.61            -1.1       15.59        perf-profile.calltrace.cycles-pp.futex_wait.do_futex.__x64_sys_futex.do_syscall_64.entry_SYSCALL_64_after_hwframe
     20.50            +0.0       20.55            -0.5       20.04            -0.5       20.01        perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.syscall
     21.76            +0.0       21.81            -0.5       21.22            -0.5       21.24        perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.syscall
      2.07            +0.1        2.13            -0.2        1.90            -0.2        1.91        perf-profile.calltrace.cycles-pp.futex_hash.futex_q_lock.futex_wait_setup.__futex_wait.futex_wait
      0.76            +0.1        0.84 ±  3%      -0.0        0.74            -0.0        0.74        perf-profile.calltrace.cycles-pp.syscall_exit_to_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe.syscall
      5.09            +0.1        5.17            -0.5        4.60            -0.5        4.62        perf-profile.calltrace.cycles-pp.futex_q_lock.futex_wait_setup.__futex_wait.futex_wait.do_futex
      7.85            +0.1        7.94            -0.7        7.10            -0.8        7.07        perf-profile.calltrace.cycles-pp.entry_SYSCALL_64.syscall
      0.91 ±  2%      +0.1        1.04            +0.0        0.92            +0.0        0.92        perf-profile.calltrace.cycles-pp.get_futex_key.futex_wait_setup.__futex_wait.futex_wait.do_futex
     39.86            +0.2       40.02            -4.4       35.46            -4.4       35.48        perf-profile.calltrace.cycles-pp.entry_SYSRETQ_unsafe_stack.syscall
     14.13            +0.2       14.35            -0.9       13.20            -1.0       13.18        perf-profile.calltrace.cycles-pp.__futex_wait.futex_wait.do_futex.__x64_sys_futex.do_syscall_64
     12.44            +0.3       12.70            -1.1       11.33            -1.1       11.34        perf-profile.calltrace.cycles-pp.futex_wait_setup.__futex_wait.futex_wait.do_futex.__x64_sys_futex
     18.62            -0.3       18.30            -1.1       17.57            -1.0       17.58        perf-profile.children.cycles-pp.__x64_sys_futex
     17.59            -0.1       17.46            -1.0       16.54            -1.1       16.52        perf-profile.children.cycles-pp.do_futex
      1.82            -0.1        1.74            -0.2        1.60            -0.2        1.57        perf-profile.children.cycles-pp.futex_q_unlock
      3.19            -0.1        3.13            -0.4        2.77            -0.4        2.77        perf-profile.children.cycles-pp.__get_user_nocheck_4
      0.68 ±  2%      -0.0        0.66            -0.1        0.60 ±  2%      -0.1        0.60 ±  2%  perf-profile.children.cycles-pp.entry_SYSCALL_64_safe_stack
      3.58            -0.0        3.57            -0.4        3.16            -0.4        3.16        perf-profile.children.cycles-pp.futex_get_value_locked
      0.80            -0.0        0.79 ±  4%      -0.0        0.76 ±  2%      -0.1        0.75 ±  3%  perf-profile.children.cycles-pp.hrtimer_interrupt
      0.81            -0.0        0.80 ±  4%      -0.0        0.77 ±  2%      -0.0        0.76 ±  3%  perf-profile.children.cycles-pp.__sysvec_apic_timer_interrupt
      0.46 ±  2%      -0.0        0.45 ±  5%      -0.0        0.43 ±  2%      -0.0        0.42 ±  4%  perf-profile.children.cycles-pp.tick_nohz_handler
      0.35            -0.0        0.34            -0.0        0.30 ±  3%      -0.0        0.30 ±  2%  perf-profile.children.cycles-pp.testcase
      0.66            -0.0        0.65 ±  4%      -0.0        0.62            -0.0        0.62 ±  2%  perf-profile.children.cycles-pp.__hrtimer_run_queues
      0.38 ±  2%      -0.0        0.38 ±  5%      -0.0        0.36            -0.0        0.35 ±  5%  perf-profile.children.cycles-pp.update_process_times
      0.14 ±  5%      -0.0        0.14 ±  5%      -0.0        0.12 ±  4%      -0.0        0.12 ±  5%  perf-profile.children.cycles-pp.amd_clear_divider
      0.00            +0.0        0.00            +6.3        6.32            +6.3        6.29        perf-profile.children.cycles-pp.clear_bhb_loop
      0.28 ±  6%      +0.0        0.28 ±  4%      -0.0        0.25 ±  3%      -0.0        0.24 ±  2%  perf-profile.children.cycles-pp.syscall_exit_to_user_mode_prepare
      0.90            +0.0        0.91 ±  3%      -0.1        0.80            -0.1        0.81        perf-profile.children.cycles-pp.syscall_exit_to_user_mode
      1.88            +0.0        1.90 ±  2%      -0.2        1.69            -0.2        1.68        perf-profile.children.cycles-pp._raw_spin_lock
      0.17 ±  4%      +0.0        0.20 ±  2%      -0.1        0.12 ±  3%      -0.1        0.12 ±  7%  perf-profile.children.cycles-pp.futex_setup_timer
     20.64            +0.0       20.69            -0.5       20.18            -0.4       20.21        perf-profile.children.cycles-pp.do_syscall_64
     21.90            +0.0       21.94            -0.5       21.41            -0.5       21.43        perf-profile.children.cycles-pp.entry_SYSCALL_64_after_hwframe
      2.07            +0.1        2.13            -0.2        1.90            -0.2        1.91        perf-profile.children.cycles-pp.futex_hash
      5.13            +0.1        5.20            -0.5        4.64            -0.5        4.64        perf-profile.children.cycles-pp.entry_SYSCALL_64
     16.84            +0.1       16.91            -1.1       15.73            -1.1       15.71        perf-profile.children.cycles-pp.futex_wait
      5.30            +0.1        5.37            -0.5        4.79            -0.5        4.80        perf-profile.children.cycles-pp.futex_q_lock
      0.91 ±  2%      +0.1        1.05            +0.0        0.92            +0.0        0.92        perf-profile.children.cycles-pp.get_futex_key
     42.79            +0.2       42.98            -4.6       38.15            -4.6       38.18        perf-profile.children.cycles-pp.entry_SYSRETQ_unsafe_stack
     14.13            +0.2       14.36            -0.9       13.20            -1.0       13.18        perf-profile.children.cycles-pp.__futex_wait
     12.58            +0.3       12.83            -1.1       11.44            -1.1       11.44        perf-profile.children.cycles-pp.futex_wait_setup
      0.00            +0.4        0.41 ±  2%      +0.6        0.56 ±  2%      +0.6        0.58 ±  3%  perf-profile.children.cycles-pp.x64_sys_call
      4.04            -0.3        3.77            -0.7        3.36            -0.7        3.33        perf-profile.self.cycles-pp.syscall
      1.03 ±  2%      -0.3        0.76 ±  2%      -0.0        1.00            -0.0        1.02        perf-profile.self.cycles-pp.__x64_sys_futex
      0.88            -0.3        0.62            +0.0        0.91            +0.0        0.91        perf-profile.self.cycles-pp.do_futex
      2.50            -0.1        2.42            -0.2        2.30            -0.2        2.31        perf-profile.self.cycles-pp.futex_wait
      1.74            -0.1        1.68            -0.2        1.55            -0.2        1.52        perf-profile.self.cycles-pp.futex_q_unlock
      3.18            -0.1        3.12            -0.4        2.76            -0.4        2.76        perf-profile.self.cycles-pp.__get_user_nocheck_4
      0.54            -0.1        0.48 ±  3%      -0.1        0.43            -0.1        0.44 ±  3%  perf-profile.self.cycles-pp.syscall_exit_to_user_mode
      1.48            -0.0        1.45 ±  2%      +0.2        1.69 ±  2%      +0.2        1.67        perf-profile.self.cycles-pp.__futex_wait
      0.68 ±  2%      -0.0        0.66            -0.1        0.60 ±  2%      -0.1        0.60 ±  2%  perf-profile.self.cycles-pp.entry_SYSCALL_64_safe_stack
      0.35            -0.0        0.34            -0.0        0.30 ±  3%      -0.0        0.30 ±  2%  perf-profile.self.cycles-pp.testcase
      0.00            +0.0        0.00            +6.3        6.26            +6.2        6.22        perf-profile.self.cycles-pp.clear_bhb_loop
      1.33            +0.0        1.33            -0.1        1.23            -0.1        1.22        perf-profile.self.cycles-pp.entry_SYSCALL_64_after_hwframe
      1.44            +0.0        1.44            -0.1        1.30            -0.1        1.30        perf-profile.self.cycles-pp.futex_q_lock
      1.03            +0.0        1.05 ±  2%      +0.2        1.22            +0.3        1.28        perf-profile.self.cycles-pp.do_syscall_64
      1.80            +0.0        1.84 ±  2%      -0.2        1.62            -0.2        1.62        perf-profile.self.cycles-pp._raw_spin_lock
      2.42            +0.0        2.46            -0.2        2.19            -0.2        2.20        perf-profile.self.cycles-pp.entry_SYSCALL_64
      0.38 ±  6%      +0.1        0.44            +0.0        0.38            -0.0        0.38 ±  3%  perf-profile.self.cycles-pp.futex_get_value_locked
      2.00            +0.1        2.06            -0.2        1.83            -0.2        1.84        perf-profile.self.cycles-pp.futex_hash
      0.21 ±  6%      +0.1        0.28 ±  4%      +0.0        0.25 ±  3%      +0.0        0.24 ±  2%  perf-profile.self.cycles-pp.syscall_exit_to_user_mode_prepare
      1.11 ±  3%      +0.1        1.22 ±  3%      -0.0        1.08 ±  2%      -0.0        1.10        perf-profile.self.cycles-pp.futex_wait_setup
      0.90 ±  2%      +0.1        1.04            +0.0        0.92            +0.0        0.92 ±  2%  perf-profile.self.cycles-pp.get_futex_key
     42.61            +0.2       42.81            -4.6       38.00            -4.6       38.02        perf-profile.self.cycles-pp.entry_SYSRETQ_unsafe_stack
      0.00            +0.4        0.41 ±  2%      +0.6        0.55 ±  2%      +0.5        0.52 ±  3%  perf-profile.self.cycles-pp.x64_sys_call


Intel Xeon Gold 6346 (Ice Lake)
=========================================================================================
tbox_group/testcase/rootfs/kconfig/compiler/nr_task/mode/test/cpufreq_governor:
  lkp-icl-2sp9/will-it-scale/debian-12-x86_64-20240206.cgz/x86_64-rhel-8.3/gcc-13/16/process/futex4/performance

commit:                                                                                                                   0cd01ac5dcb1 ("x86/bugs: Change commas to semicolons in 'spectre_v2' sysfs file")                                       1e3ad78334a6 ("x86/syscall: Don't force use of indirect calls for system calls")                                        v6.9-rc4                                                                                                                v6.9-rc4+v3_patch

    0cd01ac5dcb1                1e3ad78334a6                    v6.9-rc4           v6.9-rc4+v3_patch
---------------- --------------------------- --------------------------- ---------------------------
         %stddev     %change         %stddev     %change         %stddev     %change         %stddev
             \          |                \          |                \          |                \
   7907214            -1.8%    7763496           -15.4%    6686457           -15.5%    6678350        will-it-scale.per_process_ops
 1.265e+08            -1.8%  1.242e+08           -15.4%   1.07e+08           -15.5%  1.069e+08        will-it-scale.workload
 1.112e+10           +16.0%   1.29e+10           +67.3%   1.86e+10           +68.0%  1.868e+10        perf-stat.i.branch-instructions
      0.06 ±  2%      -0.0        0.06 ±  2%      -0.0        0.05            -0.0        0.05        perf-stat.i.branch-miss-rate%
   6858604 ±  2%      +0.6%    6900573 ±  2%      +8.4%    7434422            +7.7%    7388238        perf-stat.i.branch-misses
      0.72            -2.0%       0.71            -2.7%       0.70            -2.7%       0.70        perf-stat.i.cpi
 8.004e+10            +2.1%   8.17e+10            +2.8%  8.231e+10            +2.8%  8.232e+10        perf-stat.i.instructions
      1.38            +2.1%       1.41            +2.8%       1.42            +2.8%       1.42        perf-stat.i.ipc
      0.06 ±  2%      -0.0        0.05 ±  2%      -0.0        0.04            -0.0        0.04        perf-stat.overall.branch-miss-rate%
      0.72            -2.0%       0.71            -2.8%       0.70            -2.8%       0.70        perf-stat.overall.cpi
      1.38            +2.1%       1.41            +2.8%       1.42            +2.8%       1.42        perf-stat.overall.ipc
    190470            +3.9%     197929           +21.7%     231786           +21.8%     231973        perf-stat.overall.path-length
 1.108e+10           +16.0%  1.286e+10           +67.3%  1.854e+10           +68.0%  1.862e+10        perf-stat.ps.branch-instructions
   6893534 ±  2%      +0.5%    6924998 ±  2%      +8.3%    7462919            +7.5%    7410265        perf-stat.ps.branch-misses
 7.978e+10            +2.1%  8.143e+10            +2.8%  8.204e+10            +2.8%  8.205e+10        perf-stat.ps.instructions
  2.41e+13            +2.0%  2.459e+13            +2.9%   2.48e+13            +2.9%  2.479e+13        perf-stat.total.instructions
     48.06            -2.8       45.31            -9.9       38.20 ±  3%      -9.1       38.94        perf-profile.calltrace.cycles-pp.__futex_wait.futex_wait.do_futex.__x64_sys_futex.do_syscall_64
     42.93            -2.5       40.41            -8.8       34.12 ±  4%      -8.1       34.84        perf-profile.calltrace.cycles-pp.futex_wait_setup.__futex_wait.futex_wait.do_futex.__x64_sys_futex
     56.45            -2.4       54.10           -12.0       44.44           -11.4       45.05        perf-profile.calltrace.cycles-pp.futex_wait.do_futex.__x64_sys_futex.do_syscall_64.entry_SYSCALL_64_after_hwframe
     58.78            -2.3       56.48           -12.5       46.31           -11.9       46.86        perf-profile.calltrace.cycles-pp.do_futex.__x64_sys_futex.do_syscall_64.entry_SYSCALL_64_after_hwframe.syscall
     61.14            -2.3       58.86           -12.9       48.20           -12.5       48.67        perf-profile.calltrace.cycles-pp.__x64_sys_futex.do_syscall_64.entry_SYSCALL_64_after_hwframe.syscall
     71.08            -1.4       69.71           -12.1       58.96           -12.1       58.95        perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.syscall
     68.07            -1.2       66.88           -11.8       56.28           -11.8       56.26        perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.syscall
     17.13            -1.1       16.06            -3.8       13.38 ±  7%      -2.9       14.20 ±  5%  perf-profile.calltrace.cycles-pp.futex_q_lock.futex_wait_setup.__futex_wait.futex_wait.do_futex
    100.03            -0.8       99.22            -1.0       99.01            -1.4       98.60        perf-profile.calltrace.cycles-pp.syscall
     15.22            -0.8       14.42            -2.7       12.53 ±  5%      -2.7       12.52        perf-profile.calltrace.cycles-pp.futex_get_value_locked.futex_wait_setup.__futex_wait.futex_wait.do_futex
     12.02            -0.6       11.37            -2.1        9.89 ±  6%      -2.1        9.92        perf-profile.calltrace.cycles-pp.__get_user_nocheck_4.futex_get_value_locked.futex_wait_setup.__futex_wait.futex_wait
      3.12 ±  9%      -0.5        2.61            -0.9        2.22 ± 10%      -1.0        2.08 ±  7%  perf-profile.calltrace.cycles-pp.syscall_exit_to_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe.syscall
      7.38 ±  2%      -0.4        6.99            -1.9        5.44 ±  5%      -1.6        5.76 ±  5%  perf-profile.calltrace.cycles-pp.futex_hash.futex_q_lock.futex_wait_setup.__futex_wait.futex_wait
      5.12            -0.3        4.78            -0.9        4.20 ± 10%      -0.6        4.47 ±  5%  perf-profile.calltrace.cycles-pp._raw_spin_lock.futex_q_lock.futex_wait_setup.__futex_wait.futex_wait
      4.99 ±  2%      -0.3        4.66            -1.0        4.00 ±  5%      -1.2        3.79 ±  6%  perf-profile.calltrace.cycles-pp.futex_q_unlock.futex_wait_setup.__futex_wait.futex_wait.do_futex
      3.02 ±  3%      -0.2        2.80            -0.9        2.17            -0.8        2.25 ±  3%  perf-profile.calltrace.cycles-pp.get_futex_key.futex_wait_setup.__futex_wait.futex_wait.do_futex
      1.58 ±  3%      -0.1        1.51            -0.3        1.29 ± 10%      -0.4        1.22 ±  7%  perf-profile.calltrace.cycles-pp.syscall_return_via_sysret.syscall
      0.94 ±  3%      -0.1        0.87            -0.2        0.75 ±  9%      -0.2        0.70 ±  8%  perf-profile.calltrace.cycles-pp.syscall_exit_to_user_mode_prepare.syscall_exit_to_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe.syscall
      0.69 ±  3%      -0.0        0.65            -0.2        0.47 ± 46%      -0.3        0.36 ± 71%  perf-profile.calltrace.cycles-pp.amd_clear_divider.syscall_exit_to_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe.syscall
      0.68 ±  3%      -0.0        0.66            -0.3        0.38 ± 71%      -0.2        0.45 ± 45%  perf-profile.calltrace.cycles-pp.entry_SYSRETQ_unsafe_stack.syscall
      0.00            +0.0        0.00           +15.2       15.21 ±  2%     +14.8       14.83        perf-profile.calltrace.cycles-pp.clear_bhb_loop.syscall
      1.04 ±  2%      +0.1        1.13 ±  2%      -0.1        0.96 ±  3%      -0.0        1.00 ±  5%  perf-profile.calltrace.cycles-pp.testcase
      1.57            +0.1        1.70            -0.1        1.48 ±  9%      -0.0        1.54 ±  5%  perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_safe_stack.syscall
      0.00            +1.3        1.29            +1.6        1.62 ±  8%      +1.3        1.34 ±  4%  perf-profile.calltrace.cycles-pp.x64_sys_call.do_syscall_64.entry_SYSCALL_64_after_hwframe.syscall
     16.18            +1.6       17.80            -0.6       15.63 ±  7%      +0.0       16.22 ±  6%  perf-profile.calltrace.cycles-pp.entry_SYSCALL_64.syscall
     48.52            -2.8       45.75           -10.0       38.57 ±  2%      -9.2       39.30        perf-profile.children.cycles-pp.__futex_wait
     44.20            -2.6       41.60            -9.1       35.12 ±  3%      -8.4       35.84        perf-profile.children.cycles-pp.futex_wait_setup
     57.11            -2.4       54.75           -12.1       44.98           -11.5       45.58        perf-profile.children.cycles-pp.futex_wait
     59.22            -2.3       56.91           -12.7       46.54           -12.1       47.10        perf-profile.children.cycles-pp.do_futex
     61.79            -2.3       59.51           -13.0       48.74           -12.6       49.19        perf-profile.children.cycles-pp.__x64_sys_futex
     69.05            -1.5       67.59           -12.1       56.90           -12.2       56.85        perf-profile.children.cycles-pp.do_syscall_64
     71.36            -1.4       70.00           -11.9       59.44           -11.9       59.43        perf-profile.children.cycles-pp.entry_SYSCALL_64_after_hwframe
     17.82            -1.1       16.70            -3.9       13.94 ±  7%      -3.0       14.79 ±  5%  perf-profile.children.cycles-pp.futex_q_lock
     14.54            -0.8       13.76            -2.6       11.96 ±  5%      -2.6       11.98        perf-profile.children.cycles-pp.futex_get_value_locked
     13.16            -0.7       12.46            -2.3       10.83 ±  5%      -2.3       10.84        perf-profile.children.cycles-pp.__get_user_nocheck_4
      3.96 ±  3%      -0.5        3.46            -1.0        2.95 ± 10%      -1.2        2.78 ±  7%  perf-profile.children.cycles-pp.syscall_exit_to_user_mode
      7.61 ±  2%      -0.4        7.21            -2.0        5.62 ±  4%      -1.6        5.96 ±  5%  perf-profile.children.cycles-pp.futex_hash
      5.35            -0.4        5.00            -1.0        4.40 ± 10%      -0.7        4.67 ±  5%  perf-profile.children.cycles-pp._raw_spin_lock
      5.22 ±  2%      -0.3        4.88            -1.0        4.19 ±  5%      -1.3        3.97 ±  6%  perf-profile.children.cycles-pp.futex_q_unlock
      3.26 ±  3%      -0.2        3.02            -0.9        2.35 ±  2%      -0.8        2.44 ±  3%  perf-profile.children.cycles-pp.get_futex_key
     98.50            -0.1       98.40            +0.1       98.60            +0.1       98.55        perf-profile.children.cycles-pp.syscall
      1.81 ±  3%      -0.1        1.73            -0.3        1.47 ± 10%      -0.4        1.40 ±  7%  perf-profile.children.cycles-pp.syscall_return_via_sysret
      1.17 ±  3%      -0.1        1.09            -0.2        0.94 ±  9%      -0.3        0.87 ±  8%  perf-profile.children.cycles-pp.syscall_exit_to_user_mode_prepare
      0.93 ±  3%      -0.1        0.86 ±  2%      -0.2        0.73 ± 10%      -0.2        0.70 ±  8%  perf-profile.children.cycles-pp.amd_clear_divider
      0.16 ±  2%      -0.0        0.15            -0.0        0.15 ±  5%      -0.0        0.15 ±  4%  perf-profile.children.cycles-pp.__sysvec_apic_timer_interrupt
      0.16 ±  3%      -0.0        0.14 ±  3%      -0.0        0.14 ±  3%      -0.0        0.14 ±  5%  perf-profile.children.cycles-pp.hrtimer_interrupt
      9.15 ±  2%      -0.0        9.14            -1.5        7.65 ±  7%      -1.6        7.54 ±  3%  perf-profile.children.cycles-pp.entry_SYSRETQ_unsafe_stack
      0.65 ±  4%      -0.0        0.64            -0.1        0.54 ±  7%      -0.1        0.54 ±  2%  perf-profile.children.cycles-pp.futex_setup_timer
      0.00            +0.0        0.00           +15.4       15.40 ±  2%     +15.0       15.01        perf-profile.children.cycles-pp.clear_bhb_loop
      0.62 ±  2%      +0.0        0.66 ±  2%      -0.1        0.56 ±  5%      -0.0        0.58 ±  5%  perf-profile.children.cycles-pp.syscall@plt
      1.45            +0.1        1.57            -0.1        1.33 ±  2%      -0.1        1.39 ±  5%  perf-profile.children.cycles-pp.testcase
      1.59            +0.1        1.72            -0.1        1.49 ±  9%      -0.0        1.56 ±  5%  perf-profile.children.cycles-pp.entry_SYSCALL_64_safe_stack
      9.45            +0.9       10.37            -0.3        9.12 ±  7%      +0.0        9.45 ±  5%  perf-profile.children.cycles-pp.entry_SYSCALL_64
      0.00            +1.5        1.51            +1.8        1.81 ±  8%      +1.5        1.51 ±  4%  perf-profile.children.cycles-pp.x64_sys_call
     12.90            -0.7       12.23            -2.3       10.63 ±  5%      -2.3       10.64        perf-profile.self.cycles-pp.__get_user_nocheck_4
      7.38 ±  2%      -0.4        6.99            -1.9        5.43 ±  5%      -1.6        5.76 ±  5%  perf-profile.self.cycles-pp.futex_hash
      5.09            -0.4        4.70            -1.0        4.11 ±  8%      -0.7        4.35 ±  6%  perf-profile.self.cycles-pp.futex_q_lock
      5.11            -0.3        4.78            -0.9        4.21 ± 10%      -0.6        4.47 ±  5%  perf-profile.self.cycles-pp._raw_spin_lock
      4.86 ±  2%      -0.3        4.54            -0.9        3.92 ±  5%      -1.1        3.71 ±  7%  perf-profile.self.cycles-pp.futex_q_unlock
      3.68            -0.2        3.45            -0.1        3.62 ±  9%      -0.1        3.56 ±  7%  perf-profile.self.cycles-pp.do_syscall_64
      3.02 ±  3%      -0.2        2.80            -0.9        2.16 ±  2%      -0.8        2.25 ±  3%  perf-profile.self.cycles-pp.get_futex_key
      4.33 ±  3%      -0.2        4.14            -0.9        3.45 ±  6%      -0.9        3.45 ±  2%  perf-profile.self.cycles-pp.__futex_wait
      4.17            -0.2        3.99 ±  3%      -0.9        3.32            -0.9        3.32        perf-profile.self.cycles-pp.futex_wait_setup
      2.09 ±  3%      -0.2        1.93            -0.4        1.64 ±  9%      -0.5        1.56 ±  7%  perf-profile.self.cycles-pp.syscall_exit_to_user_mode
      1.39 ±  2%      -0.1        1.29            -0.3        1.13 ±  2%      -0.3        1.12 ±  2%  perf-profile.self.cycles-pp.futex_get_value_locked
      1.81 ±  3%      -0.1        1.73            -0.3        1.47 ± 10%      -0.4        1.40 ±  7%  perf-profile.self.cycles-pp.syscall_return_via_sysret
      0.94 ±  3%      -0.1        0.88            -0.2        0.75 ± 10%      -0.2        0.70 ±  8%  perf-profile.self.cycles-pp.syscall_exit_to_user_mode_prepare
      0.46 ±  2%      -0.0        0.43 ±  2%      -0.1        0.37 ± 11%      -0.1        0.34 ±  6%  perf-profile.self.cycles-pp.amd_clear_divider
      0.43 ±  4%      +0.0        0.43 ±  2%      -0.1        0.36 ±  7%      -0.1        0.36 ±  2%  perf-profile.self.cycles-pp.futex_setup_timer
      0.00            +0.0        0.00           +15.2       15.22 ±  2%     +14.8       14.81        perf-profile.self.cycles-pp.clear_bhb_loop
      8.92 ±  2%      +0.0        8.92            -1.4        7.47 ±  7%      -1.6        7.36 ±  2%  perf-profile.self.cycles-pp.entry_SYSRETQ_unsafe_stack
      0.20            +0.0        0.22 ±  3%      -0.0        0.18 ±  8%      -0.0        0.20 ±  7%  perf-profile.self.cycles-pp.syscall@plt
      2.76            +0.0        2.81            -0.4        2.38 ± 10%      -0.5        2.27 ±  7%  perf-profile.self.cycles-pp.__x64_sys_futex
      1.90 ±  3%      +0.0        1.95            -0.7        1.23 ± 11%      -0.7        1.22 ±  8%  perf-profile.self.cycles-pp.do_futex
      2.50 ±  2%      +0.1        2.61            +0.1        2.56 ±  8%      +0.1        2.60 ±  4%  perf-profile.self.cycles-pp.entry_SYSCALL_64_after_hwframe
      1.24            +0.1        1.35 ±  2%      -0.1        1.14 ±  3%      -0.0        1.19 ±  5%  perf-profile.self.cycles-pp.testcase
      1.59            +0.1        1.72            -0.1        1.49 ±  9%      -0.0        1.56 ±  5%  perf-profile.self.cycles-pp.entry_SYSCALL_64_safe_stack
      2.72            +0.2        2.94 ±  2%      -0.1        2.61 ±  9%      -0.0        2.69 ±  6%  perf-profile.self.cycles-pp.entry_SYSCALL_64
      8.15 ±  3%      +0.4        8.56            -2.0        6.18 ± 10%      -2.1        6.04 ±  3%  perf-profile.self.cycles-pp.futex_wait
     11.95            +1.0       12.92            -1.0       10.97 ±  4%      -0.6       11.35 ±  4%  perf-profile.self.cycles-pp.syscall
      0.00            +1.3        1.30            +1.6        1.62 ±  8%      +1.3        1.33 ±  4%  perf-profile.self.cycles-pp.x64_sys_call


BTW, we did observe some regressions by running other benchmarks on
commit 1e3ad78334a6, but these regressions are on Ice Lake, not Skylake.
Please kindly contact us if you are interested in looking into them.

    stress-ng.null.ops_per_sec -4.0% regression on Intel Xeon Gold 6346 (Ice Lake)
    unixbench.fsbuffer.throughput -1.4% regression on Intel Xeon Gold 6346 (Ice Lake)

Thanks,
Yujie

> 
> [*] https://lkml.kernel.org/lkml/eda0ec65f4612cc66875aaf76e738643f41fbc01.1713296762.git.jpoimboe@kernel.org
> 
> -- 
> Josh
> 

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ