Message-ID: <202210272158.a9585179-oliver.sang@intel.com>
Date:   Fri, 28 Oct 2022 15:07:36 +0800
From:   kernel test robot <oliver.sang@...el.com>
To:     Thomas Gleixner <tglx@...utronix.de>
CC:     <oe-lkp@...ts.linux.dev>, <lkp@...el.com>,
        Peter Zijlstra <peterz@...radead.org>,
        <linux-kernel@...r.kernel.org>, <x86@...nel.org>,
        <ying.huang@...el.com>, <feng.tang@...el.com>,
        <zhengjun.xing@...ux.intel.com>, <fengwei.yin@...el.com>
Subject: [tip:x86/core] [x86/retbleed]  80e4c1cd42:
 will-it-scale.per_thread_ops -5.4% regression


Hi Thomas,

though we call it a 'regression' in the title, following the parent-vs-commit
rule in our reporting, we understand from the commit message that this is
actually a big improvement compared to the 'microcode mitigation', which could
cause up to a 30% performance drop.

we still report it FYI, to note the possible performance impact on some
microbenchmarks.


Greetings,

FYI, we noticed a -5.4% regression of will-it-scale.per_thread_ops due to commit:


commit: 80e4c1cd42fff110bfdae8fce7ac4f22465f9664 ("x86/retbleed: Add X86_FEATURE_CALL_DEPTH")
https://git.kernel.org/cgit/linux/kernel/git/tip/tip.git x86/core

in testcase: will-it-scale
on test machine: 192 threads, 4 sockets, Intel(R) Xeon(R) Platinum 9242 CPU @ 2.30GHz (Cascade Lake), with 192G of memory
with the following parameters:

	nr_task: 100%
	mode: thread
	test: futex3
	cpufreq_governor: performance

test-description: Will It Scale takes a testcase and runs it from 1 through n parallel copies to see if the testcase will scale. It builds both a process-based and a thread-based variant of each test in order to see any differences between the two.
test-url: https://github.com/antonblanchard/will-it-scale
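
For context on what futex3 exercises: judging from the call chains in the perf
profile below (futex_wake -> get_futex_key -> futex_hash, with no wait path),
each thread appears to sit in a tight loop issuing FUTEX_WAKE on a private
futex word that has no waiters. A minimal standalone C sketch of that loop
(an approximation for illustration only, not the actual will-it-scale source;
see test-url above):

/* approximation of the futex3 hot loop: FUTEX_WAKE on a waiter-less futex */
#include <linux/futex.h>   /* FUTEX_WAKE_PRIVATE */
#include <sys/syscall.h>   /* SYS_futex */
#include <unistd.h>        /* syscall() */
#include <stdio.h>

int main(void)
{
	unsigned int futex_word = 0;      /* nothing ever waits on this word */
	unsigned long long iterations;

	for (iterations = 0; iterations < 10000000ULL; iterations++)
		/* the kernel walks futex_wake() -> get_futex_key() ->
		 * futex_hash(), finds no waiters, and returns 0 */
		syscall(SYS_futex, &futex_word, FUTEX_WAKE_PRIVATE, 1,
			NULL, NULL, 0);

	printf("%llu wake calls issued\n", iterations);
	return 0;
}

Because the loop is essentially pure syscall entry/exit plus a short in-kernel
path, it tends to magnify any per-call overhead, which is why a microbenchmark
like this can show a few percent where larger workloads may not.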



If you fix the issue, kindly add the following tags:
| Reported-by: kernel test robot <oliver.sang@...el.com>
| Link: https://lore.kernel.org/oe-lkp/202210272158.a9585179-oliver.sang@intel.com


Details are as below:
-------------------------------------------------------------------------------------------------->


To reproduce:

        git clone https://github.com/intel/lkp-tests.git
        cd lkp-tests
        sudo bin/lkp install job.yaml           # job file is attached in this email
        bin/lkp split-job --compatible job.yaml # generate the yaml file for lkp run
        sudo bin/lkp run generated-yaml-file

        # if you come across any failure that blocks the test,
        # please remove the ~/.lkp and /lkp directories to run from a clean state.

=========================================================================================
compiler/cpufreq_governor/kconfig/mode/nr_task/rootfs/tbox_group/test/testcase:
  gcc-11/performance/x86_64-rhel-8.3/thread/100%/debian-11.1-x86_64-20220510.cgz/lkp-csl-2ap4/futex3/will-it-scale

commit: 
  bea75b3389 ("x86/Kconfig: Introduce function padding")
  80e4c1cd42 ("x86/retbleed: Add X86_FEATURE_CALL_DEPTH")

bea75b33895f7f87 80e4c1cd42fff110bfdae8fce7a 
---------------- --------------------------- 
         %stddev     %change         %stddev
             \          |                \  
 1.335e+09            -5.4%  1.263e+09        will-it-scale.192.threads
   6951370            -5.4%    6578078        will-it-scale.per_thread_ops
 1.335e+09            -5.4%  1.263e+09        will-it-scale.workload
     33.29            -3.0%      32.30        boot-time.dhcp
      0.97 ±  2%      +0.1        1.07 ±  2%  mpstat.cpu.all.irq%
     83145 ±146%     -94.2%       4796 ±  6%  turbostat.C1
    878.33 ±  4%     +11.7%     981.00 ± 11%  proc-vmstat.direct_map_level2_splits
     77333            +2.2%      79018        proc-vmstat.nr_slab_unreclaimable
     47455 ± 12%     -23.2%      36450 ± 17%  proc-vmstat.numa_hint_faults
     43003 ± 32%     -37.6%      26846 ± 37%  proc-vmstat.numa_pages_migrated
     43003 ± 32%     -37.6%      26846 ± 37%  proc-vmstat.pgmigrate_success
    198321 ± 12%     +21.9%     241714 ± 14%  numa-meminfo.node1.AnonPages
    200294 ± 12%     +21.0%     242442 ± 14%  numa-meminfo.node1.Inactive
    200294 ± 12%     +21.0%     242442 ± 14%  numa-meminfo.node1.Inactive(anon)
    229302 ± 15%     -28.5%     163948 ± 17%  numa-meminfo.node2.AnonPages
    231172 ± 16%     -28.5%     165270 ± 17%  numa-meminfo.node2.Inactive
    231172 ± 16%     -28.5%     165270 ± 17%  numa-meminfo.node2.Inactive(anon)
     49578 ± 12%     +22.1%      60515 ± 14%  numa-vmstat.node1.nr_anon_pages
     50070 ± 12%     +21.2%      60697 ± 14%  numa-vmstat.node1.nr_inactive_anon
     50071 ± 12%     +21.2%      60697 ± 14%  numa-vmstat.node1.nr_zone_inactive_anon
     57327 ± 15%     -28.4%      41064 ± 17%  numa-vmstat.node2.nr_anon_pages
     57794 ± 16%     -28.4%      41393 ± 17%  numa-vmstat.node2.nr_inactive_anon
     57794 ± 16%     -28.4%      41393 ± 17%  numa-vmstat.node2.nr_zone_inactive_anon
      0.01 ±  4%      +7.7%       0.02        perf-stat.i.MPKI
 8.662e+10            -5.4%  8.197e+10        perf-stat.i.branch-instructions
 3.336e+08            -4.5%  3.187e+08        perf-stat.i.branch-misses
     15.22 ±  2%      +1.1       16.33 ±  2%  perf-stat.i.cache-miss-rate%
   1193768 ±  4%      +8.9%    1300334 ±  2%  perf-stat.i.cache-misses
      0.99            +5.9%       1.05        perf-stat.i.cpi
 1.439e+11            -5.4%  1.362e+11        perf-stat.i.dTLB-loads
      0.00            +0.0        0.00        perf-stat.i.dTLB-store-miss-rate%
    255388            -1.9%     250535        perf-stat.i.dTLB-store-misses
 1.079e+11            -5.4%  1.021e+11        perf-stat.i.dTLB-stores
 5.753e+11            -5.4%  5.444e+11        perf-stat.i.instructions
      1.01            -5.5%       0.96        perf-stat.i.ipc
      1762            -5.4%       1667        perf-stat.i.metric.M/sec
    233635 ±  3%      +6.3%     248433        perf-stat.i.node-load-misses
    106279 ±  3%     +13.5%     120679        perf-stat.i.node-store-misses
      0.01 ±  4%      +7.1%       0.02        perf-stat.overall.MPKI
     15.04            +1.0       16.08 ±  2%  perf-stat.overall.cache-miss-rate%
      0.99            +5.9%       1.05        perf-stat.overall.cpi
      0.00            +0.0        0.00        perf-stat.overall.dTLB-store-miss-rate%
      1.01            -5.5%       0.96        perf-stat.overall.ipc
 8.633e+10            -5.4%   8.17e+10        perf-stat.ps.branch-instructions
 3.325e+08            -4.5%  3.176e+08        perf-stat.ps.branch-misses
 1.434e+11            -5.4%  1.357e+11        perf-stat.ps.dTLB-loads
    254865            -1.9%     249975        perf-stat.ps.dTLB-store-misses
 1.075e+11            -5.4%  1.017e+11        perf-stat.ps.dTLB-stores
 5.734e+11            -5.4%  5.426e+11        perf-stat.ps.instructions
    232956 ±  3%      +6.3%     247689        perf-stat.ps.node-load-misses
    105863 ±  3%     +13.6%     120215        perf-stat.ps.node-store-misses
 1.739e+14            -5.4%  1.645e+14        perf-stat.total.instructions
     33.36            -2.0       31.32        perf-profile.calltrace.cycles-pp.__entry_text_start.syscall
      1.70            -0.3        1.36        perf-profile.calltrace.cycles-pp.syscall_exit_to_user_mode_prepare.syscall_exit_to_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe.syscall
      6.47            -0.3        6.14        perf-profile.calltrace.cycles-pp.syscall_exit_to_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe.syscall
      2.22            -0.1        2.13        perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_safe_stack.syscall
     97.70            -0.1       97.62        perf-profile.calltrace.cycles-pp.syscall
      0.92            -0.1        0.86        perf-profile.calltrace.cycles-pp.entry_SYSRETQ_unsafe_stack.syscall
      3.48            +0.0        3.51        perf-profile.calltrace.cycles-pp.exit_to_user_mode_prepare.syscall_exit_to_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe.syscall
      1.98            +0.0        2.02        perf-profile.calltrace.cycles-pp.syscall_enter_from_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe.syscall
      8.68            +0.7        9.40        perf-profile.calltrace.cycles-pp.futex_hash.futex_wake.do_futex.__x64_sys_futex.do_syscall_64
      5.94            +1.2        7.16        perf-profile.calltrace.cycles-pp.get_futex_key.futex_wake.do_futex.__x64_sys_futex.do_syscall_64
     23.75            +1.5       25.23        perf-profile.calltrace.cycles-pp.futex_wake.do_futex.__x64_sys_futex.do_syscall_64.entry_SYSCALL_64_after_hwframe
     51.01            +2.2       53.22        perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.syscall
     43.94            +2.6       46.55        perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.syscall
     32.18            +3.0       35.18        perf-profile.calltrace.cycles-pp.__x64_sys_futex.do_syscall_64.entry_SYSCALL_64_after_hwframe.syscall
     26.90            +3.4       30.27        perf-profile.calltrace.cycles-pp.do_futex.__x64_sys_futex.do_syscall_64.entry_SYSCALL_64_after_hwframe.syscall
     21.50            -1.3       20.19        perf-profile.children.cycles-pp.__entry_text_start
     12.92            -0.8       12.09        perf-profile.children.cycles-pp.entry_SYSRETQ_unsafe_stack
      7.38            -0.5        6.89        perf-profile.children.cycles-pp.syscall_exit_to_user_mode
      1.90            -0.4        1.53        perf-profile.children.cycles-pp.syscall_exit_to_user_mode_prepare
      2.40            -0.1        2.30        perf-profile.children.cycles-pp.entry_SYSCALL_64_safe_stack
      0.19 ±  7%      +0.0        0.23 ±  4%  perf-profile.children.cycles-pp.perf_prepare_sample
      0.22 ±  6%      +0.0        0.26 ±  3%  perf-profile.children.cycles-pp.perf_tp_event
      0.22 ±  6%      +0.0        0.26 ±  3%  perf-profile.children.cycles-pp.perf_trace_sched_stat_runtime
      0.01 ±223%      +0.0        0.06 ±  9%  perf-profile.children.cycles-pp.account_user_time
      0.01 ±223%      +0.1        0.06 ± 14%  perf-profile.children.cycles-pp.account_system_index_time
      0.36 ±  4%      +0.1        0.41 ±  5%  perf-profile.children.cycles-pp.scheduler_tick
      0.31 ±  5%      +0.1        0.37 ±  4%  perf-profile.children.cycles-pp.task_tick_fair
      0.24 ±  9%      +0.1        0.30 ±  5%  perf-profile.children.cycles-pp.update_curr
      0.01 ±223%      +0.1        0.08 ± 12%  perf-profile.children.cycles-pp.__perf_event_header__init_id
      0.01 ±223%      +0.1        0.08 ± 12%  perf-profile.children.cycles-pp.__task_pid_nr_ns
      0.47 ±  7%      +0.1        0.57 ±  5%  perf-profile.children.cycles-pp.update_process_times
      0.77 ±  4%      +0.1        0.87 ±  4%  perf-profile.children.cycles-pp.asm_sysvec_apic_timer_interrupt
      0.71 ±  4%      +0.1        0.82 ±  4%  perf-profile.children.cycles-pp.sysvec_apic_timer_interrupt
      0.48 ±  8%      +0.1        0.59 ±  5%  perf-profile.children.cycles-pp.tick_sched_handle
      0.67 ±  4%      +0.1        0.78 ±  4%  perf-profile.children.cycles-pp.__sysvec_apic_timer_interrupt
      0.66 ±  4%      +0.1        0.78 ±  4%  perf-profile.children.cycles-pp.hrtimer_interrupt
      0.54 ±  7%      +0.1        0.65 ±  4%  perf-profile.children.cycles-pp.__hrtimer_run_queues
      0.50 ±  8%      +0.1        0.61 ±  5%  perf-profile.children.cycles-pp.tick_sched_timer
      8.78            +0.8        9.55        perf-profile.children.cycles-pp.futex_hash
      6.02            +1.3        7.37        perf-profile.children.cycles-pp.get_futex_key
     24.17            +1.9       26.11        perf-profile.children.cycles-pp.futex_wake
     51.45            +2.2       53.67        perf-profile.children.cycles-pp.entry_SYSCALL_64_after_hwframe
     44.71            +2.6       47.28        perf-profile.children.cycles-pp.do_syscall_64
     32.55            +3.1       35.65        perf-profile.children.cycles-pp.__x64_sys_futex
     27.26            +3.2       30.44        perf-profile.children.cycles-pp.do_futex
     12.56            -0.8       11.75        perf-profile.self.cycles-pp.entry_SYSRETQ_unsafe_stack
     16.26            -0.7       15.60        perf-profile.self.cycles-pp.syscall
      9.62            -0.6        9.02        perf-profile.self.cycles-pp.__entry_text_start
      6.82            -0.4        6.46        perf-profile.self.cycles-pp.entry_SYSCALL_64_after_hwframe
      1.59            -0.2        1.37        perf-profile.self.cycles-pp.syscall_exit_to_user_mode_prepare
      2.21            -0.2        1.98        perf-profile.self.cycles-pp.syscall_exit_to_user_mode
      2.88            -0.1        2.75        perf-profile.self.cycles-pp.do_syscall_64
      5.51            -0.1        5.40        perf-profile.self.cycles-pp.syscall_return_via_sysret
      2.38            -0.1        2.27        perf-profile.self.cycles-pp.entry_SYSCALL_64_safe_stack
      3.41            +0.0        3.43        perf-profile.self.cycles-pp.exit_to_user_mode_prepare
      0.32 ±  2%      +0.0        0.34 ±  2%  perf-profile.self.cycles-pp.syscall@plt
      0.01 ±223%      +0.0        0.06 ±  9%  perf-profile.self.cycles-pp.account_user_time
      0.01 ±223%      +0.1        0.06 ± 14%  perf-profile.self.cycles-pp.account_system_index_time
      0.01 ±223%      +0.1        0.07 ± 12%  perf-profile.self.cycles-pp.__task_pid_nr_ns
      8.59            +0.4        9.01        perf-profile.self.cycles-pp.futex_hash
      9.35            +0.5        9.83        perf-profile.self.cycles-pp.futex_wake
      3.22            +1.0        4.17        perf-profile.self.cycles-pp.do_futex
      5.71            +1.2        6.92        perf-profile.self.cycles-pp.get_futex_key
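
To read the comparison table above: the left column block is the parent commit
(bea75b3389) and the right block is the tested commit (80e4c1cd42); %change is
the delta relative to the parent, and the ± values are the per-side %stddev
across repeated runs. For example, the will-it-scale.per_thread_ops row works
out as follows (a trivial sketch of the arithmetic, for illustration):

#include <stdio.h>

int main(void)
{
	double parent = 6951370.0;  /* bea75b3389: will-it-scale.per_thread_ops */
	double commit = 6578078.0;  /* 80e4c1cd42 */

	/* %change = (new - old) / old * 100, relative to the parent commit */
	printf("%+.1f%%\n", (commit - parent) / parent * 100.0);  /* -5.4% */
	return 0;
}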




Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.


-- 
0-DAY CI Kernel Test Service
https://01.org/lkp



Attachments:

  config-6.1.0-rc1-00040-g80e4c1cd42ff  (text/plain, 166272 bytes)
  job-script                            (text/plain, 7825 bytes)
  job.yaml                              (text/plain, 5344 bytes)
  reproduce                             (text/plain, 346 bytes)
