Message-ID: <20210406072708.GD7352@xsang-OptiPlex-9020>
Date:   Tue, 6 Apr 2021 15:27:08 +0800
From:   kernel test robot <oliver.sang@...el.com>
To:     Peter Zijlstra <peterz@...radead.org>
Cc:     Ingo Molnar <mingo@...nel.org>, Borislav Petkov <bp@...e.de>,
        Miroslav Benes <mbenes@...e.cz>,
        LKML <linux-kernel@...r.kernel.org>, x86@...nel.org,
        lkp@...ts.01.org, lkp@...el.com, ying.huang@...el.com,
        feng.tang@...el.com, zhengjun.xing@...el.com
Subject: [objtool/x86]  9bc0bb5072:  will-it-scale.per_process_ops 5.6%
 improvement



Greetings,

FYI, we noticed a 5.6% improvement in will-it-scale.per_process_ops due to commit:


commit: 9bc0bb50727c8ac69fbb33fb937431cf3518ff37 ("objtool/x86: Rewrite retpoline thunk calls")
https://git.kernel.org/cgit/linux/kernel/git/tip/tip.git x86/core


in testcase: will-it-scale
on test machine: 192 threads Intel(R) Xeon(R) Platinum 9242 CPU @ 2.30GHz with 192G memory
with the following parameters:

	nr_task: 16
	mode: process
	test: eventfd1
	cpufreq_governor: performance
	ucode: 0x5003006

test-description: Will It Scale takes a testcase and runs it from 1 through to n parallel copies to see if the testcase will scale. It builds both a process-based and a thread-based version of each test in order to see any differences between the two.
test-url: https://github.com/antonblanchard/will-it-scale





Details are as below:
-------------------------------------------------------------------------------------------------->


To reproduce:

        git clone https://github.com/intel/lkp-tests.git
        cd lkp-tests
        bin/lkp install                job.yaml  # job file is attached in this email
        bin/lkp split-job --compatible job.yaml
        bin/lkp run                    compatible-job.yaml

=========================================================================================
compiler/cpufreq_governor/kconfig/mode/nr_task/rootfs/tbox_group/test/testcase/ucode:
  gcc-9/performance/x86_64-rhel-8.3/process/16/debian-10.4-x86_64-20200603.cgz/lkp-csl-2ap2/eventfd1/will-it-scale/0x5003006

commit: 
  50e7b4a1a1 ("objtool: Skip magical retpoline .altinstr_replacement")
  9bc0bb5072 ("objtool/x86: Rewrite retpoline thunk calls")

50e7b4a1a1b264fc 9bc0bb50727c8ac69fbb33fb937 
---------------- --------------------------- 
         %stddev     %change         %stddev
             \          |                \  
  46843229            +5.6%   49479323        will-it-scale.16.processes
   2927701            +5.6%    3092457        will-it-scale.per_process_ops
  46843229            +5.6%   49479323        will-it-scale.workload
      8251 ±  8%     -19.6%       6635 ± 11%  numa-vmstat.node0.nr_slab_reclaimable
     33007 ±  8%     -19.6%      26543 ± 11%  numa-meminfo.node0.KReclaimable
     33007 ±  8%     -19.6%      26543 ± 11%  numa-meminfo.node0.SReclaimable
      1172 ± 12%     -68.4%     370.67 ±141%  perf-sched.wait_and_delay.avg.ms.futex_wait_queue_me.futex_wait.do_futex.__x64_sys_futex
      1172 ± 12%     -68.4%     370.65 ±141%  perf-sched.wait_time.avg.ms.futex_wait_queue_me.futex_wait.do_futex.__x64_sys_futex
    112.67 ± 20%     -32.7%      75.83 ± 19%  interrupts.CPU115.NMI:Non-maskable_interrupts
    112.67 ± 20%     -32.7%      75.83 ± 19%  interrupts.CPU115.PMI:Performance_monitoring_interrupts
    154.00 ± 43%     -41.8%      89.67 ± 40%  interrupts.CPU135.NMI:Non-maskable_interrupts
    154.00 ± 43%     -41.8%      89.67 ± 40%  interrupts.CPU135.PMI:Performance_monitoring_interrupts
    128.50 ± 16%     -39.7%      77.50 ± 33%  interrupts.CPU151.NMI:Non-maskable_interrupts
    128.50 ± 16%     -39.7%      77.50 ± 33%  interrupts.CPU151.PMI:Performance_monitoring_interrupts
    126.50 ± 19%     -39.1%      77.00 ± 34%  interrupts.CPU152.NMI:Non-maskable_interrupts
    126.50 ± 19%     -39.1%      77.00 ± 34%  interrupts.CPU152.PMI:Performance_monitoring_interrupts
    150.67 ± 49%     -52.7%      71.33 ± 33%  interrupts.CPU153.NMI:Non-maskable_interrupts
    150.67 ± 49%     -52.7%      71.33 ± 33%  interrupts.CPU153.PMI:Performance_monitoring_interrupts
    134.67 ± 30%     -45.5%      73.33 ± 33%  interrupts.CPU154.NMI:Non-maskable_interrupts
    134.67 ± 30%     -45.5%      73.33 ± 33%  interrupts.CPU154.PMI:Performance_monitoring_interrupts
    229.00 ± 82%     -64.9%      80.33 ± 38%  interrupts.CPU57.NMI:Non-maskable_interrupts
    229.00 ± 82%     -64.9%      80.33 ± 38%  interrupts.CPU57.PMI:Performance_monitoring_interrupts
      9305 ± 16%     +30.4%      12133 ± 20%  softirqs.CPU116.RCU
      9674 ±  8%     +17.7%      11391 ± 11%  softirqs.CPU121.RCU
     10950 ±  8%     +13.3%      12402 ±  7%  softirqs.CPU160.RCU
     11054 ±  8%     +14.6%      12663 ±  5%  softirqs.CPU161.RCU
     10764 ±  6%     +16.6%      12548 ±  6%  softirqs.CPU163.RCU
     11073 ±  8%     +20.4%      13337 ±  4%  softirqs.CPU164.RCU
     10840 ±  7%     +18.1%      12797 ±  6%  softirqs.CPU165.RCU
     10935 ±  9%     +19.5%      13066 ±  7%  softirqs.CPU166.RCU
     10791 ±  8%     +17.0%      12629 ±  8%  softirqs.CPU168.RCU
     10152 ±  6%     +17.1%      11892 ±  5%  softirqs.CPU171.RCU
     10644 ±  6%     +13.0%      12032 ±  5%  softirqs.CPU172.RCU
     14639 ± 11%     +20.5%      17644 ± 10%  softirqs.CPU3.RCU
     11177 ±  8%     +13.4%      12671 ±  7%  softirqs.CPU64.RCU
     11039 ±  6%     +15.3%      12730 ±  6%  softirqs.CPU67.RCU
     11218 ±  9%     +17.9%      13225 ±  5%  softirqs.CPU68.RCU
     15014 ± 11%     +17.8%      17688 ±  6%  softirqs.CPU7.RCU
     11300 ±  9%     +17.4%      13267 ±  7%  softirqs.CPU70.RCU
     11094 ±  6%     +18.1%      13099 ±  7%  softirqs.CPU71.RCU
     10930 ±  8%     +15.5%      12620 ±  5%  softirqs.CPU72.RCU
     10800 ±  7%     +15.8%      12509 ±  8%  softirqs.CPU75.RCU
     10822 ±  8%     +14.7%      12412 ±  6%  softirqs.CPU76.RCU
     24155 ± 13%     +26.9%      30649 ± 14%  softirqs.CPU99.SCHED
 1.633e+10            +3.7%  1.694e+10        perf-stat.i.branch-instructions
      1.18 ±  9%      -0.6        0.59 ± 15%  perf-stat.i.branch-miss-rate%
 1.881e+08 ±  5%     -47.4%   98885807 ± 15%  perf-stat.i.branch-misses
   4905715 ± 12%     -56.2%    2149255 ± 70%  perf-stat.i.cache-misses
      0.72 ±  5%     -11.5%       0.64 ±  2%  perf-stat.i.cpi
     11933 ± 13%    +264.7%      43517 ± 54%  perf-stat.i.cycles-between-cache-misses
 2.352e+10            +5.6%  2.484e+10        perf-stat.i.dTLB-loads
 1.574e+10            +5.7%  1.664e+10        perf-stat.i.dTLB-stores
 1.748e+08           -47.8%   91212835 ±  6%  perf-stat.i.iTLB-load-misses
 8.088e+10            +5.6%  8.541e+10        perf-stat.i.instructions
    464.80          +104.0%     948.14 ±  6%  perf-stat.i.instructions-per-iTLB-miss
      1.42           +10.6%       1.57 ±  2%  perf-stat.i.ipc
    289.77            +5.1%     304.52        perf-stat.i.metric.M/sec
     54032 ± 36%     -49.7%      27155 ± 26%  perf-stat.i.node-loads
      1.15 ±  5%      -0.6        0.58 ± 16%  perf-stat.overall.branch-miss-rate%
      0.70            -9.3%       0.64 ±  2%  perf-stat.overall.cpi
     11810 ± 13%    +229.4%      38908 ± 51%  perf-stat.overall.cycles-between-cache-misses
    462.63          +103.3%     940.67 ±  6%  perf-stat.overall.instructions-per-iTLB-miss
      1.42           +10.4%       1.57 ±  2%  perf-stat.overall.ipc
 1.627e+10            +3.7%  1.688e+10        perf-stat.ps.branch-instructions
 1.875e+08 ±  5%     -47.4%   98557001 ± 15%  perf-stat.ps.branch-misses
   4889627 ± 12%     -56.2%    2142836 ± 70%  perf-stat.ps.cache-misses
 2.344e+10            +5.6%  2.476e+10        perf-stat.ps.dTLB-loads
 1.569e+10            +5.7%  1.659e+10        perf-stat.ps.dTLB-stores
 1.742e+08           -47.8%   90889010 ±  6%  perf-stat.ps.iTLB-load-misses
 8.061e+10            +5.6%  8.512e+10        perf-stat.ps.instructions
     53915 ± 36%     -49.6%      27175 ± 26%  perf-stat.ps.node-loads
 2.442e+13            +5.3%  2.571e+13        perf-stat.total.instructions
     14.71 ±  7%      -2.0       12.67 ±  8%  perf-profile.calltrace.cycles-pp.vfs_write.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe.write
      3.80 ± 26%      -1.5        2.29 ± 12%  perf-profile.calltrace.cycles-pp.asm_sysvec_apic_timer_interrupt.cpuidle_enter_state.cpuidle_enter.do_idle.cpu_startup_entry
      8.34 ±  7%      -1.2        7.13 ±  8%  perf-profile.calltrace.cycles-pp.eventfd_write.vfs_write.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe
      1.32 ± 31%      -1.0        0.30 ±103%  perf-profile.calltrace.cycles-pp.menu_select.do_idle.cpu_startup_entry.start_secondary.secondary_startup_64_no_verify
      4.89 ±  7%      -0.9        3.98 ±  9%  perf-profile.calltrace.cycles-pp.security_file_permission.vfs_read.ksys_read.do_syscall_64.entry_SYSCALL_64_after_hwframe
      3.72 ±  6%      -0.8        2.94 ±  7%  perf-profile.calltrace.cycles-pp.security_file_permission.vfs_write.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe
      2.70 ±  7%      -0.6        2.15 ± 10%  perf-profile.calltrace.cycles-pp.common_file_perm.security_file_permission.vfs_read.ksys_read.do_syscall_64
      2.85 ±  6%      -0.5        2.36 ±  7%  perf-profile.calltrace.cycles-pp.common_file_perm.security_file_permission.vfs_write.ksys_write.do_syscall_64
      0.72 ±  8%      -0.4        0.28 ±100%  perf-profile.calltrace.cycles-pp.___might_sleep.__might_fault._copy_from_user.eventfd_write.vfs_write
      1.23 ±  8%      -0.3        0.97 ± 10%  perf-profile.calltrace.cycles-pp.__might_fault._copy_from_user.eventfd_write.vfs_write.ksys_write
     14.85 ±  7%      -2.0       12.81 ±  8%  perf-profile.children.cycles-pp.vfs_write
      8.61 ±  6%      -1.7        6.93 ±  8%  perf-profile.children.cycles-pp.security_file_permission
      8.45 ±  7%      -1.2        7.24 ±  8%  perf-profile.children.cycles-pp.eventfd_write
      5.70 ±  7%      -1.1        4.64 ±  9%  perf-profile.children.cycles-pp.common_file_perm
      1.33 ± 31%      -0.8        0.48 ± 28%  perf-profile.children.cycles-pp.menu_select
      3.46 ± 15%      -0.8        2.68 ± 10%  perf-profile.children.cycles-pp.asm_sysvec_apic_timer_interrupt
      0.71 ±  7%      -0.4        0.27 ± 10%  perf-profile.children.cycles-pp.apparmor_file_permission
      2.46 ±  7%      -0.3        2.11 ±  9%  perf-profile.children.cycles-pp.__might_fault
      1.33 ±  7%      -0.2        1.13 ±  9%  perf-profile.children.cycles-pp.___might_sleep
      0.38 ±  8%      +0.1        0.48 ±  9%  perf-profile.children.cycles-pp.syscall_exit_to_user_mode_prepare
      1.57 ± 55%      -1.3        0.30 ± 21%  perf-profile.self.cycles-pp.cpuidle_enter_state
      4.39 ±  7%      -1.1        3.27 ±  8%  perf-profile.self.cycles-pp.common_file_perm
      2.15 ±  6%      -0.8        1.32 ± 11%  perf-profile.self.cycles-pp.eventfd_write
      0.98 ± 42%      -0.8        0.20 ± 44%  perf-profile.self.cycles-pp.menu_select
      2.25 ±  7%      -0.6        1.61 ±  8%  perf-profile.self.cycles-pp.eventfd_read
      0.57 ±  7%      -0.3        0.27 ± 10%  perf-profile.self.cycles-pp.apparmor_file_permission
      1.32 ±  7%      -0.2        1.12 ±  9%  perf-profile.self.cycles-pp.___might_sleep
      0.43 ±  7%      -0.1        0.35 ±  9%  perf-profile.self.cycles-pp.__might_fault
      0.11 ± 12%      -0.0        0.08 ± 16%  perf-profile.self.cycles-pp.read_tsc
      0.07 ±  5%      +0.1        0.13 ± 11%  perf-profile.self.cycles-pp.__x64_sys_write
      0.07 ± 12%      +0.1        0.14 ± 11%  perf-profile.self.cycles-pp.__x64_sys_read
      0.26 ± 10%      +0.1        0.37 ±  8%  perf-profile.self.cycles-pp.syscall_exit_to_user_mode_prepare
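
The headline figures in the table can be cross-checked with a little arithmetic. The sketch below is an illustration, not part of the LKP tooling: it assumes %change is (after - before) / before, that per_process_ops is the workload divided by nr_task, and that the derived perf-stat ratios follow from the raw counters in the perf-stat.ps rows. The function name `pct_change` is hypothetical. Rows whose values are already percentages (branch-miss-rate%, perf-profile) show the change as an absolute difference in percentage points instead.

```python
def pct_change(before: float, after: float) -> float:
    """Relative change from `before` to `after`, in percent."""
    return (after - before) / before * 100.0


NR_TASK = 16  # from the job parameters

# will-it-scale.workload, parent commit vs. tested commit
workload_before, workload_after = 46_843_229, 49_479_323
headline = pct_change(workload_before, workload_after)

# Assumed relationship: per-process ops = total workload / nr_task
per_process_before = workload_before // NR_TASK
per_process_after = workload_after // NR_TASK

# Raw counters copied from the perf-stat.ps rows (before, after)
branch_instructions = (1.627e10, 1.688e10)
branch_misses = (1.875e8, 98_557_001)
instructions = (8.061e10, 8.512e10)
itlb_load_misses = (1.742e8, 90_889_010)

# Derived ratios, to compare with the perf-stat.overall rows
branch_miss_rate = [100.0 * m / b
                    for m, b in zip(branch_misses, branch_instructions)]
instr_per_itlb_miss = [i / m
                       for i, m in zip(instructions, itlb_load_misses)]

print(f"workload change: {headline:+.1f}%")
print(f"per_process_ops: {per_process_before} -> {per_process_after}")
print("branch-miss-rate%:", [round(r, 2) for r in branch_miss_rate])
print("instructions per iTLB miss:", [round(v) for v in instr_per_itlb_miss])
```

The recomputed ratios land within about 1% of the reported perf-stat.overall values; small differences are expected because the table rounds the counters and averages over several runs.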


                                                                                
                             will-it-scale.per_process_ops                      
                                                                                
  3.12e+06 +----------------------------------------------------------------+   
   3.1e+06 |-+                                                              |   
           |                         O O O O                                |   
  3.08e+06 |-+                              O                               |   
  3.06e+06 |-+                                                              |   
           |                                                                |   
  3.04e+06 |-+ O         O   OO   O                                         |   
  3.02e+06 |-O    O        O    O                                           |   
     3e+06 |-+  O                                                           |   
           |                                                                |   
  2.98e+06 |-+      O OO                                                    |   
  2.96e+06 |-+                                                              |   
           |        +.++.+.+.++.+.+.++.+.+.++.+.+.+                         |   
  2.94e+06 |.+     +                               +. .+. +.   .+   .+.+ .+.|   
  2.92e+06 +----------------------------------------------------------------+   
                                                                                
                                                                                
[*] bisect-good sample
[O] bisect-bad  sample



Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.


---
0DAY/LKP+ Test Infrastructure                   Open Source Technology Center
https://lists.01.org/hyperkitty/list/lkp@lists.01.org       Intel Corporation

Thanks,
Oliver Sang


View attachment "config-5.12.0-rc5-00072-g9bc0bb50727c" of type "text/plain" (172949 bytes)

View attachment "job-script" of type "text/plain" (7765 bytes)

View attachment "job.yaml" of type "text/plain" (5105 bytes)

View attachment "reproduce" of type "text/plain" (340 bytes)
