Date:   Fri, 2 Nov 2018 09:33:07 +0800
From:   kernel test robot <rong.a.chen@...el.com>
To:     Andy Lutomirski <luto@...nel.org>
Cc:     Thomas Gleixner <tglx@...utronix.de>, Borislav Petkov <bp@...e.de>,
        Borislav Petkov <bp@...en8.de>,
        Dave Hansen <dave.hansen@...ux.intel.com>,
        Adrian Hunter <adrian.hunter@...el.com>,
        Alexander Shishkin <alexander.shishkin@...ux.intel.com>,
        Arnaldo Carvalho de Melo <acme@...nel.org>,
        Linus Torvalds <torvalds@...ux-foundation.org>,
        Josh Poimboeuf <jpoimboe@...hat.com>,
        Joerg Roedel <joro@...tes.org>, Jiri Olsa <jolsa@...hat.com>,
        Andi Kleen <ak@...ux.intel.com>,
        Peter Zijlstra <peterz@...radead.org>,
        LKML <linux-kernel@...r.kernel.org>,
        Ingo Molnar <mingo@...nel.org>,
        "H. Peter Anvin" <hpa@...or.com>, tipbuild@...or.com, lkp@...org
Subject: [LKP] [x86/pti/64]  86635715ee:  will-it-scale.per_thread_ops 4.1%
 improvement

Greetings,

FYI, we noticed a 4.1% improvement of will-it-scale.per_thread_ops due to commit:


commit: 86635715ee4228ded59f662dab36e9732b9c978f ("x86/pti/64: Remove the SYSCALL64 entry trampoline")
https://git.kernel.org/cgit/linux/kernel/git/tip/tip.git x86/pti

in testcase: will-it-scale
on test machine: 80-thread Skylake with 64G of memory
with the following parameters:

	nr_task: 100%
	mode: thread
	test: sched_yield
	cpufreq_governor: performance

test-description: Will It Scale takes a testcase and runs it from 1 through to n parallel copies to see if the testcase will scale. It builds both process-based and thread-based variants of each test in order to expose any differences between the two.
test-url: https://github.com/antonblanchard/will-it-scale



Details are below:
-------------------------------------------------------------------------------------------------->


To reproduce:

        git clone https://github.com/intel/lkp-tests.git
        cd lkp-tests
        bin/lkp install job.yaml  # job file is attached in this email
        bin/lkp run     job.yaml

=========================================================================================
compiler/cpufreq_governor/kconfig/mode/nr_task/rootfs/tbox_group/test/testcase:
  gcc-7/performance/x86_64-rhel-7.2/thread/100%/debian-x86_64-2018-04-03.cgz/lkp-skl-2sp2/sched_yield/will-it-scale

commit: 
  98f05b5138 ("x86/entry/64: Use the TSS sp2 slot for SYSCALL/SYSRET scratch space")
  86635715ee ("x86/pti/64: Remove the SYSCALL64 entry trampoline")

98f05b5138f0a9b5 86635715ee4228ded59f662dab 
---------------- -------------------------- 
         %stddev     %change         %stddev
             \          |                \  
   1946690            +4.1%    2027191        will-it-scale.per_thread_ops
     17228            -1.5%      16976        will-it-scale.time.system_time
      6495            +4.2%       6768        will-it-scale.time.user_time
 1.557e+08            +4.1%  1.622e+08        will-it-scale.workload
     93021 ±153%     -92.9%       6582 ± 41%  turbostat.C1
     99833 ±144%     -88.9%      11048 ± 25%  cpuidle.C1.usage
      2527 ±150%     -94.0%     152.00 ± 44%  cpuidle.POLL.usage
     14344 ± 16%     -90.6%       1351 ±173%  numa-numastat.node0.other_node
      1443 ±161%    +897.9%      14402 ± 16%  numa-numastat.node1.other_node
      5458 ± 73%     -59.7%       2199 ±160%  proc-vmstat.numa_pages_migrated
      5458 ± 73%     -59.7%       2199 ±160%  proc-vmstat.pgmigrate_success
     99.51 ±  6%     +17.9%     117.29 ±  6%  sched_debug.cfs_rq:/.util_avg.stddev
    397470 ± 21%     +46.3%     581313 ± 12%  sched_debug.cpu.avg_idle.min
     53.33 ±  6%     -10.9%      47.54 ±  6%  sched_debug.cpu.ttwu_local.min
      1354 ± 11%     +23.6%       1674 ±  6%  slabinfo.UNIX.active_objs
      1354 ± 11%     +23.6%       1674 ±  6%  slabinfo.UNIX.num_objs
      2423 ±  9%     +22.0%       2958 ±  6%  slabinfo.sock_inode_cache.active_objs
      2423 ±  9%     +22.0%       2958 ±  6%  slabinfo.sock_inode_cache.num_objs
    122834 ± 14%     +35.5%     166488 ±  3%  numa-meminfo.node0.Active
    121575 ± 15%     +36.7%     166251 ±  3%  numa-meminfo.node0.Active(anon)
      1258 ± 42%     -81.1%     237.75 ±173%  numa-meminfo.node0.Active(file)
      1082 ± 10%     -73.1%     291.00 ±145%  numa-meminfo.node0.Inactive(file)
      2971 ±  9%     -28.9%       2113 ± 25%  numa-meminfo.node0.PageTables
     18685          +182.7%      52829 ± 13%  numa-meminfo.node0.Shmem
    185939 ±  9%     -24.4%     140633 ±  4%  numa-meminfo.node1.Active
    185939 ±  9%     -24.7%     139946 ±  4%  numa-meminfo.node1.Active(anon)
      6249 ±  2%     +10.5%       6904 ±  2%  numa-meminfo.node1.KernelStack
      1571 ± 16%     +55.3%       2439 ± 21%  numa-meminfo.node1.PageTables
     55233 ±  3%     +10.6%      61093 ±  3%  numa-meminfo.node1.SUnreclaim
     41375 ±  5%     -84.4%       6447 ±108%  numa-meminfo.node1.Shmem
  8.81e+12            +2.4%  9.019e+12        perf-stat.branch-instructions
      2.16            -0.5        1.66        perf-stat.branch-miss-rate%
   1.9e+11           -21.4%  1.494e+11        perf-stat.branch-misses
      1.55            -2.9%       1.51        perf-stat.cpi
 1.282e+13            +3.3%  1.324e+13        perf-stat.dTLB-loads
 8.156e+12            +2.3%   8.34e+12        perf-stat.dTLB-stores
      1.82 ±  6%     +98.1       99.87        perf-stat.iTLB-load-miss-rate%
  4.11e+08 ±  6%  +12153.3%  5.036e+10        perf-stat.iTLB-load-misses
 2.222e+10 ±  2%     -99.7%   66461819 ±  8%  perf-stat.iTLB-loads
 4.272e+13            +3.3%  4.412e+13        perf-stat.instructions
    104366 ±  6%     -99.2%     876.17        perf-stat.instructions-per-iTLB-miss
      0.64            +3.0%       0.66        perf-stat.ipc
     30383 ± 15%     +36.9%      41582 ±  3%  numa-vmstat.node0.nr_active_anon
    314.75 ± 42%     -81.2%      59.25 ±173%  numa-vmstat.node0.nr_active_file
    742.25 ±  9%     -28.9%     527.50 ± 25%  numa-vmstat.node0.nr_page_table_pages
      4671          +183.0%      13221 ± 13%  numa-vmstat.node0.nr_shmem
     30383 ± 15%     +36.9%      41582 ±  3%  numa-vmstat.node0.nr_zone_active_anon
    314.75 ± 42%     -81.2%      59.25 ±173%  numa-vmstat.node0.nr_zone_active_file
     14332 ± 16%     -89.5%       1504 ±154%  numa-vmstat.node0.numa_other
     46526 ±  9%     -24.8%      34987 ±  4%  numa-vmstat.node1.nr_active_anon
      6249 ±  2%     +10.5%       6902 ±  3%  numa-vmstat.node1.nr_kernel_stack
    392.25 ± 16%     +55.3%     609.25 ± 21%  numa-vmstat.node1.nr_page_table_pages
     10364 ±  5%     -84.4%       1612 ±108%  numa-vmstat.node1.nr_shmem
     13808 ±  3%     +10.6%      15273 ±  3%  numa-vmstat.node1.nr_slab_unreclaimable
     46526 ±  9%     -24.8%      34987 ±  4%  numa-vmstat.node1.nr_zone_active_anon
    136694            +9.4%     149504        numa-vmstat.node1.numa_other
     20.17           -20.2        0.00        perf-profile.calltrace.cycles-pp.__entry_SYSCALL_64_trampoline
     54.02            -1.4       52.62        perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe
     51.83            -0.9       50.89        perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe
      2.66            -0.2        2.41        perf-profile.calltrace.cycles-pp.pick_next_entity.pick_next_task_fair.__sched_text_start.schedule.__x64_sys_sched_yield
      0.80 ±  2%      -0.1        0.73 ±  3%  perf-profile.calltrace.cycles-pp.__list_del_entry_valid.pick_next_task_fair.__sched_text_start.schedule.__x64_sys_sched_yield
      0.73            +0.0        0.75        perf-profile.calltrace.cycles-pp.__list_add_valid.pick_next_task_fair.__sched_text_start.schedule.__x64_sys_sched_yield
      1.93            +0.2        2.08        perf-profile.calltrace.cycles-pp._raw_spin_lock.__sched_text_start.schedule.__x64_sys_sched_yield.do_syscall_64
      2.48            +0.2        2.66        perf-profile.calltrace.cycles-pp.yield_task_fair.do_sched_yield.__x64_sys_sched_yield.do_syscall_64.entry_SYSCALL_64_after_hwframe
      0.53 ±  3%      +0.3        0.79 ±  3%  perf-profile.calltrace.cycles-pp.clear_buddies.pick_next_entity.pick_next_task_fair.__sched_text_start.schedule
      7.10            +0.4        7.55        perf-profile.calltrace.cycles-pp.do_sched_yield.__x64_sys_sched_yield.do_syscall_64.entry_SYSCALL_64_after_hwframe
     37.40            +0.5       37.87        perf-profile.calltrace.cycles-pp.__sched_text_start.schedule.__x64_sys_sched_yield.do_syscall_64.entry_SYSCALL_64_after_hwframe
      0.00            +0.5        0.53 ±  2%  perf-profile.calltrace.cycles-pp.__x86_indirect_thunk_rax
     38.68            +0.5       39.22        perf-profile.calltrace.cycles-pp.schedule.__x64_sys_sched_yield.do_syscall_64.entry_SYSCALL_64_after_hwframe
      4.04            +0.8        4.84        perf-profile.calltrace.cycles-pp.__calc_delta.update_curr.pick_next_task_fair.__sched_text_start.schedule
     19.43            +0.9       20.35        perf-profile.calltrace.cycles-pp.syscall_return_via_sysret
     47.51            +1.1       48.59        perf-profile.calltrace.cycles-pp.__x64_sys_sched_yield.do_syscall_64.entry_SYSCALL_64_after_hwframe
      0.00           +21.5       21.51        perf-profile.calltrace.cycles-pp.entry_SYSCALL_64
     20.91           -20.9        0.00        perf-profile.children.cycles-pp.__entry_SYSCALL_64_trampoline
     54.20            -1.5       52.74        perf-profile.children.cycles-pp.entry_SYSCALL_64_after_hwframe
     51.99            -0.9       51.08        perf-profile.children.cycles-pp.do_syscall_64
      2.92            -0.1        2.81 ±  2%  perf-profile.children.cycles-pp.pick_next_entity
      0.85 ±  2%      -0.1        0.77 ±  3%  perf-profile.children.cycles-pp.__list_del_entry_valid
      0.75            +0.0        0.77        perf-profile.children.cycles-pp.__list_add_valid
      0.45            +0.0        0.48        perf-profile.children.cycles-pp.rcu_note_context_switch
      4.52            +0.1        4.59        perf-profile.children.cycles-pp.update_rq_clock
      0.40            +0.1        0.47 ±  2%  perf-profile.children.cycles-pp.check_cfs_rq_runtime
      2.51            +0.2        2.69        perf-profile.children.cycles-pp.yield_task_fair
      4.04            +0.2        4.25        perf-profile.children.cycles-pp._raw_spin_lock
      0.58 ±  4%      +0.3        0.86 ±  2%  perf-profile.children.cycles-pp.clear_buddies
      7.11            +0.5        7.57        perf-profile.children.cycles-pp.do_sched_yield
     37.75            +0.5       38.28        perf-profile.children.cycles-pp.__sched_text_start
     38.70            +0.5       39.24        perf-profile.children.cycles-pp.schedule
      4.44            +0.8        5.26        perf-profile.children.cycles-pp.__calc_delta
     22.68            +1.1       23.77        perf-profile.children.cycles-pp.syscall_return_via_sysret
     47.77            +1.1       48.87        perf-profile.children.cycles-pp.__x64_sys_sched_yield
      0.00            +1.3        1.26        perf-profile.children.cycles-pp.__x86_indirect_thunk_rax
      0.00           +21.5       21.54        perf-profile.children.cycles-pp.entry_SYSCALL_64
     20.83           -20.8        0.00        perf-profile.self.cycles-pp.__entry_SYSCALL_64_trampoline
      3.96            -2.0        2.01        perf-profile.self.cycles-pp.do_syscall_64
      7.41            -0.7        6.66        perf-profile.self.cycles-pp.update_curr
      2.29            -0.6        1.65        perf-profile.self.cycles-pp.entry_SYSCALL_64_after_hwframe
      2.22            -0.4        1.83 ±  2%  perf-profile.self.cycles-pp.pick_next_entity
      0.29 ±  2%      -0.0        0.26 ±  2%  perf-profile.self.cycles-pp.task_of
      0.72            +0.0        0.75        perf-profile.self.cycles-pp.__list_add_valid
      0.44            +0.0        0.47        perf-profile.self.cycles-pp.rcu_note_context_switch
      0.23            +0.1        0.29        perf-profile.self.cycles-pp.check_cfs_rq_runtime
      2.06            +0.1        2.19        perf-profile.self.cycles-pp.__x64_sys_sched_yield
      2.27            +0.2        2.44        perf-profile.self.cycles-pp.yield_task_fair
      3.94            +0.2        4.12        perf-profile.self.cycles-pp._raw_spin_lock
      0.39 ±  3%      +0.2        0.58 ±  2%  perf-profile.self.cycles-pp.clear_buddies
      2.32            +0.2        2.54        perf-profile.self.cycles-pp.do_sched_yield
      7.06            +0.2        7.31        perf-profile.self.cycles-pp.pick_next_task_fair
      5.34            +0.2        5.59        perf-profile.self.cycles-pp.__sched_text_start
      4.35            +0.8        5.17        perf-profile.self.cycles-pp.__calc_delta
     22.66            +1.1       23.75        perf-profile.self.cycles-pp.syscall_return_via_sysret
      0.00            +1.1        1.14        perf-profile.self.cycles-pp.__x86_indirect_thunk_rax
      0.00           +21.5       21.54        perf-profile.self.cycles-pp.entry_SYSCALL_64


                                                                                
                             will-it-scale.per_thread_ops                       
                                                                                
   2.1e+06 +-+--------------------------------------------------------------+   
  2.08e+06 +-+                                   O                          |   
           |                                  O     O O  O  O               |   
  2.06e+06 O-+O    O  O            O  O  O  O                               |   
  2.04e+06 +-+  O        O O  O  O                                       O  |   
           |                                                     O     O    O   
  2.02e+06 +-+                                                O     O       |   
     2e+06 +-+                                                              |   
  1.98e+06 +-+                       .+..  .+.  .+                          |   
           |             +.+..+..+.+.    +.   +.  :                         |   
  1.96e+06 +-+          +                         :         +.  .+..        |   
  1.94e+06 +-++.+      +                           :      ..  +.    +..+    |   
           |     +    +                            : .+..+                  |   
  1.92e+06 +-+    + ..                              +                       |   
   1.9e+06 +-+--------------------------------------------------------------+   
                                                                                
                                                                                
[*] bisect-good sample
[O] bisect-bad  sample



Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.


Thanks,
Rong Chen

View attachment "config-4.19.0-rc2-00179-g8663571" of type "text/plain" (167672 bytes)

View attachment "job.yaml" of type "text/plain" (4869 bytes)

View attachment "reproduce" of type "text/plain" (314 bytes)
