Message-ID: <20180930065120.GM15893@shao2-debian>
Date:   Sun, 30 Sep 2018 14:51:20 +0800
From:   kernel test robot <rong.a.chen@...el.com>
To:     Andy Lutomirski <luto@...nel.org>
Cc:     Thomas Gleixner <tglx@...utronix.de>, Borislav Petkov <bp@...e.de>,
        Borislav Petkov <bp@...en8.de>,
        Dave Hansen <dave.hansen@...ux.intel.com>,
        Adrian Hunter <adrian.hunter@...el.com>,
        Alexander Shishkin <alexander.shishkin@...ux.intel.com>,
        Arnaldo Carvalho de Melo <acme@...nel.org>,
        Linus Torvalds <torvalds@...ux-foundation.org>,
        Josh Poimboeuf <jpoimboe@...hat.com>,
        Joerg Roedel <joro@...tes.org>, Jiri Olsa <jolsa@...hat.com>,
        Andi Kleen <ak@...ux.intel.com>,
        Peter Zijlstra <peterz@...radead.org>,
        LKML <linux-kernel@...r.kernel.org>,
        Stephen Rothwell <sfr@...b.auug.org.au>, lkp@...org
Subject: [LKP] [x86/pti/64] bf904d2762: will-it-scale.per_thread_ops 1.7% improvement

Greetings,

FYI, we noticed a 1.7% improvement in will-it-scale.per_thread_ops due to commit:


commit: bf904d2762ee6fc1e4acfcb0772bbfb4a27ad8a6 ("x86/pti/64: Remove the SYSCALL64 entry trampoline")
https://git.kernel.org/cgit/linux/kernel/git/next/linux-next.git master

in testcase: will-it-scale
on test machine: 88-thread Intel(R) Xeon(R) CPU E5-2699 v4 @ 2.20GHz with 64 GB of memory
with the following parameters:

	nr_task: 16
	mode: thread
	test: pwrite1
	cpufreq_governor: performance

test-description: Will It Scale takes a testcase and runs it with 1 through n parallel copies to see whether the testcase scales. It builds both a process-based and a thread-based variant of each test in order to expose any differences between the two.
test-url: https://github.com/antonblanchard/will-it-scale



Details are below:
-------------------------------------------------------------------------------------------------->


To reproduce:

        git clone https://github.com/intel/lkp-tests.git
        cd lkp-tests
        bin/lkp install job.yaml  # the job file is attached to this email
        bin/lkp run     job.yaml

=========================================================================================
compiler/cpufreq_governor/kconfig/mode/nr_task/rootfs/tbox_group/test/testcase:
  gcc-7/performance/x86_64-rhel-7.2/thread/16/debian-x86_64-2018-04-03.cgz/lkp-bdw-ep3d/pwrite1/will-it-scale

commit: 
  98f05b5138 ("x86/entry/64: Use the TSS sp2 slot for SYSCALL/SYSRET scratch space")
  bf904d2762 ("x86/pti/64: Remove the SYSCALL64 entry trampoline")

98f05b5138f0a9b5 bf904d2762ee6fc1e4acfcb077 
---------------- -------------------------- 
       fail:runs  %reproduction    fail:runs
           |             |             |    
          1:4          -25%            :4     dmesg.WARNING:at#for_ip_interrupt_entry/0x
          2:4          -50%            :4     dmesg.WARNING:at_ip_fsnotify/0x
         %stddev     %change         %stddev
             \          |                \  
   1221307            +1.7%    1242132        will-it-scale.per_thread_ops
      7349 ±  3%      +3.6%       7616        will-it-scale.time.minor_page_faults
    675.23            +1.8%     687.28        will-it-scale.time.user_time
  19540927            +1.7%   19874128        will-it-scale.workload
      4323 ± 16%     -54.7%       1958 ±103%  numa-numastat.node0.other_node
     98872 ± 24%     +33.4%     131877 ± 10%  numa-meminfo.node0.AnonPages
      2292 ±  8%     -10.5%       2050 ±  7%  numa-meminfo.node1.PageTables
     24718 ± 24%     +33.4%      32969 ± 10%  numa-vmstat.node0.nr_anon_pages
      7864 ± 12%     +21.7%       9568 ± 15%  numa-vmstat.node1
    573.00 ±  8%     -10.6%     512.50 ±  7%  numa-vmstat.node1.nr_page_table_pages
      2.25 ± 15%     -50.0%       1.12 ± 60%  sched_debug.cfs_rq:/.load_avg.min
    418.57 ± 87%     -81.2%      78.54 ±173%  sched_debug.cfs_rq:/.removed.runnable_sum.avg
      7842 ± 70%     -76.0%       1885 ±173%  sched_debug.cfs_rq:/.removed.runnable_sum.max
      1734 ± 77%     -78.3%     376.68 ±173%  sched_debug.cfs_rq:/.removed.runnable_sum.stddev
  -2477409            -0.1%   -2474518        sched_debug.cfs_rq:/.spread0.min
    209211 ± 19%     -30.2%     146101 ± 30%  sched_debug.cpu.avg_idle.min
     70.04 ±  7%     -18.4%      57.17 ±  7%  sched_debug.cpu.cpu_load[2].max
     66.92 ±  5%     -11.1%      59.46 ±  6%  sched_debug.cpu.cpu_load[3].max
      6736 ± 23%     +37.8%       9285 ±  9%  sched_debug.cpu.ttwu_local.max
      1672 ± 12%     +32.2%       2210 ± 11%  sched_debug.cpu.ttwu_local.stddev
      1.81            -0.3        1.56        perf-stat.branch-miss-rate%
 4.262e+10           -13.3%  3.696e+10        perf-stat.branch-misses
      1.27            -1.3%       1.25        perf-stat.cpi
      0.01 ±  7%      -0.0        0.00 ±  2%  perf-stat.dTLB-load-miss-rate%
 5.163e+08 ±  7%     -62.1%  1.958e+08 ±  2%  perf-stat.dTLB-load-misses
 4.318e+12            +1.4%   4.38e+12        perf-stat.dTLB-loads
      0.01 ±  6%      -0.0        0.00 ±  4%  perf-stat.dTLB-store-miss-rate%
 4.264e+08 ±  6%     -69.6%  1.294e+08 ±  4%  perf-stat.dTLB-store-misses
 2.915e+12            +1.1%  2.947e+12        perf-stat.dTLB-stores
      2.21 ±  3%     +95.5       97.67        perf-stat.iTLB-load-miss-rate%
 2.564e+08 ±  3%   +2372.0%  6.338e+09        perf-stat.iTLB-load-misses
 1.136e+10           -98.7%  1.509e+08        perf-stat.iTLB-loads
  1.18e+13            +1.4%  1.196e+13        perf-stat.instructions
     46053 ±  3%     -95.9%       1887 ±  2%  perf-stat.instructions-per-iTLB-miss
      0.79            +1.4%       0.80        perf-stat.ipc
      8.65 ±  4%      -8.7        0.00        perf-profile.calltrace.cycles-pp.__entry_SYSCALL_64_trampoline
      0.57 ±  4%      -0.2        0.39 ± 57%  perf-profile.calltrace.cycles-pp.___might_sleep.down_write.generic_file_write_iter.__vfs_write.vfs_write
      0.00            +8.4        8.41 ±  2%  perf-profile.calltrace.cycles-pp.entry_SYSCALL_64
      9.48 ±  5%      -9.5        0.00        perf-profile.children.cycles-pp.__entry_SYSCALL_64_trampoline
      0.03 ±100%      +0.0        0.07 ± 17%  perf-profile.children.cycles-pp.clockevents_program_event
      0.01 ±173%      +0.1        0.07 ± 23%  perf-profile.children.cycles-pp.ktime_get
      0.31 ±  6%      +0.1        0.37 ±  6%  perf-profile.children.cycles-pp.smp_apic_timer_interrupt
      0.35 ±  8%      +0.1        0.42 ±  6%  perf-profile.children.cycles-pp.apic_timer_interrupt
      0.00            +0.2        0.17 ±  4%  perf-profile.children.cycles-pp.__x86_indirect_thunk_r10
      0.00            +1.0        0.96        perf-profile.children.cycles-pp.__x86_indirect_thunk_rax
      0.00            +8.4        8.42 ±  2%  perf-profile.children.cycles-pp.entry_SYSCALL_64
      9.31 ±  4%      -9.3        0.00        perf-profile.self.cycles-pp.__entry_SYSCALL_64_trampoline
      1.55 ±  6%      -0.6        0.97        perf-profile.self.cycles-pp.do_syscall_64
      1.03 ±  5%      -0.2        0.81 ±  4%  perf-profile.self.cycles-pp.entry_SYSCALL_64_after_hwframe
      0.00            +0.1        0.12 ±  3%  perf-profile.self.cycles-pp.__x86_indirect_thunk_r10
      0.00            +0.8        0.81 ±  2%  perf-profile.self.cycles-pp.__x86_indirect_thunk_rax
      0.00            +8.4        8.42 ±  2%  perf-profile.self.cycles-pp.entry_SYSCALL_64
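Several rows in the table above are derived from other rows, and the %change column means different things for different rows: a relative change for plain counters, but what appears to be an absolute difference in percentage points for rows that are already percentages (the "...-rate%" rows), and a change in failure fraction for the fail:runs rows. The Python sketch below re-derives a few rows from the printed raw counters. The formulas are inferred from the numbers (they reproduce them to the printed precision) rather than taken from the lkp-tests source, and the helper names are hypothetical. The instructions-per-iTLB-miss value it computes for the parent commit (about 46022) differs slightly from the table's 46053, presumably because the printed counters are rounded and the table averages over runs.

```python
"""Re-derive a few rows of the LKP comparison table from its raw columns.

Hypothetical helpers; the formulas are inferred from the published numbers,
not copied from lkp-tests.
"""


def pct_change(old: float, new: float) -> float:
    # Plain counters: relative change in percent.
    return (new - old) / old * 100.0


def point_change(old_pct: float, new_pct: float) -> float:
    # Rows already in percent ("...-rate%"): the %change column appears to
    # hold the absolute difference in percentage points.
    return new_pct - old_pct


def miss_rate_pct(misses: float, loads: float) -> float:
    # Matches the table only if "loads" excludes the misses,
    # i.e. rate = misses / (misses + loads).
    return misses / (misses + loads) * 100.0


def reproduction_pct(fail_a: int, runs_a: int, fail_b: int, runs_b: int) -> float:
    # "fail:runs" rows: change in the fraction of runs that hit the warning.
    return (fail_b / runs_b - fail_a / runs_a) * 100.0


if __name__ == "__main__":
    # Values copied from the table (parent 98f05b5138 -> bf904d2762).
    ops_old, ops_new = 1221307, 1242132
    itlb_miss_old, itlb_miss_new = 2.564e8, 6.338e9
    itlb_load_old, itlb_load_new = 1.136e10, 1.509e8
    insn_old, insn_new = 1.18e13, 1.196e13
    nr_task = 16

    print(f"per_thread_ops:       {pct_change(ops_old, ops_new):+.1f}%")
    rate_old = miss_rate_pct(itlb_miss_old, itlb_load_old)
    rate_new = miss_rate_pct(itlb_miss_new, itlb_load_new)
    print(f"iTLB-load-miss-rate%: {rate_old:.2f} -> {rate_new:.2f} "
          f"({point_change(rate_old, rate_new):+.1f} points)")
    print(f"insns per iTLB miss:  {insn_old / itlb_miss_old:.0f} -> "
          f"{insn_new / itlb_miss_new:.0f}")
    print(f"dmesg warning repro:  {reproduction_pct(1, 4, 0, 4):+.0f}%")
    # workload appears to be roughly per_thread_ops * nr_task (table: 19540927).
    print(f"workload estimate:    {ops_old * nr_task}")
```

Note that the jump in iTLB-load-miss-rate% from 2.21 to 97.67 comes mostly from iTLB-loads falling by 98.7%, not only from the rise in iTLB-load-misses; reading the rate row on its own would overstate the change in misses.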


                                                                                
                              will-it-scale.per_thread_ops                      
                                                                                
  1.255e+06 +-+-------------------------------------------------------------+   
   1.25e+06 +-+         O                                                   |   
            |    O        O O O                                             |   
  1.245e+06 O-+ O  O O O                    O   O  O O OO O   O      O      |   
   1.24e+06 +-O                    O O OO        O          O   OO O   OO   |   
            |                   OO        O   O                             |   
  1.235e+06 +-+                                                             |   
   1.23e+06 +-+                                                             |   
  1.225e+06 +-+.+                        .+.+                               |   
            |    +.+.   +.        .+.+.++    +                        .+   .|   
   1.22e+06 +-+      +.+  +. .+. +            +.      .+ .+.+.+.++.+.+  +.+ |   
  1.215e+06 +-+             +   +               +   .+  +                   |   
            |                                    :.+                        |   
   1.21e+06 +-+                                  +                          |   
  1.205e+06 +-+-------------------------------------------------------------+   
                                                                                
                                                                                                                                                                
                                will-it-scale.workload                          
                                                                                
  2.01e+07 +-+--------------------------------------------------------------+   
           |                                                                |   
     2e+07 +-+          OO O O                                              |   
           O    O     O                     O  O   O  O                     |   
  1.99e+07 +-O O  O O             O O O O            O  O O O OO O O O OO   |   
  1.98e+07 +-+                 O O        O  O   O                          |   
           |                                                                |   
  1.97e+07 +-+                                                              |   
           |                              +.+                               |   
  1.96e+07 +-+.+ .+.   .+         +. .+. +  :                              .|   
  1.95e+07 +-+  +   +.+  +. .+    : +   +    :             .+.++.+. .+.++.+ |   
           |               +  + .+           +.+    .++.+.+        +        |   
  1.94e+07 +-+                 +                :  +                        |   
           |                                    : +                         |   
  1.93e+07 +-+--------------------------------------------------------------+   
                                                                                
                                                                                
[+] bisect-good sample
[O] bisect-bad  sample



Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.


Thanks,
Rong Chen

View attachment "config-4.19.0-rc2-00179-gbf904d2" of type "text/plain" (167672 bytes)

View attachment "job-script" of type "text/plain" (6938 bytes)

View attachment "job.yaml" of type "text/plain" (4592 bytes)

View attachment "reproduce" of type "text/plain" (310 bytes)
