Message-ID: <20200615005732.GV12456@shao2-debian>
Date:   Mon, 15 Jun 2020 08:57:32 +0800
From:   kernel test robot <rong.a.chen@...el.com>
To:     Kees Cook <keescook@...omium.org>
Cc:     Peter Zijlstra <peterz@...radead.org>,
        Kees Cook <keescook@...omium.org>,
        Elena Reshetova <elena.reshetova@...el.com>,
        David Windsor <dwindsor@...il.com>,
        Hans Liljestrand <ishkamiel@...il.com>,
        Xiaoming Ni <nixiaoming@...wei.com>,
        Paul Moore <paul@...l-moore.com>, edumazet@...gle.com,
        paulmck@...nel.org, David Howells <dhowells@...hat.com>,
        shakeelb@...gle.com, James Morris <jamorris@...ux.microsoft.com>,
        alex.huangjianhui@...wei.com, dylix.dailei@...wei.com,
        chenzefeng2@...wei.com, linux-kernel@...r.kernel.org,
        lkp@...ts.01.org
Subject: [groups] 67467ae141: will-it-scale.per_process_ops 4.3% improvement

Greetings,

FYI, we noticed a 4.3% improvement in will-it-scale.per_process_ops due to commit:


commit: 67467ae14130847791f230fbc9f261d0c819b9c3 ("[PATCH 2/3] groups: convert group_info.usage to refcount_t")
url: https://github.com/0day-ci/linux/commits/Kees-Cook/Convert-nsproxy-groups-and-creds-to-refcount_t/20200613-023706
base: git://git.linux-nfs.org/projects/trondmy/linux-nfs.git linux-next
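
For reference, refcount_t is a hardened wrapper around atomic_t that saturates on overflow and underflow rather than wrapping, so reference-count bugs are less likely to turn into use-after-free. Below is a rough sketch of the shape such a conversion takes for group_info.usage; it roughly follows the mainline definitions of these helpers, but the actual change is in the linked series and may differ in detail.

    /*
     * Sketch only: the general shape of converting group_info.usage from
     * atomic_t to refcount_t.  See the linked series for the real patch.
     */
    struct group_info {
            refcount_t      usage;          /* was: atomic_t usage; */
            int             ngroups;
            kgid_t          gid[0];
    };

    static inline struct group_info *get_group_info(struct group_info *gi)
    {
            refcount_inc(&gi->usage);       /* was: atomic_inc() */
            return gi;
    }

    #define put_group_info(group_info)                              \
    do {                                                            \
            /* was: atomic_dec_and_test() */                        \
            if (refcount_dec_and_test(&(group_info)->usage))        \
                    groups_free(group_info);                        \
    } while (0)

    /*
     * groups_alloc() would likewise initialize the count with
     * refcount_set(&gi->usage, 1) instead of atomic_set(&gi->usage, 1).
     */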

in testcase: will-it-scale
on test machine: Skylake with 104 threads and 192G memory
with the following parameters:

	nr_task: 100%
	mode: process
	test: poll2
	cpufreq_governor: performance
	ucode: 0x2000065

test-description: Will It Scale takes a testcase and runs it from 1 through n parallel copies to see whether the testcase scales. It builds both a process-based and a thread-based variant of each test so that differences between the two can be observed.
test-url: https://github.com/antonblanchard/will-it-scale
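
The poll2 testcase stresses the poll() system call: each process repeatedly calls poll() on a set of file descriptors in a tight loop, which is why do_sys_poll() and the per-fd lookup helpers dominate the profile further down. Below is a minimal, self-contained sketch of such a loop; the use of pipes and a zero timeout, and the NR_PIPES value, are assumed details, and the fd count and setup in the actual poll2.c may differ.

    /*
     * Sketch of a poll()-in-a-loop microbenchmark in the spirit of the
     * poll2 testcase.  Pipe-backed fds and the zero timeout are assumed
     * details; the real testcase and its harness hooks may differ.
     */
    #include <poll.h>
    #include <unistd.h>

    #define NR_PIPES 128

    static void testcase(unsigned long long *iterations)
    {
            struct pollfd pfd[NR_PIPES * 2];
            int i;

            for (i = 0; i < NR_PIPES; i++) {
                    int fds[2];

                    if (pipe(fds))
                            return;
                    pfd[2 * i].fd         = fds[0];
                    pfd[2 * i].events     = POLLIN;
                    pfd[2 * i + 1].fd     = fds[1];
                    pfd[2 * i + 1].events = POLLIN;
            }

            for (;;) {
                    /* Zero timeout: measure syscall entry/exit and per-fd
                     * lookup cost rather than actual waiting. */
                    poll(pfd, NR_PIPES * 2, 0);
                    (*iterations)++;        /* a harness would sample this */
            }
    }

    int main(void)
    {
            unsigned long long iterations = 0;

            testcase(&iterations);
            return 0;
    }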





Details are as follows:
-------------------------------------------------------------------------------------------------->


To reproduce:

        git clone https://github.com/intel/lkp-tests.git
        cd lkp-tests
        bin/lkp install job.yaml  # job file is attached in this email
        bin/lkp run     job.yaml

=========================================================================================
compiler/cpufreq_governor/kconfig/mode/nr_task/rootfs/tbox_group/test/testcase/ucode:
  gcc-9/performance/x86_64-rhel-7.6/process/100%/debian-x86_64-20191114.cgz/lkp-skl-fpga01/poll2/will-it-scale/0x2000065

commit: 
  bcaef9d22e ("nsproxy: convert nsproxy.count to refcount_t")
  67467ae141 ("groups: convert group_info.usage to refcount_t")

bcaef9d22e69accf 67467ae14130847791f230fbc9f 
---------------- --------------------------- 
         %stddev     %change         %stddev
             \          |                \  
    205986            +4.3%     214828        will-it-scale.per_process_ops
  21422614            +4.3%   22342250        will-it-scale.workload
      6978 ± 50%     +99.5%      13922 ± 26%  numa-meminfo.node0.Inactive
      6819 ± 52%    +101.5%      13739 ± 26%  numa-meminfo.node0.Inactive(anon)
      8206 ± 46%     +85.9%      15258 ± 23%  numa-meminfo.node0.Shmem
     21268 ± 16%     -32.4%      14373 ± 25%  numa-meminfo.node1.Inactive
     21078 ± 16%     -32.7%      14195 ± 25%  numa-meminfo.node1.Inactive(anon)
      1704 ± 52%    +101.5%       3434 ± 26%  numa-vmstat.node0.nr_inactive_anon
      2051 ± 46%     +85.9%       3813 ± 23%  numa-vmstat.node0.nr_shmem
      1704 ± 52%    +101.5%       3434 ± 26%  numa-vmstat.node0.nr_zone_inactive_anon
      5270 ± 16%     -32.7%       3549 ± 25%  numa-vmstat.node1.nr_inactive_anon
      5270 ± 16%     -32.7%       3549 ± 25%  numa-vmstat.node1.nr_zone_inactive_anon
      6.22 ±  2%     +11.5%       6.94 ±  6%  sched_debug.cfs_rq:/.nr_spread_over.stddev
   -359667           -32.2%    -243708        sched_debug.cfs_rq:/.spread0.min
    739.75 ±  6%     +13.1%     836.96        sched_debug.cfs_rq:/.util_avg.min
     67.85           -20.0%      54.29 ±  5%  sched_debug.cfs_rq:/.util_avg.stddev
      0.15 ±  5%     +14.7%       0.17 ±  7%  sched_debug.cpu.nr_running.stddev
    450.25 ± 41%    +177.8%       1250 ± 25%  interrupts.39:PCI-MSI.67633154-edge.eth0-TxRx-1
    876.00 ±  9%     +54.8%       1356 ± 21%  interrupts.CPU26.RES:Rescheduling_interrupts
    450.25 ± 41%    +177.8%       1250 ± 25%  interrupts.CPU31.39:PCI-MSI.67633154-edge.eth0-TxRx-1
      5403 ± 27%     +40.9%       7615 ±  5%  interrupts.CPU49.NMI:Non-maskable_interrupts
      5403 ± 27%     +40.9%       7615 ±  5%  interrupts.CPU49.PMI:Performance_monitoring_interrupts
      6577 ± 11%     -35.1%       4267 ± 14%  interrupts.CPU54.RES:Rescheduling_interrupts
    358.00 ± 20%     +91.8%     686.75 ± 70%  interrupts.CPU96.RES:Rescheduling_interrupts
 4.835e+10            +4.2%   5.04e+10        perf-stat.i.branch-instructions
      0.31            -0.0        0.30        perf-stat.i.branch-miss-rate%
 1.407e+08            +1.8%  1.432e+08        perf-stat.i.branch-misses
      6.49 ±  9%      +1.1        7.63 ±  5%  perf-stat.i.cache-miss-rate%
    397271 ±  5%     +24.2%     493463 ±  5%  perf-stat.i.cache-misses
      1.18            -4.2%       1.13        perf-stat.i.cpi
    836738 ±  7%     -22.3%     650325 ±  4%  perf-stat.i.cycles-between-cache-misses
  21389813            +4.3%   22319467        perf-stat.i.dTLB-load-misses
 5.514e+10            +4.3%  5.753e+10        perf-stat.i.dTLB-loads
 2.535e+10            +4.4%  2.646e+10        perf-stat.i.dTLB-stores
  21063107            +4.7%   22042637        perf-stat.i.iTLB-load-misses
  2.39e+11            +4.2%   2.49e+11        perf-stat.i.instructions
      0.85            +4.3%       0.89        perf-stat.i.ipc
      1.19            +2.7%       1.22        perf-stat.i.metric.K/sec
      1238            +4.3%       1292        perf-stat.i.metric.M/sec
     88617            +4.6%      92673        perf-stat.i.node-load-misses
     16016 ±  8%     +11.5%      17852 ±  5%  perf-stat.i.node-loads
      0.29            -0.0        0.28        perf-stat.overall.branch-miss-rate%
      6.75 ±  8%      +1.1        7.83 ±  5%  perf-stat.overall.cache-miss-rate%
      1.18            -4.2%       1.13        perf-stat.overall.cpi
    708307 ±  5%     -19.6%     569690 ±  5%  perf-stat.overall.cycles-between-cache-misses
      0.85            +4.4%       0.89        perf-stat.overall.ipc
 4.819e+10            +4.2%  5.023e+10        perf-stat.ps.branch-instructions
 1.402e+08            +1.8%  1.427e+08        perf-stat.ps.branch-misses
    397188 ±  5%     +24.1%     492884 ±  5%  perf-stat.ps.cache-misses
  21318083            +4.3%   22244871        perf-stat.ps.dTLB-load-misses
 5.495e+10            +4.3%  5.734e+10        perf-stat.ps.dTLB-loads
 2.526e+10            +4.4%  2.637e+10        perf-stat.ps.dTLB-stores
  20991781            +4.7%   21968503        perf-stat.ps.iTLB-load-misses
 2.382e+11            +4.2%  2.482e+11        perf-stat.ps.instructions
     88329            +4.6%      92369        perf-stat.ps.node-load-misses
     16250 ±  7%     +11.0%      18033 ±  5%  perf-stat.ps.node-loads
 7.197e+13            +4.3%  7.507e+13        perf-stat.total.instructions
     18.52            -3.2       15.28        perf-profile.calltrace.cycles-pp.__fget_light.do_sys_poll.__x64_sys_poll.do_syscall_64.entry_SYSCALL_64_after_hwframe
      2.75            -0.2        2.57 ±  3%  perf-profile.calltrace.cycles-pp._copy_from_user.do_sys_poll.__x64_sys_poll.do_syscall_64.entry_SYSCALL_64_after_hwframe
      2.85            -0.2        2.69        perf-profile.calltrace.cycles-pp.entry_SYSCALL_64.poll
      1.27            -0.1        1.17 ±  2%  perf-profile.calltrace.cycles-pp.__check_object_size.do_sys_poll.__x64_sys_poll.do_syscall_64.entry_SYSCALL_64_after_hwframe
      0.81 ±  2%      -0.1        0.75        perf-profile.calltrace.cycles-pp.__kmalloc.do_sys_poll.__x64_sys_poll.do_syscall_64.entry_SYSCALL_64_after_hwframe
     89.61            +0.2       89.80        perf-profile.calltrace.cycles-pp.__x64_sys_poll.do_syscall_64.entry_SYSCALL_64_after_hwframe.poll
     93.97            +0.2       94.16        perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.poll
     93.70            +0.2       93.94        perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.poll
      2.30            +0.5        2.81        perf-profile.calltrace.cycles-pp.__fdget.do_sys_poll.__x64_sys_poll.do_syscall_64.entry_SYSCALL_64_after_hwframe
     88.02            +0.8       88.84        perf-profile.calltrace.cycles-pp.do_sys_poll.__x64_sys_poll.do_syscall_64.entry_SYSCALL_64_after_hwframe.poll
     17.85            -2.9       14.92        perf-profile.children.cycles-pp.__fget_light
      2.79            -0.2        2.60 ±  3%  perf-profile.children.cycles-pp._copy_from_user
      2.85            -0.2        2.70        perf-profile.children.cycles-pp.entry_SYSCALL_64
      1.33            -0.1        1.21 ±  2%  perf-profile.children.cycles-pp.__check_object_size
      0.58            -0.1        0.51 ±  2%  perf-profile.children.cycles-pp.__might_fault
      0.87            -0.1        0.81        perf-profile.children.cycles-pp.__kmalloc
      0.37 ±  3%      -0.1        0.31 ±  3%  perf-profile.children.cycles-pp.___might_sleep
      0.12 ±  3%      -0.0        0.10        perf-profile.children.cycles-pp.check_stack_object
     89.63            +0.2       89.81        perf-profile.children.cycles-pp.__x64_sys_poll
     94.00            +0.2       94.20        perf-profile.children.cycles-pp.entry_SYSCALL_64_after_hwframe
     93.75            +0.2       93.98        perf-profile.children.cycles-pp.do_syscall_64
     89.12            +0.2       89.36        perf-profile.children.cycles-pp.do_sys_poll
      2.29            +0.5        2.75        perf-profile.children.cycles-pp.__fdget
     16.65            -3.1       13.54        perf-profile.self.cycles-pp.__fget_light
      2.50            -0.2        2.33        perf-profile.self.cycles-pp.entry_SYSCALL_64
      0.38 ±  2%      -0.1        0.30 ±  2%  perf-profile.self.cycles-pp.__check_object_size
      0.36 ±  2%      -0.1        0.30 ±  3%  perf-profile.self.cycles-pp.___might_sleep
      0.52            -0.0        0.47        perf-profile.self.cycles-pp.poll
      0.44            -0.0        0.40        perf-profile.self.cycles-pp.__kmalloc
      0.41            -0.0        0.37 ±  2%  perf-profile.self.cycles-pp.__x64_sys_poll
      0.26            -0.0        0.22 ±  3%  perf-profile.self.cycles-pp.entry_SYSCALL_64_after_hwframe
      0.17 ±  4%      -0.0        0.15        perf-profile.self.cycles-pp.__might_fault
      0.08 ±  5%      -0.0        0.07 ±  7%  perf-profile.self.cycles-pp.poll_select_set_timeout
      0.11 ±  4%      -0.0        0.09 ±  4%  perf-profile.self.cycles-pp.check_stack_object
      0.09            +0.0        0.10        perf-profile.self.cycles-pp.poll_freewait
      4.02            +0.1        4.07        perf-profile.self.cycles-pp.do_syscall_64
      1.16            +0.2        1.37        perf-profile.self.cycles-pp.__fdget
     65.19            +3.5       68.64        perf-profile.self.cycles-pp.do_sys_poll


                                                                                
                            will-it-scale.per_process_ops                       
                                                                                
  218000 +------------------------------------------------------------------+   
         |                                                                  |   
  216000 |-+               O  O O  O    O  O O  O         O                 |   
         |                           O            O  O O     O O  O O  O O  |   
  214000 |-+                                                                |   
         |                                                                  |   
  212000 |-+O O  O O  O O                                                   |   
         |                                                                  |   
  210000 |-+                                                                |   
         |                                                                  |   
  208000 |-+              .+..+.    .+..+..                              +..|   
         |   .+..+.  .+.+.      +..+           .+.+..       .+.+..      :   |   
  206000 |..+      +.                      +.+.      +.+..+.      +.    :   |   
         |                                                          +..+    |   
  204000 +------------------------------------------------------------------+   
                                                                                
                                                                                
[*] bisect-good sample
[O] bisect-bad  sample



Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.


Thanks,
Rong Chen


View attachment "config-5.7.0-00002-g67467ae141308" of type "text/plain" (202612 bytes)

View attachment "job-script" of type "text/plain" (7414 bytes)

View attachment "job.yaml" of type "text/plain" (5011 bytes)

View attachment "reproduce" of type "text/plain" (338 bytes)
