Message-ID: <20200615005732.GV12456@shao2-debian>
Date: Mon, 15 Jun 2020 08:57:32 +0800
From: kernel test robot <rong.a.chen@...el.com>
To: Kees Cook <keescook@...omium.org>
Cc: Peter Zijlstra <peterz@...radead.org>,
Kees Cook <keescook@...omium.org>,
Elena Reshetova <elena.reshetova@...el.com>,
David Windsor <dwindsor@...il.com>,
Hans Liljestrand <ishkamiel@...il.com>,
Xiaoming Ni <nixiaoming@...wei.com>,
Paul Moore <paul@...l-moore.com>, edumazet@...gle.com,
paulmck@...nel.org, David Howells <dhowells@...hat.com>,
shakeelb@...gle.com, James Morris <jamorris@...ux.microsoft.com>,
alex.huangjianhui@...wei.com, dylix.dailei@...wei.com,
chenzefeng2@...wei.com, linux-kernel@...r.kernel.org,
lkp@...ts.01.org
Subject: [groups] 67467ae141: will-it-scale.per_process_ops 4.3% improvement
Greetings,
FYI, we noticed a 4.3% improvement of will-it-scale.per_process_ops due to commit:
commit: 67467ae14130847791f230fbc9f261d0c819b9c3 ("[PATCH 2/3] groups: convert group_info.usage to refcount_t")
url: https://github.com/0day-ci/linux/commits/Kees-Cook/Convert-nsproxy-groups-and-creds-to-refcount_t/20200613-023706
base: git://git.linux-nfs.org/projects/trondmy/linux-nfs.git linux-next
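For context, the conversion has this shape (a sketch based on struct group_info and the get_group_info()/put_group_info() helpers in include/linux/cred.h, not the verbatim patch):

#include <linux/refcount.h>
#include <linux/uidgid.h>

extern void groups_free(struct group_info *);

struct group_info {
	refcount_t	usage;		/* was: atomic_t usage; */
	int		ngroups;
	kgid_t		gid[0];
};

static inline struct group_info *get_group_info(struct group_info *gi)
{
	refcount_inc(&gi->usage);	/* was: atomic_inc(&gi->usage); */
	return gi;
}

#define put_group_info(group_info)				\
do {								\
	if (refcount_dec_and_test(&(group_info)->usage))	\
		groups_free(group_info);			\
} while (0)

groups_alloc() would correspondingly initialize the count with refcount_set(&gi->usage, 1) instead of atomic_set(). refcount_t adds saturation checks, so a leaked or overflowed count warns and pins instead of silently wrapping.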
in testcase: will-it-scale
on test machine: 104-thread Skylake with 192GB of memory
with the following parameters:
nr_task: 100%
mode: process
test: poll2
cpufreq_governor: performance
ucode: 0x2000065
test-description: Will It Scale takes a test case and runs it from 1 through n parallel copies to see whether the test case scales. It builds both a process-based and a thread-based variant of each test in order to expose any differences between the two.
test-url: https://github.com/antonblanchard/will-it-scale
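For reference, a minimal sketch of the kind of loop the poll2 test case exercises (the 128-fd batch size and the pipe setup below are assumptions for illustration, not copied from the will-it-scale source); each iteration counted in per_process_ops goes through do_sys_poll() and the __fdget()/__fget_light() path that dominates the profile below:

#include <poll.h>
#include <stdio.h>
#include <unistd.h>

#define NR_FDS 128			/* assumed batch size */

int main(void)
{
	static struct pollfd pfds[NR_FDS];
	unsigned long long iterations = 0;
	int fds[2];

	/* Register one pipe read end per slot; nothing is ever written,
	 * so every poll() walks the full fd table and returns 0. */
	for (int i = 0; i < NR_FDS; i++) {
		if (pipe(fds) < 0) {
			perror("pipe");
			return 1;
		}
		pfds[i].fd = fds[0];
		pfds[i].events = POLLIN;
	}

	for (;;) {
		poll(pfds, NR_FDS, 0);	/* zero timeout: pure syscall cost */
		iterations++;
	}
}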
Details are as below:
-------------------------------------------------------------------------------------------------->
To reproduce:
git clone https://github.com/intel/lkp-tests.git
cd lkp-tests
bin/lkp install job.yaml # job file is attached in this email
bin/lkp run job.yaml
=========================================================================================
compiler/cpufreq_governor/kconfig/mode/nr_task/rootfs/tbox_group/test/testcase/ucode:
gcc-9/performance/x86_64-rhel-7.6/process/100%/debian-x86_64-20191114.cgz/lkp-skl-fpga01/poll2/will-it-scale/0x2000065
commit:
bcaef9d22e ("nsproxy: convert nsproxy.count to refcount_t")
67467ae141 ("groups: convert group_info.usage to refcount_t")
bcaef9d22e69accf (parent)  67467ae14130847791f230fbc9f (patched)
-------------------------  --------------------------------------
 value ± %stddev    %change    value ± %stddev    metric
205986 +4.3% 214828 will-it-scale.per_process_ops
21422614 +4.3% 22342250 will-it-scale.workload
6978 ± 50% +99.5% 13922 ± 26% numa-meminfo.node0.Inactive
6819 ± 52% +101.5% 13739 ± 26% numa-meminfo.node0.Inactive(anon)
8206 ± 46% +85.9% 15258 ± 23% numa-meminfo.node0.Shmem
21268 ± 16% -32.4% 14373 ± 25% numa-meminfo.node1.Inactive
21078 ± 16% -32.7% 14195 ± 25% numa-meminfo.node1.Inactive(anon)
1704 ± 52% +101.5% 3434 ± 26% numa-vmstat.node0.nr_inactive_anon
2051 ± 46% +85.9% 3813 ± 23% numa-vmstat.node0.nr_shmem
1704 ± 52% +101.5% 3434 ± 26% numa-vmstat.node0.nr_zone_inactive_anon
5270 ± 16% -32.7% 3549 ± 25% numa-vmstat.node1.nr_inactive_anon
5270 ± 16% -32.7% 3549 ± 25% numa-vmstat.node1.nr_zone_inactive_anon
6.22 ± 2% +11.5% 6.94 ± 6% sched_debug.cfs_rq:/.nr_spread_over.stddev
-359667 -32.2% -243708 sched_debug.cfs_rq:/.spread0.min
739.75 ± 6% +13.1% 836.96 sched_debug.cfs_rq:/.util_avg.min
67.85 -20.0% 54.29 ± 5% sched_debug.cfs_rq:/.util_avg.stddev
0.15 ± 5% +14.7% 0.17 ± 7% sched_debug.cpu.nr_running.stddev
450.25 ± 41% +177.8% 1250 ± 25% interrupts.39:PCI-MSI.67633154-edge.eth0-TxRx-1
876.00 ± 9% +54.8% 1356 ± 21% interrupts.CPU26.RES:Rescheduling_interrupts
450.25 ± 41% +177.8% 1250 ± 25% interrupts.CPU31.39:PCI-MSI.67633154-edge.eth0-TxRx-1
5403 ± 27% +40.9% 7615 ± 5% interrupts.CPU49.NMI:Non-maskable_interrupts
5403 ± 27% +40.9% 7615 ± 5% interrupts.CPU49.PMI:Performance_monitoring_interrupts
6577 ± 11% -35.1% 4267 ± 14% interrupts.CPU54.RES:Rescheduling_interrupts
358.00 ± 20% +91.8% 686.75 ± 70% interrupts.CPU96.RES:Rescheduling_interrupts
4.835e+10 +4.2% 5.04e+10 perf-stat.i.branch-instructions
0.31 -0.0 0.30 perf-stat.i.branch-miss-rate%
1.407e+08 +1.8% 1.432e+08 perf-stat.i.branch-misses
6.49 ± 9% +1.1 7.63 ± 5% perf-stat.i.cache-miss-rate%
397271 ± 5% +24.2% 493463 ± 5% perf-stat.i.cache-misses
1.18 -4.2% 1.13 perf-stat.i.cpi
836738 ± 7% -22.3% 650325 ± 4% perf-stat.i.cycles-between-cache-misses
21389813 +4.3% 22319467 perf-stat.i.dTLB-load-misses
5.514e+10 +4.3% 5.753e+10 perf-stat.i.dTLB-loads
2.535e+10 +4.4% 2.646e+10 perf-stat.i.dTLB-stores
21063107 +4.7% 22042637 perf-stat.i.iTLB-load-misses
2.39e+11 +4.2% 2.49e+11 perf-stat.i.instructions
0.85 +4.3% 0.89 perf-stat.i.ipc
1.19 +2.7% 1.22 perf-stat.i.metric.K/sec
1238 +4.3% 1292 perf-stat.i.metric.M/sec
88617 +4.6% 92673 perf-stat.i.node-load-misses
16016 ± 8% +11.5% 17852 ± 5% perf-stat.i.node-loads
0.29 -0.0 0.28 perf-stat.overall.branch-miss-rate%
6.75 ± 8% +1.1 7.83 ± 5% perf-stat.overall.cache-miss-rate%
1.18 -4.2% 1.13 perf-stat.overall.cpi
708307 ± 5% -19.6% 569690 ± 5% perf-stat.overall.cycles-between-cache-misses
0.85 +4.4% 0.89 perf-stat.overall.ipc
4.819e+10 +4.2% 5.023e+10 perf-stat.ps.branch-instructions
1.402e+08 +1.8% 1.427e+08 perf-stat.ps.branch-misses
397188 ± 5% +24.1% 492884 ± 5% perf-stat.ps.cache-misses
21318083 +4.3% 22244871 perf-stat.ps.dTLB-load-misses
5.495e+10 +4.3% 5.734e+10 perf-stat.ps.dTLB-loads
2.526e+10 +4.4% 2.637e+10 perf-stat.ps.dTLB-stores
20991781 +4.7% 21968503 perf-stat.ps.iTLB-load-misses
2.382e+11 +4.2% 2.482e+11 perf-stat.ps.instructions
88329 +4.6% 92369 perf-stat.ps.node-load-misses
16250 ± 7% +11.0% 18033 ± 5% perf-stat.ps.node-loads
7.197e+13 +4.3% 7.507e+13 perf-stat.total.instructions
18.52 -3.2 15.28 perf-profile.calltrace.cycles-pp.__fget_light.do_sys_poll.__x64_sys_poll.do_syscall_64.entry_SYSCALL_64_after_hwframe
2.75 -0.2 2.57 ± 3% perf-profile.calltrace.cycles-pp._copy_from_user.do_sys_poll.__x64_sys_poll.do_syscall_64.entry_SYSCALL_64_after_hwframe
2.85 -0.2 2.69 perf-profile.calltrace.cycles-pp.entry_SYSCALL_64.poll
1.27 -0.1 1.17 ± 2% perf-profile.calltrace.cycles-pp.__check_object_size.do_sys_poll.__x64_sys_poll.do_syscall_64.entry_SYSCALL_64_after_hwframe
0.81 ± 2% -0.1 0.75 perf-profile.calltrace.cycles-pp.__kmalloc.do_sys_poll.__x64_sys_poll.do_syscall_64.entry_SYSCALL_64_after_hwframe
89.61 +0.2 89.80 perf-profile.calltrace.cycles-pp.__x64_sys_poll.do_syscall_64.entry_SYSCALL_64_after_hwframe.poll
93.97 +0.2 94.16 perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.poll
93.70 +0.2 93.94 perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.poll
2.30 +0.5 2.81 perf-profile.calltrace.cycles-pp.__fdget.do_sys_poll.__x64_sys_poll.do_syscall_64.entry_SYSCALL_64_after_hwframe
88.02 +0.8 88.84 perf-profile.calltrace.cycles-pp.do_sys_poll.__x64_sys_poll.do_syscall_64.entry_SYSCALL_64_after_hwframe.poll
17.85 -2.9 14.92 perf-profile.children.cycles-pp.__fget_light
2.79 -0.2 2.60 ± 3% perf-profile.children.cycles-pp._copy_from_user
2.85 -0.2 2.70 perf-profile.children.cycles-pp.entry_SYSCALL_64
1.33 -0.1 1.21 ± 2% perf-profile.children.cycles-pp.__check_object_size
0.58 -0.1 0.51 ± 2% perf-profile.children.cycles-pp.__might_fault
0.87 -0.1 0.81 perf-profile.children.cycles-pp.__kmalloc
0.37 ± 3% -0.1 0.31 ± 3% perf-profile.children.cycles-pp.___might_sleep
0.12 ± 3% -0.0 0.10 perf-profile.children.cycles-pp.check_stack_object
89.63 +0.2 89.81 perf-profile.children.cycles-pp.__x64_sys_poll
94.00 +0.2 94.20 perf-profile.children.cycles-pp.entry_SYSCALL_64_after_hwframe
93.75 +0.2 93.98 perf-profile.children.cycles-pp.do_syscall_64
89.12 +0.2 89.36 perf-profile.children.cycles-pp.do_sys_poll
2.29 +0.5 2.75 perf-profile.children.cycles-pp.__fdget
16.65 -3.1 13.54 perf-profile.self.cycles-pp.__fget_light
2.50 -0.2 2.33 perf-profile.self.cycles-pp.entry_SYSCALL_64
0.38 ± 2% -0.1 0.30 ± 2% perf-profile.self.cycles-pp.__check_object_size
0.36 ± 2% -0.1 0.30 ± 3% perf-profile.self.cycles-pp.___might_sleep
0.52 -0.0 0.47 perf-profile.self.cycles-pp.poll
0.44 -0.0 0.40 perf-profile.self.cycles-pp.__kmalloc
0.41 -0.0 0.37 ± 2% perf-profile.self.cycles-pp.__x64_sys_poll
0.26 -0.0 0.22 ± 3% perf-profile.self.cycles-pp.entry_SYSCALL_64_after_hwframe
0.17 ± 4% -0.0 0.15 perf-profile.self.cycles-pp.__might_fault
0.08 ± 5% -0.0 0.07 ± 7% perf-profile.self.cycles-pp.poll_select_set_timeout
0.11 ± 4% -0.0 0.09 ± 4% perf-profile.self.cycles-pp.check_stack_object
0.09 +0.0 0.10 perf-profile.self.cycles-pp.poll_freewait
4.02 +0.1 4.07 perf-profile.self.cycles-pp.do_syscall_64
1.16 +0.2 1.37 perf-profile.self.cycles-pp.__fdget
65.19 +3.5 68.64 perf-profile.self.cycles-pp.do_sys_poll
will-it-scale.per_process_ops
[ASCII trend plot: bisect-good samples ([*], parent bcaef9d22e) fluctuate
around 204000-208000 ops; bisect-bad samples ([O], tested commit 67467ae141)
cluster around 212000-216000 ops, consistent with the +4.3% result above.]
Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.
Thanks,
Rong Chen
View attachment "config-5.7.0-00002-g67467ae141308" of type "text/plain" (202612 bytes)
View attachment "job-script" of type "text/plain" (7414 bytes)
View attachment "job.yaml" of type "text/plain" (5011 bytes)
View attachment "reproduce" of type "text/plain" (338 bytes)