Message-ID: <20200620143735.GF5535@shao2-debian>
Date: Sat, 20 Jun 2020 22:37:35 +0800
From: kernel test robot <rong.a.chen@...el.com>
To: Stanislav Fomichev <sdf@...gle.com>
Cc: Alexei Starovoitov <ast@...nel.org>,
Andrii Nakryiko <andriin@...com>,
LKML <linux-kernel@...r.kernel.org>, lkp@...ts.01.org
Subject: [bpf] 6890896bd7: will-it-scale.per_process_ops -5.2% regression
Greetings,
FYI, we noticed a -5.2% regression of will-it-scale.per_process_ops due to commit:
commit: 6890896bd765b0504761c61901c9804fca23bfb2 ("bpf: Fix missing bpf_base_func_proto in cgroup_base_func_proto for CGROUP_NET=n")
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master
in testcase: will-it-scale
on test machine: 4 threads Intel(R) Core(TM) i3-3220 CPU @ 3.30GHz with 8G memory
with the following parameters:
nr_task: 16
mode: process
test: mmap2
cpufreq_governor: performance
ucode: 0x21
test-description: Will It Scale takes a testcase and runs it from 1 through n parallel copies to see whether the testcase scales. It builds both a process-based and a thread-based version of each test in order to expose any differences between the two.
test-url: https://github.com/antonblanchard/will-it-scale
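
For readers unfamiliar with the workload, the following is a minimal Python sketch of the kind of loop the mmap2 testcase exercises, judging from the mmap64/munmap and shmem_get_unmapped_area call traces later in this report: repeatedly map and unmap a shared file mapping and count the iterations. The real testcase is a C program in the will-it-scale repository; the function name, the 128 MiB default size, and the use of a temporary file here are assumptions for illustration, and interpreter overhead makes this unsuitable as a benchmark.

```python
import mmap
import tempfile

# Assumed mapping size for illustration; the report does not state it.
MEMSIZE = 128 * 1024 * 1024


def mmap_munmap_loop(iterations, size=MEMSIZE):
    """Map and unmap a shared file mapping `iterations` times.

    Returns the number of completed map/unmap pairs, which is the
    quantity a per-process ops counter would accumulate.
    """
    with tempfile.TemporaryFile() as f:
        # Extend the (sparse) file so the mapping length is valid.
        f.truncate(size)
        fd = f.fileno()
        done = 0
        for _ in range(iterations):
            m = mmap.mmap(fd, size,
                          flags=mmap.MAP_SHARED,
                          prot=mmap.PROT_READ | mmap.PROT_WRITE)
            m.close()  # unmaps the region
            done += 1
        return done
```

Calling `mmap_munmap_loop(1000, size=1 << 20)` runs a short loop with a 1 MiB mapping. Each iteration enters the kernel twice, which is why the profile in this report is dominated by the mmap and munmap syscall paths.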
If you fix the issue, kindly add the following tag:
Reported-by: kernel test robot <rong.a.chen@...el.com>
Details are as below:
-------------------------------------------------------------------------------------------------->
To reproduce:
git clone https://github.com/intel/lkp-tests.git
cd lkp-tests
bin/lkp install job.yaml # job file is attached in this email
bin/lkp run job.yaml
=========================================================================================
compiler/cpufreq_governor/kconfig/mode/nr_task/rootfs/tbox_group/test/testcase/ucode:
gcc-9/performance/x86_64-rhel-7.6/process/16/debian-x86_64-20191114.cgz/lkp-ivb-d02/mmap2/will-it-scale/0x21
commit:
745abfaa9e ("bpf, riscv: Fix tail call count off by one in RV32 BPF JIT")
6890896bd7 ("bpf: Fix missing bpf_base_func_proto in cgroup_base_func_proto for CGROUP_NET=n")
745abfaa9eafa597     6890896bd765b0504761c61901c
----------------     ---------------------------
value ± %stddev      %change    value ± %stddev      metric
63928 -5.2% 60630 will-it-scale.per_process_ops
1022864 -5.2% 970096 will-it-scale.workload
63.20 +2.4% 64.71 boot-time.idle
2439913 ± 19% +35.0% 3294976 ± 10% cpuidle.C6.time
202.25 ± 30% -31.0% 139.50 slabinfo.fsnotify_mark_connector.active_objs
16.38 ± 19% +45.3% 23.79 ± 10% sched_debug.cfs_rq:/.nr_spread_over.max
6.61 ± 36% +53.3% 10.13 ± 12% sched_debug.cfs_rq:/.nr_spread_over.stddev
4230 ±125% +371.5% 19944 ± 64% sched_debug.cfs_rq:/.spread0.max
238.04 ± 13% -44.2% 132.83 ± 28% sched_debug.cfs_rq:/.util_est_enqueued.min
59416 ± 4% -9.2% 53953 ± 2% sched_debug.cpu.sched_count.min
16080 ± 11% -21.6% 12609 ± 7% sched_debug.cpu.ttwu_count.min
13342 ± 10% -24.3% 10099 ± 6% sched_debug.cpu.ttwu_local.min
2.661e+09 -5.1% 2.525e+09 perf-stat.i.branch-instructions
14914488 -4.1% 14298109 perf-stat.i.branch-misses
4.04 +0.4 4.46 ± 3% perf-stat.i.cache-miss-rate%
4438980 -5.8% 4183284 perf-stat.i.cache-references
1.15 +5.4% 1.21 perf-stat.i.cpi
3.355e+09 -4.8% 3.194e+09 perf-stat.i.dTLB-loads
4442441 ± 4% -9.6% 4014009 perf-stat.i.dTLB-store-misses
1.579e+09 -4.9% 1.502e+09 perf-stat.i.dTLB-stores
45.02 +12.4 57.47 perf-stat.i.iTLB-load-miss-rate%
1224412 ± 3% -3.7% 1178714 perf-stat.i.iTLB-load-misses
1500623 -41.9% 872114 perf-stat.i.iTLB-loads
1.142e+10 -5.1% 1.083e+10 perf-stat.i.instructions
0.87 -5.1% 0.83 perf-stat.i.ipc
0.91 ± 2% -2.5% 0.89 perf-stat.i.metric.K/sec
1901 -4.9% 1807 perf-stat.i.metric.M/sec
4.32 ± 2% +0.3 4.64 ± 3% perf-stat.overall.cache-miss-rate%
1.15 +5.4% 1.21 perf-stat.overall.cpi
44.92 +12.5 57.47 perf-stat.overall.iTLB-load-miss-rate%
0.87 -5.1% 0.83 perf-stat.overall.ipc
2.652e+09 -5.1% 2.517e+09 perf-stat.ps.branch-instructions
14865210 -4.1% 14250821 perf-stat.ps.branch-misses
4424338 -5.8% 4169457 perf-stat.ps.cache-references
3.344e+09 -4.8% 3.183e+09 perf-stat.ps.dTLB-loads
4427694 ± 4% -9.6% 4000716 perf-stat.ps.dTLB-store-misses
1.574e+09 -4.9% 1.497e+09 perf-stat.ps.dTLB-stores
1220351 ± 3% -3.7% 1174812 perf-stat.ps.iTLB-load-misses
1495642 -41.9% 869227 perf-stat.ps.iTLB-loads
1.138e+10 -5.1% 1.08e+10 perf-stat.ps.instructions
3.437e+12 -5.1% 3.262e+12 perf-stat.total.instructions
3.19 ± 13% -0.9 2.25 ± 3% perf-profile.calltrace.cycles-pp.percpu_counter_add_batch.__do_munmap.__vm_munmap.__x64_sys_munmap.do_syscall_64
23.75 -0.8 22.98 perf-profile.calltrace.cycles-pp.do_mmap.vm_mmap_pgoff.ksys_mmap_pgoff.do_syscall_64.entry_SYSCALL_64_after_hwframe
41.40 -0.8 40.63 perf-profile.calltrace.cycles-pp.mmap64
18.86 -0.6 18.21 perf-profile.calltrace.cycles-pp.mmap_region.do_mmap.vm_mmap_pgoff.ksys_mmap_pgoff.do_syscall_64
27.02 -0.6 26.40 perf-profile.calltrace.cycles-pp.vm_mmap_pgoff.ksys_mmap_pgoff.do_syscall_64.entry_SYSCALL_64_after_hwframe.mmap64
26.71 -0.5 26.17 perf-profile.calltrace.cycles-pp.unmap_page_range.unmap_vmas.unmap_region.__do_munmap.__vm_munmap
0.74 ± 3% -0.1 0.62 ± 8% perf-profile.calltrace.cycles-pp.security_vm_enough_memory_mm.mmap_region.do_mmap.vm_mmap_pgoff.ksys_mmap_pgoff
2.66 ± 2% -0.1 2.55 ± 2% perf-profile.calltrace.cycles-pp.d_path.perf_event_mmap.mmap_region.do_mmap.vm_mmap_pgoff
2.60 -0.1 2.50 perf-profile.calltrace.cycles-pp.entry_SYSCALL_64.munmap
1.96 -0.1 1.89 ± 2% perf-profile.calltrace.cycles-pp.arch_get_unmapped_area_topdown.shmem_get_unmapped_area.get_unmapped_area.do_mmap.vm_mmap_pgoff
0.90 ± 5% +0.1 0.98 ± 4% perf-profile.calltrace.cycles-pp._raw_spin_lock_irqsave.percpu_counter_add_batch.__do_munmap.__vm_munmap.__x64_sys_munmap
1.04 ± 9% +0.1 1.19 ± 6% perf-profile.calltrace.cycles-pp.kmem_cache_alloc_trace.perf_event_mmap.mmap_region.do_mmap.vm_mmap_pgoff
0.70 ± 6% +0.3 1.01 ± 12% perf-profile.calltrace.cycles-pp.kmem_cache_free.remove_vma.__do_munmap.__vm_munmap.__x64_sys_munmap
1.40 ± 6% +0.4 1.84 ± 5% perf-profile.calltrace.cycles-pp.remove_vma.__do_munmap.__vm_munmap.__x64_sys_munmap.do_syscall_64
0.27 ±100% +0.5 0.73 ± 16% perf-profile.calltrace.cycles-pp.up_read.__vm_munmap.__x64_sys_munmap.do_syscall_64.entry_SYSCALL_64_after_hwframe
44.30 +0.5 44.77 perf-profile.calltrace.cycles-pp.__do_munmap.__vm_munmap.__x64_sys_munmap.do_syscall_64.entry_SYSCALL_64_after_hwframe
4.82 ± 2% +0.6 5.40 ± 3% perf-profile.calltrace.cycles-pp.free_pgd_range.unmap_region.__do_munmap.__vm_munmap.__x64_sys_munmap
4.42 ± 2% +0.6 5.02 ± 3% perf-profile.calltrace.cycles-pp.free_p4d_range.free_pgd_range.unmap_region.__do_munmap.__vm_munmap
58.12 +0.8 58.91 perf-profile.calltrace.cycles-pp.munmap
45.84 +0.8 46.64 perf-profile.calltrace.cycles-pp.__vm_munmap.__x64_sys_munmap.do_syscall_64.entry_SYSCALL_64_after_hwframe.munmap
46.34 +0.9 47.21 perf-profile.calltrace.cycles-pp.__x64_sys_munmap.do_syscall_64.entry_SYSCALL_64_after_hwframe.munmap
51.57 +0.9 52.47 perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.munmap
51.28 +0.9 52.21 perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.munmap
41.75 -0.8 40.95 perf-profile.children.cycles-pp.mmap64
23.84 -0.8 23.05 perf-profile.children.cycles-pp.do_mmap
18.98 -0.6 18.35 perf-profile.children.cycles-pp.mmap_region
27.09 -0.6 26.48 perf-profile.children.cycles-pp.vm_mmap_pgoff
1.41 ± 10% -0.5 0.88 ± 4% perf-profile.children.cycles-pp._raw_spin_unlock_irqrestore
4.18 ± 2% -0.4 3.77 ± 3% perf-profile.children.cycles-pp.percpu_counter_add_batch
5.20 -0.2 5.02 perf-profile.children.cycles-pp.entry_SYSCALL_64
0.56 ± 3% -0.1 0.44 ± 13% perf-profile.children.cycles-pp.cap_vm_enough_memory
2.69 -0.1 2.57 ± 2% perf-profile.children.cycles-pp.d_path
0.75 ± 4% -0.1 0.64 ± 7% perf-profile.children.cycles-pp.security_vm_enough_memory_mm
0.38 ± 6% -0.1 0.28 ± 30% perf-profile.children.cycles-pp.security_mmap_addr
0.27 ± 13% -0.1 0.18 ± 26% perf-profile.children.cycles-pp.may_expand_vm
0.20 ± 9% -0.1 0.14 ± 16% perf-profile.children.cycles-pp.cap_capable
0.21 ± 7% +0.1 0.26 ± 9% perf-profile.children.cycles-pp.cap_mmap_file
0.00 +0.1 0.05 ± 9% perf-profile.children.cycles-pp.profile_munmap
0.45 +0.1 0.51 ± 5% perf-profile.children.cycles-pp.lru_add_drain
0.83 ± 3% +0.1 0.97 ± 4% perf-profile.children.cycles-pp.__might_sleep
0.43 ± 2% +0.1 0.57 ± 7% perf-profile.children.cycles-pp.fpregs_assert_state_consistent
1.10 ± 8% +0.2 1.26 ± 7% perf-profile.children.cycles-pp.kmem_cache_alloc_trace
0.52 ± 10% +0.3 0.78 ± 16% perf-profile.children.cycles-pp.up_read
0.70 ± 6% +0.3 1.02 ± 12% perf-profile.children.cycles-pp.kmem_cache_free
1.44 ± 6% +0.4 1.89 ± 5% perf-profile.children.cycles-pp.remove_vma
44.38 +0.5 44.86 perf-profile.children.cycles-pp.__do_munmap
86.47 +0.5 86.96 perf-profile.children.cycles-pp.entry_SYSCALL_64_after_hwframe
85.91 +0.5 86.41 perf-profile.children.cycles-pp.do_syscall_64
4.86 ± 2% +0.6 5.42 ± 3% perf-profile.children.cycles-pp.free_pgd_range
4.43 ± 2% +0.6 5.02 ± 3% perf-profile.children.cycles-pp.free_p4d_range
58.50 +0.8 59.29 perf-profile.children.cycles-pp.munmap
45.89 +0.8 46.71 perf-profile.children.cycles-pp.__vm_munmap
46.38 +0.9 47.27 perf-profile.children.cycles-pp.__x64_sys_munmap
1.37 ± 10% -0.5 0.84 ± 3% perf-profile.self.cycles-pp._raw_spin_unlock_irqrestore
0.26 ± 12% -0.1 0.16 ± 25% perf-profile.self.cycles-pp.may_expand_vm
0.19 ± 10% -0.1 0.13 ± 21% perf-profile.self.cycles-pp.cap_capable
0.11 ± 9% -0.0 0.09 ± 7% perf-profile.self.cycles-pp.security_mmap_addr
0.07 ± 12% +0.0 0.10 ± 11% perf-profile.self.cycles-pp.lru_add_drain
0.23 ± 7% +0.0 0.27 ± 5% perf-profile.self.cycles-pp.userfaultfd_unmap_prep
0.19 ± 10% +0.0 0.23 ± 11% perf-profile.self.cycles-pp.cap_mmap_file
0.75 ± 3% +0.1 0.86 ± 4% perf-profile.self.cycles-pp.__might_sleep
0.41 ± 2% +0.1 0.53 ± 10% perf-profile.self.cycles-pp.fpregs_assert_state_consistent
1.23 ± 2% +0.2 1.40 ± 2% perf-profile.self.cycles-pp.__do_munmap
0.63 ± 17% +0.2 0.83 ± 16% perf-profile.self.cycles-pp.common_file_perm
0.50 ± 10% +0.3 0.75 ± 18% perf-profile.self.cycles-pp.up_read
0.69 ± 6% +0.3 1.01 ± 13% perf-profile.self.cycles-pp.kmem_cache_free
4.40 ± 2% +0.6 4.98 ± 3% perf-profile.self.cycles-pp.free_p4d_range
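
As a reading aid for the table above, here is a small Python sketch of how the %change column can be reproduced from the two value columns. Count-like metrics are reported as a relative change in percent, while metrics that are already percentages (the rate% and perf-profile rows) are reported as an absolute difference in points. The helper names are hypothetical, and the iTLB miss-rate formula is inferred from the numbers in this report, not taken from the lkp-tests source.

```python
def pct_change(base, new):
    """Relative change in percent, as used for count-like metrics."""
    return (new - base) / base * 100.0


def point_change(base, new):
    """Absolute difference, as used for metrics already in percent."""
    return new - base


def itlb_load_miss_rate(misses, loads):
    """Assumed definition: misses / (misses + loads), in percent."""
    return misses / (misses + loads) * 100.0


if __name__ == "__main__":
    # will-it-scale.per_process_ops: 63928 -> 60630
    print(f"{pct_change(63928, 60630):+.1f}%")
    # perf-stat.i.iTLB-loads: 1500623 -> 872114
    print(f"{pct_change(1500623, 872114):+.1f}%")
    # perf-stat.ps iTLB misses and loads, before and after the commit
    before = itlb_load_miss_rate(1220351, 1495642)
    after = itlb_load_miss_rate(1174812, 869227)
    print(f"{before:.2f} {point_change(before, after):+.1f} {after:.2f}")
```

Under this assumed formula, the jump in iTLB-load-miss-rate% comes mostly from the drop in iTLB-loads, since iTLB-load-misses themselves change by only a few percent.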
[ASCII plot: will-it-scale.per_process_ops, y-axis from 60000 to 64500. The samples drawn as a connected line lie between about 63000 and 64500; the samples marked O lie between about 60000 and 62500.]
[*] bisect-good sample
[O] bisect-bad sample
Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.
Thanks,
Rong Chen
View attachment "config-5.7.0-rc2-00635-g6890896bd765b0" of type "text/plain" (202662 bytes)
View attachment "job-script" of type "text/plain" (7675 bytes)
View attachment "job.yaml" of type "text/plain" (5293 bytes)
View attachment "reproduce" of type "text/plain" (337 bytes)