Message-ID: <20210412075244.GB22051@xsang-OptiPlex-9020>
Date: Mon, 12 Apr 2021 15:52:44 +0800
From: kernel test robot <oliver.sang@...el.com>
To: Song Liu <songliubraving@...com>
Cc: Alexei Starovoitov <ast@...nel.org>, KP Singh <kpsingh@...nel.org>,
Martin KaFai Lau <kafai@...com>,
LKML <linux-kernel@...r.kernel.org>,
Linux Memory Management List <linux-mm@...ck.org>,
lkp@...ts.01.org, lkp@...el.com, ying.huang@...el.com,
feng.tang@...el.com, zhengjun.xing@...el.com
Subject: [bpf] a10787e6d5: will-it-scale.per_process_ops 3.5% improvement
Greetings,
FYI, we noticed a 3.5% improvement in will-it-scale.per_process_ops due to commit:
commit: a10787e6d58c24b51e91c19c6d16c5da89fcaa4b ("bpf: Enable task local storage for tracing programs")
https://git.kernel.org/cgit/linux/kernel/git/next/linux-next.git master
in testcase: will-it-scale
on test machine: 88 threads Intel(R) Xeon(R) Gold 6238M CPU @ 2.10GHz with 128G memory
with the following parameters:
nr_task: 16
mode: process
test: mmap2
cpufreq_governor: performance
ucode: 0x5003006
test-description: Will It Scale takes a testcase and runs it from 1 through to n parallel copies to see if the testcase will scale. It builds both a process-based and a thread-based test in order to see any differences between the two.
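As a rough illustration of the process mode used in this run (mode: process, nr_task: 16), the Python sketch below runs an mmap/munmap loop in several forked processes for a fixed time and collects one iteration count per process. It is not the will-it-scale source. The mapping size, the use of a temporary file-backed mapping (suggested by the shmem_mmap frames in the profile, but not confirmed by this report) and the helper names mmap2_testcase and run_processes are all hypothetical.

```python
import mmap
import multiprocessing as mp
import tempfile
import time

# Hypothetical size; the real testcase may map far more memory.
MEMSIZE = 1 << 20


def mmap2_testcase(deadline: float) -> int:
    """Map and unmap a shared file mapping repeatedly until `deadline`.

    Assumption: mmap2 exercises a file-backed mapping; the real C testcase
    in will-it-scale may differ in size, flags and file location.
    """
    iterations = 0
    with tempfile.TemporaryFile() as f:
        f.truncate(MEMSIZE)  # extend the file so the whole mapping is backed
        while time.monotonic() < deadline:
            m = mmap.mmap(f.fileno(), MEMSIZE, flags=mmap.MAP_SHARED,
                          prot=mmap.PROT_READ | mmap.PROT_WRITE)
            m.close()  # close() unmaps the region (munmap)
            iterations += 1
    return iterations


def _worker(deadline: float, results) -> None:
    results.put(mmap2_testcase(deadline))


def run_processes(nr_task: int, duration: float) -> list[int]:
    """Run `nr_task` copies as separate processes and return their counts."""
    ctx = mp.get_context("fork")  # fork lets children run functions from __main__
    results = ctx.Queue()
    # CLOCK_MONOTONIC is system-wide on Linux, so the deadline is valid in children.
    deadline = time.monotonic() + duration
    procs = [ctx.Process(target=_worker, args=(deadline, results))
             for _ in range(nr_task)]
    for p in procs:
        p.start()
    counts = [results.get(timeout=duration + 30) for _ in procs]
    for p in procs:
        p.join()
    return counts
```

For example, run_processes(16, 5.0) would mirror nr_task: 16; the sum of the counts plays the role of will-it-scale.workload and their mean the role of will-it-scale.per_process_ops. In the results below, per_process_ops is close to workload / nr_task (8990002 / 16 ≈ 561875, reported as 561874).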
test-url: https://github.com/antonblanchard/will-it-scale
Details are below:
-------------------------------------------------------------------------------------------------->
To reproduce:
git clone https://github.com/intel/lkp-tests.git
cd lkp-tests
bin/lkp install job.yaml # job file is attached to this email
bin/lkp split-job --compatible job.yaml
bin/lkp run compatible-job.yaml
=========================================================================================
compiler/cpufreq_governor/kconfig/mode/nr_task/rootfs/tbox_group/test/testcase/ucode:
gcc-9/performance/x86_64-rhel-8.3/process/16/debian-10.4-x86_64-20200603.cgz/lkp-csl-2sp9/mmap2/will-it-scale/0x5003006
commit:
9c8f21e6f8 ("xsk: Build skb by page (aka generic zerocopy xmit)")
a10787e6d5 ("bpf: Enable task local storage for tracing programs")
9c8f21e6f8856a96 (base)            a10787e6d58c24b51e91c19c6d1 (patched)
----------------                   ---------------------------
base value [± %stddev]   %change   patched value [± %stddev]   metric
8990002 +3.5% 9304107 will-it-scale.16.processes
561874 +3.5% 581506 will-it-scale.per_process_ops
8990002 +3.5% 9304107 will-it-scale.workload
112185 ± 23% +46.6% 164508 ± 22% numa-numastat.node0.local_node
63.33 ± 93% -80.8% 12.17 ±130% numa-vmstat.node0.nr_inactive_file
63.33 ± 93% -80.8% 12.17 ±130% numa-vmstat.node0.nr_zone_inactive_file
14212 ± 23% +41.7% 20144 ± 14% softirqs.CPU15.SCHED
30141 ± 13% -22.5% 23370 ± 14% softirqs.CPU59.SCHED
66.17 ± 88% -90.7% 6.17 ± 48% interrupts.CPU60.RES:Rescheduling_interrupts
500.00 +86.1% 930.33 ± 60% interrupts.CPU69.CAL:Function_call_interrupts
396.17 ± 6% -18.8% 321.50 ± 21% interrupts.CPU87.NMI:Non-maskable_interrupts
396.17 ± 6% -18.8% 321.50 ± 21% interrupts.CPU87.PMI:Performance_monitoring_interrupts
5.45 ± 46% -98.5% 0.08 ± 73% perf-sched.wait_and_delay.max.ms.exit_to_user_mode_prepare.syscall_exit_to_user_mode.entry_SYSCALL_64_after_hwframe.[unknown]
176.51 ± 36% -61.2% 68.51 ± 77% perf-sched.wait_and_delay.max.ms.schedule_timeout.rcu_gp_kthread.kthread.ret_from_fork
5.45 ± 46% -98.5% 0.08 ± 73% perf-sched.wait_time.max.ms.exit_to_user_mode_prepare.syscall_exit_to_user_mode.entry_SYSCALL_64_after_hwframe.[unknown]
176.50 ± 36% -61.2% 68.50 ± 77% perf-sched.wait_time.max.ms.schedule_timeout.rcu_gp_kthread.kthread.ret_from_fork
2.304e+10 +3.4% 2.383e+10 perf-stat.i.branch-instructions
72536156 +4.1% 75492267 perf-stat.i.branch-misses
0.48 -3.3% 0.47 perf-stat.i.cpi
0.00 ± 15% -0.0 0.00 ± 9% perf-stat.i.dTLB-load-miss-rate%
2.404e+10 +3.4% 2.487e+10 perf-stat.i.dTLB-loads
1.096e+10 +3.4% 1.133e+10 perf-stat.i.dTLB-stores
47654226 +12.8% 53744349 perf-stat.i.iTLB-load-misses
9.562e+10 +3.4% 9.889e+10 perf-stat.i.instructions
2015 -8.4% 1847 perf-stat.i.instructions-per-iTLB-miss
2.06 +3.5% 2.14 perf-stat.i.ipc
659.67 +3.4% 682.32 perf-stat.i.metric.M/sec
0.48 -3.4% 0.47 perf-stat.overall.cpi
0.00 ± 18% -0.0 0.00 ± 14% perf-stat.overall.dTLB-load-miss-rate%
2006 -8.3% 1840 perf-stat.overall.instructions-per-iTLB-miss
2.07 +3.5% 2.14 perf-stat.overall.ipc
2.297e+10 +3.4% 2.375e+10 perf-stat.ps.branch-instructions
72285805 +4.1% 75236431 perf-stat.ps.branch-misses
2.396e+10 +3.4% 2.479e+10 perf-stat.ps.dTLB-loads
1.092e+10 +3.4% 1.13e+10 perf-stat.ps.dTLB-stores
47489125 +12.8% 53563329 perf-stat.ps.iTLB-load-misses
9.529e+10 +3.4% 9.856e+10 perf-stat.ps.instructions
2.876e+13 +3.5% 2.976e+13 perf-stat.total.instructions
44.75 -7.7 37.01 ± 11% perf-profile.calltrace.cycles-pp.__munmap
42.13 -7.2 34.95 ± 11% perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.__munmap
41.64 -7.1 34.53 ± 11% perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.__munmap
41.21 -7.1 34.11 ± 11% perf-profile.calltrace.cycles-pp.__vm_munmap.__x64_sys_munmap.do_syscall_64.entry_SYSCALL_64_after_hwframe.__munmap
41.45 -7.1 34.36 ± 11% perf-profile.calltrace.cycles-pp.__x64_sys_munmap.do_syscall_64.entry_SYSCALL_64_after_hwframe.__munmap
39.74 -6.9 32.83 ± 11% perf-profile.calltrace.cycles-pp.__do_munmap.__vm_munmap.__x64_sys_munmap.do_syscall_64.entry_SYSCALL_64_after_hwframe
33.92 -6.2 27.75 ± 11% perf-profile.calltrace.cycles-pp.unmap_region.__do_munmap.__vm_munmap.__x64_sys_munmap.do_syscall_64
25.32 -5.7 19.64 ± 11% perf-profile.calltrace.cycles-pp.unmap_vmas.unmap_region.__do_munmap.__vm_munmap.__x64_sys_munmap
24.74 -5.7 19.08 ± 11% perf-profile.calltrace.cycles-pp.unmap_page_range.unmap_vmas.unmap_region.__do_munmap.__vm_munmap
10.59 -3.7 6.89 ± 11% perf-profile.calltrace.cycles-pp.___might_sleep.unmap_page_range.unmap_vmas.unmap_region.__do_munmap
1.60 -0.5 1.06 ± 32% perf-profile.calltrace.cycles-pp.__entry_text_start.__mmap
2.94 -0.4 2.56 ± 10% perf-profile.calltrace.cycles-pp.d_path.perf_event_mmap.mmap_region.do_mmap.vm_mmap_pgoff
2.85 ± 2% -0.4 2.47 ± 11% perf-profile.calltrace.cycles-pp.get_unmapped_area.do_mmap.vm_mmap_pgoff.ksys_mmap_pgoff.do_syscall_64
0.66 ± 6% -0.4 0.29 ±101% perf-profile.calltrace.cycles-pp.strlen.perf_event_mmap.mmap_region.do_mmap.vm_mmap_pgoff
2.39 ± 3% -0.3 2.10 ± 11% perf-profile.calltrace.cycles-pp.kmem_cache_alloc.vm_area_alloc.mmap_region.do_mmap.vm_mmap_pgoff
1.30 ± 3% -0.2 1.08 ± 11% perf-profile.calltrace.cycles-pp.security_mmap_file.vm_mmap_pgoff.ksys_mmap_pgoff.do_syscall_64.entry_SYSCALL_64_after_hwframe
0.97 ± 2% -0.2 0.78 ± 11% perf-profile.calltrace.cycles-pp.find_vma.__do_munmap.__vm_munmap.__x64_sys_munmap.do_syscall_64
0.67 ± 3% -0.2 0.49 ± 45% perf-profile.calltrace.cycles-pp.touch_atime.shmem_mmap.mmap_region.do_mmap.vm_mmap_pgoff
0.90 ± 5% -0.2 0.73 ± 8% perf-profile.calltrace.cycles-pp.asm_sysvec_apic_timer_interrupt.cpuidle_enter_state.cpuidle_enter.do_idle.cpu_startup_entry
0.78 ± 5% -0.1 0.63 ± 8% perf-profile.calltrace.cycles-pp.sysvec_apic_timer_interrupt.asm_sysvec_apic_timer_interrupt.cpuidle_enter_state.cpuidle_enter.do_idle
26.40 ± 4% +10.3 36.72 ± 17% perf-profile.calltrace.cycles-pp.cpu_startup_entry.start_secondary.secondary_startup_64_no_verify
26.40 ± 4% +10.3 36.72 ± 17% perf-profile.calltrace.cycles-pp.start_secondary.secondary_startup_64_no_verify
26.40 ± 4% +10.3 36.72 ± 17% perf-profile.calltrace.cycles-pp.do_idle.cpu_startup_entry.start_secondary.secondary_startup_64_no_verify
26.11 ± 5% +10.4 36.49 ± 18% perf-profile.calltrace.cycles-pp.cpuidle_enter.do_idle.cpu_startup_entry.start_secondary.secondary_startup_64_no_verify
26.00 ± 5% +10.4 36.40 ± 18% perf-profile.calltrace.cycles-pp.cpuidle_enter_state.cpuidle_enter.do_idle.cpu_startup_entry.start_secondary
27.39 ± 4% +11.1 38.45 ± 18% perf-profile.calltrace.cycles-pp.secondary_startup_64_no_verify
25.93 ± 4% +11.4 37.32 ± 18% perf-profile.calltrace.cycles-pp.intel_idle.cpuidle_enter_state.cpuidle_enter.do_idle.cpu_startup_entry
67.97 -10.6 57.41 ± 11% perf-profile.children.cycles-pp.entry_SYSCALL_64_after_hwframe
66.99 -10.4 56.56 ± 11% perf-profile.children.cycles-pp.do_syscall_64
44.75 -7.4 37.31 ± 11% perf-profile.children.cycles-pp.__munmap
41.23 -7.1 34.12 ± 11% perf-profile.children.cycles-pp.__vm_munmap
41.47 -7.1 34.38 ± 11% perf-profile.children.cycles-pp.__x64_sys_munmap
39.79 -6.9 32.88 ± 11% perf-profile.children.cycles-pp.__do_munmap
33.98 -6.2 27.81 ± 11% perf-profile.children.cycles-pp.unmap_region
25.35 -5.7 19.67 ± 11% perf-profile.children.cycles-pp.unmap_vmas
24.73 -5.6 19.12 ± 11% perf-profile.children.cycles-pp.unmap_page_range
11.68 -3.9 7.83 ± 11% perf-profile.children.cycles-pp.___might_sleep
2.98 -0.4 2.59 ± 10% perf-profile.children.cycles-pp.d_path
2.87 ± 2% -0.4 2.49 ± 11% perf-profile.children.cycles-pp.get_unmapped_area
2.49 ± 2% -0.3 2.18 ± 11% perf-profile.children.cycles-pp.kmem_cache_alloc
2.09 -0.3 1.80 ± 11% perf-profile.children.cycles-pp.__entry_text_start
2.31 -0.3 2.02 ± 10% perf-profile.children.cycles-pp.zap_pte_range
1.31 ± 3% -0.2 1.09 ± 11% perf-profile.children.cycles-pp.security_mmap_file
1.24 ± 2% -0.2 1.05 ± 10% perf-profile.children.cycles-pp.down_write
1.00 -0.2 0.81 ± 10% perf-profile.children.cycles-pp.find_vma
0.66 ± 6% -0.1 0.52 ± 15% perf-profile.children.cycles-pp.strlen
0.66 ± 3% -0.1 0.53 ± 12% perf-profile.children.cycles-pp.common_file_perm
0.69 ± 3% -0.1 0.58 ± 10% perf-profile.children.cycles-pp.touch_atime
0.36 ± 4% -0.1 0.29 ± 8% perf-profile.children.cycles-pp.sync_mm_rss
0.40 ± 3% -0.1 0.34 ± 8% perf-profile.children.cycles-pp.downgrade_write
0.19 ± 12% -0.1 0.13 ± 21% perf-profile.children.cycles-pp.cap_capable
0.25 ± 4% -0.1 0.20 ± 10% perf-profile.children.cycles-pp.vmacache_find
0.18 ± 7% -0.0 0.14 ± 10% perf-profile.children.cycles-pp.tlb_flush_mmu
0.19 ± 7% -0.0 0.15 ± 13% perf-profile.children.cycles-pp.lru_add_drain_cpu
0.13 ± 11% -0.0 0.10 ± 15% perf-profile.children.cycles-pp.__libc_start_main
0.13 ± 11% -0.0 0.10 ± 15% perf-profile.children.cycles-pp.main
0.13 ± 11% -0.0 0.10 ± 15% perf-profile.children.cycles-pp.run_builtin
0.12 ± 10% -0.0 0.09 ± 7% perf-profile.children.cycles-pp.timestamp_truncate
0.09 ± 5% -0.0 0.06 ± 20% perf-profile.children.cycles-pp.common_mmap
0.19 ± 9% -0.0 0.16 ± 5% perf-profile.children.cycles-pp.may_expand_vm
0.19 ± 6% -0.0 0.16 ± 5% perf-profile.children.cycles-pp.userfaultfd_unmap_complete
0.09 ± 12% -0.0 0.07 ± 11% perf-profile.children.cycles-pp.vm_pgprot_modify
0.08 ± 6% -0.0 0.06 ± 11% perf-profile.children.cycles-pp.get_align_mask
0.10 ± 7% +0.0 0.13 ± 14% perf-profile.children.cycles-pp.blocking_notifier_call_chain
0.08 ± 22% +0.0 0.13 ± 12% perf-profile.children.cycles-pp.munmap@plt
26.40 ± 4% +10.3 36.72 ± 17% perf-profile.children.cycles-pp.start_secondary
27.39 ± 4% +11.1 38.45 ± 18% perf-profile.children.cycles-pp.secondary_startup_64_no_verify
27.39 ± 4% +11.1 38.45 ± 18% perf-profile.children.cycles-pp.cpu_startup_entry
27.39 ± 4% +11.1 38.45 ± 18% perf-profile.children.cycles-pp.do_idle
27.10 ± 4% +11.1 38.21 ± 18% perf-profile.children.cycles-pp.cpuidle_enter
27.09 ± 4% +11.1 38.21 ± 18% perf-profile.children.cycles-pp.cpuidle_enter_state
26.00 ± 4% +11.3 37.32 ± 18% perf-profile.children.cycles-pp.intel_idle
11.56 -3.8 7.71 ± 11% perf-profile.self.cycles-pp.___might_sleep
1.28 ± 4% -0.2 1.07 ± 10% perf-profile.self.cycles-pp.perf_event_mmap
1.01 -0.2 0.84 ± 11% perf-profile.self.cycles-pp.__entry_text_start
1.08 ± 4% -0.2 0.92 ± 9% perf-profile.self.cycles-pp.kmem_cache_alloc
0.66 ± 6% -0.1 0.51 ± 14% perf-profile.self.cycles-pp.strlen
0.67 -0.1 0.54 ± 11% perf-profile.self.cycles-pp.find_vma
0.50 ± 4% -0.1 0.40 ± 12% perf-profile.self.cycles-pp.common_file_perm
0.50 ± 6% -0.1 0.41 ± 11% perf-profile.self.cycles-pp.get_obj_cgroup_from_current
0.34 ± 4% -0.1 0.28 ± 9% perf-profile.self.cycles-pp.sync_mm_rss
0.39 ± 3% -0.1 0.33 ± 8% perf-profile.self.cycles-pp.downgrade_write
0.17 ± 13% -0.1 0.11 ± 21% perf-profile.self.cycles-pp.cap_capable
0.24 ± 3% -0.0 0.20 ± 10% perf-profile.self.cycles-pp.vmacache_find
0.15 ± 7% -0.0 0.11 ± 25% perf-profile.self.cycles-pp.menu_select
0.39 ± 3% -0.0 0.34 ± 7% perf-profile.self.cycles-pp.__vm_munmap
0.08 ± 8% -0.0 0.04 ± 73% perf-profile.self.cycles-pp.common_mmap
0.13 ± 11% -0.0 0.09 ± 6% perf-profile.self.cycles-pp.tlb_flush_mmu
0.15 ± 6% -0.0 0.12 ± 12% perf-profile.self.cycles-pp.touch_atime
0.13 ± 10% -0.0 0.10 ± 10% perf-profile.self.cycles-pp.remove_vma
0.11 ± 11% -0.0 0.08 ± 6% perf-profile.self.cycles-pp.timestamp_truncate
0.18 ± 10% -0.0 0.15 ± 8% perf-profile.self.cycles-pp.may_expand_vm
0.16 ± 4% -0.0 0.13 ± 6% perf-profile.self.cycles-pp.syscall_exit_to_user_mode
0.18 ± 5% -0.0 0.15 ± 11% perf-profile.self.cycles-pp.get_unmapped_area
0.19 ± 6% -0.0 0.16 ± 5% perf-profile.self.cycles-pp.userfaultfd_unmap_complete
0.13 ± 5% -0.0 0.11 ± 10% perf-profile.self.cycles-pp.prepend
0.10 ± 7% +0.0 0.13 ± 14% perf-profile.self.cycles-pp.blocking_notifier_call_chain
26.00 ± 4% +11.3 37.32 ± 18% perf-profile.self.cycles-pp.intel_idle
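The comparison table uses two kinds of change. Most rows show a relative change in percent, while the perf-profile.* rows, which are already percentages of sampled cycles, show the absolute difference in percentage points (__munmap going from 44.75 to 37.01 is printed as -7.7). Some perf-stat rows are ratios of others: instructions-per-iTLB-miss is instructions divided by iTLB-load-misses, and cpi is 1/ipc. The Python sketch below recomputes a few table entries from those definitions. The formula behind the "± N%" column (population standard deviation relative to the mean) is an assumption, not taken from lkp-tests.

```python
from statistics import mean, pstdev


def pct_change(base: float, new: float) -> float:
    """Relative change in percent (most rows of the table)."""
    return 100.0 * (new - base) / base


def pp_change(base: float, new: float) -> float:
    """Absolute change in percentage points (perf-profile.* rows)."""
    return new - base


def rel_stddev_pct(samples: list[float]) -> float:
    """Assumed meaning of the '± N%' column: stddev as a percentage of the mean."""
    return 100.0 * pstdev(samples) / mean(samples)


def instructions_per_itlb_miss(instructions: float, itlb_misses: float) -> float:
    return instructions / itlb_misses


# Values copied from the table: (base 9c8f21e6f8, patched a10787e6d5).
ops = pct_change(561874, 581506)                           # per_process_ops
munmap = pp_change(44.75, 37.01)                           # calltrace __munmap
ipm_base = instructions_per_itlb_miss(9.529e10, 47489125)  # perf-stat.ps.*
ipm_new = instructions_per_itlb_miss(9.856e10, 53563329)
cpi_base, cpi_new = 1 / 2.07, 1 / 2.14                     # from perf-stat.overall.ipc

print(f"per_process_ops: {ops:+.1f}%")
print(f"__munmap: {munmap:+.1f} percentage points")
print(f"instructions per iTLB miss: {ipm_base:.1f} -> {ipm_new:.1f} "
      f"({pct_change(ipm_base, ipm_new):+.1f}%)")
print(f"cpi: {cpi_base:.2f} -> {cpi_new:.2f}")
```

Recomputing from the per-second rows gives roughly 2006.6 and 1840.1 instructions per iTLB miss, in line with the 2006 and 1840 in the overall rows. iTLB-load-misses grew faster (+12.8%) than instructions (+3.4%), which is why that ratio fell by about 8.3% even though ipc rose.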
will-it-scale.per_process_ops [ASCII plot of per-run samples, y-axis 550000 to 585000]
[*] bisect-good sample
[O] bisect-bad sample
Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.
---
0DAY/LKP+ Test Infrastructure                              Open Source Technology Center
https://lists.01.org/hyperkitty/list/lkp@lists.01.org      Intel Corporation
Thanks,
Oliver Sang
Attachment "config-5.11.0-04580-ga10787e6d58c" of type "text/plain" (172553 bytes)
Attachment "job-script" of type "text/plain" (7803 bytes)
Attachment "job.yaml" of type "text/plain" (5143 bytes)
Attachment "reproduce" of type "text/plain" (337 bytes)