Message-ID: <20210317133214.GA28839@xsang-OptiPlex-9020>
Date: Wed, 17 Mar 2021 21:32:14 +0800
From: kernel test robot <oliver.sang@...el.com>
To: Stanislav Fomichev <sdf@...gle.com>
Cc: Alexei Starovoitov <ast@...nel.org>,
Song Liu <songliubraving@...com>,
LKML <linux-kernel@...r.kernel.org>, lkp@...ts.01.org,
lkp@...el.com, ying.huang@...el.com, feng.tang@...el.com,
zhengjun.xing@...el.com
Subject: [bpf] a9ed15dae0: netperf.Throughput_tps 3.9% improvement
Greetings,
FYI, we noticed a 3.9% improvement in netperf.Throughput_tps due to commit:
commit: a9ed15dae0755a0368735e0556a462d8519bdb05 ("bpf: Split cgroup_bpf_enabled per attach type")
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master
in testcase: netperf
on test machine: 88 threads Intel(R) Xeon(R) Gold 6238M CPU @ 2.10GHz with 128G memory
with the following parameters:
ip: ipv4
runtime: 300s
nr_threads: 25%
cluster: cs-localhost
test: UDP_RR
cpufreq_governor: performance
ucode: 0x5003006
test-description: Netperf is a benchmark that can be used to measure various aspects of networking performance.
test-url: http://www.netperf.org/netperf/
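As a quick sanity check on the parameters above, the sketch below assumes that nr_threads: 25% is taken relative to the 88 hardware threads of the test machine, which would give 22 concurrent netperf instances. That reading is an assumption, not something the report states, but it is consistent with the ratio of netperf.Throughput_total_tps to netperf.Throughput_tps in the comparison table of this report.

```python
# Assumption: nr_threads is a percentage of the machine's hardware threads.
hw_threads = 88
nr_threads_pct = 25
instances = hw_threads * nr_threads_pct // 100

# Mean values copied from the comparison table in this report.
base_tps, base_total = 93152, 2049344   # parent commit 20f2505fb4
new_tps, new_total = 96790, 2129386     # commit a9ed15dae0

print("assumed netperf instances:", instances)
print("total/per-instance ratio (parent): ", round(base_total / base_tps, 2))
print("total/per-instance ratio (patched):", round(new_total / new_tps, 2))
```

Both ratios come out at about 22, which matches the assumed instance count.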
Details are below:
-------------------------------------------------------------------------------------------------->
To reproduce:
git clone https://github.com/intel/lkp-tests.git
cd lkp-tests
bin/lkp install job.yaml # job file is attached to this email
bin/lkp split-job --compatible job.yaml
bin/lkp run compatible-job.yaml
=========================================================================================
cluster/compiler/cpufreq_governor/ip/kconfig/nr_threads/rootfs/runtime/tbox_group/test/testcase/ucode:
cs-localhost/gcc-9/performance/ipv4/x86_64-rhel-8.3/25%/debian-10.4-x86_64-20200603.cgz/300s/lkp-csl-2sp9/UDP_RR/netperf/0x5003006
commit:
20f2505fb4 ("bpf: Try to avoid kzalloc in cgroup/{s,g}etsockopt")
a9ed15dae0 ("bpf: Split cgroup_bpf_enabled per attach type")
20f2505fb436cfa6            a9ed15dae0755a0368735e0556a
----------------            ---------------------------
value ± %stddev   %change   value ± %stddev               metric
2049344 +3.9% 2129386 netperf.Throughput_total_tps
93152 +3.9% 96790 netperf.Throughput_tps
9796 ± 2% +7.2% 10501 netperf.time.involuntary_context_switches
6.147e+08 +3.9% 6.387e+08 netperf.time.voluntary_context_switches
6.148e+08 +3.9% 6.388e+08 netperf.workload
10.42 -1.1 9.36 turbostat.C1%
8089495 +3.9% 8405213 vmstat.system.cs
2.774e+09 -10.2% 2.492e+09 cpuidle.C1.time
1.754e+09 +10.7% 1.941e+09 cpuidle.POLL.time
5.579e+08 +17.4% 6.552e+08 cpuidle.POLL.usage
3.00 ±223% +686.6% 23.56 ±105% perf-sched.sch_delay.avg.ms.schedule_hrtimeout_range_clock.ep_poll.do_epoll_wait.__x64_sys_epoll_wait
15.59 ± 74% +109.0% 32.59 ± 12% perf-sched.sch_delay.avg.ms.worker_thread.kthread.ret_from_fork
0.01 ± 71% +1483.3% 0.11 ± 56% perf-sched.sch_delay.max.ms.preempt_schedule_common._cond_resched.stop_one_cpu.affine_move_task.__set_cpus_allowed_ptr
1182 ± 73% +82.1% 2152 ± 6% perf-sched.sch_delay.max.ms.smpboot_thread_fn.kthread.ret_from_fork
0.01 ± 71% +65.8% 0.01 ± 4% perf-sched.total_sch_delay.average.ms
833.13 ± 82% +100.0% 1666 ± 28% perf-sched.wait_and_delay.max.ms.__x64_sys_pause.do_syscall_64.entry_SYSCALL_64_after_hwframe.[unknown]
833.12 ± 82% +100.0% 1666 ± 28% perf-sched.wait_time.max.ms.__x64_sys_pause.do_syscall_64.entry_SYSCALL_64_after_hwframe.[unknown]
3550136 ± 7% +17.9% 4185038 ± 5% softirqs.CPU0.NET_RX
7292845 ± 8% +11.0% 8091559 ± 4% softirqs.CPU11.NET_RX
16054 ± 11% -13.2% 13942 ± 5% softirqs.CPU16.RCU
15910 ± 9% -11.6% 14064 ± 6% softirqs.CPU22.RCU
15262 ± 17% -15.3% 12920 ± 4% softirqs.CPU3.RCU
15230 ± 13% -16.3% 12751 ± 5% softirqs.CPU49.RCU
14806 ± 13% -13.5% 12811 ± 3% softirqs.CPU50.RCU
14895 ± 14% -15.5% 12580 ± 6% softirqs.CPU55.RCU
14851 ± 12% -13.7% 12810 ± 4% softirqs.CPU59.RCU
5330188 ± 3% +8.8% 5801721 ± 5% softirqs.CPU63.NET_RX
5480730 ± 8% +15.0% 6302697 ± 4% softirqs.CPU80.NET_RX
731732 ± 3% +22.3% 894653 ± 5% interrupts.CAL:Function_call_interrupts
6652 ± 31% +58.2% 10526 ± 29% interrupts.CPU25.CAL:Function_call_interrupts
9123 ± 24% +57.8% 14392 ± 9% interrupts.CPU3.CAL:Function_call_interrupts
4532 ± 10% +19.6% 5418 ± 7% interrupts.CPU34.NMI:Non-maskable_interrupts
4532 ± 10% +19.6% 5418 ± 7% interrupts.CPU34.PMI:Performance_monitoring_interrupts
12587 ± 4% -18.6% 10245 ± 6% interrupts.CPU36.RES:Rescheduling_interrupts
3988 ± 30% +43.1% 5706 ± 9% interrupts.CPU37.NMI:Non-maskable_interrupts
3988 ± 30% +43.1% 5706 ± 9% interrupts.CPU37.PMI:Performance_monitoring_interrupts
4298 ± 20% +23.8% 5321 ± 10% interrupts.CPU38.NMI:Non-maskable_interrupts
4298 ± 20% +23.8% 5321 ± 10% interrupts.CPU38.PMI:Performance_monitoring_interrupts
2638 ± 29% +83.9% 4851 ± 20% interrupts.CPU43.NMI:Non-maskable_interrupts
2638 ± 29% +83.9% 4851 ± 20% interrupts.CPU43.PMI:Performance_monitoring_interrupts
9514 ± 19% +37.5% 13080 ± 12% interrupts.CPU47.CAL:Function_call_interrupts
8019 ± 25% +47.6% 11836 ± 34% interrupts.CPU50.CAL:Function_call_interrupts
3317 ± 37% +54.5% 5125 ± 9% interrupts.CPU63.NMI:Non-maskable_interrupts
3317 ± 37% +54.5% 5125 ± 9% interrupts.CPU63.PMI:Performance_monitoring_interrupts
6959 ± 46% +86.9% 13006 ± 30% interrupts.CPU73.CAL:Function_call_interrupts
19.24 +3.9% 20.00 perf-stat.i.MPKI
2.521e+08 +1.0% 2.547e+08 perf-stat.i.branch-misses
10735680 ± 8% +54.9% 16625059 ± 29% perf-stat.i.cache-misses
1.401e+09 +4.4% 1.462e+09 perf-stat.i.cache-references
8146058 +3.9% 8463745 perf-stat.i.context-switches
1.47 +1.5% 1.49 perf-stat.i.cpi
1.072e+11 +1.9% 1.093e+11 perf-stat.i.cpu-cycles
14335 ± 11% -35.3% 9272 ± 31% perf-stat.i.cycles-between-cache-misses
83.51 +0.9 84.43 perf-stat.i.iTLB-load-miss-rate%
38402582 -6.7% 35845533 perf-stat.i.iTLB-loads
0.69 -1.4% 0.68 perf-stat.i.ipc
1.22 +1.9% 1.24 perf-stat.i.metric.GHz
1939336 ± 3% +11.7% 2166223 ± 6% perf-stat.i.node-store-misses
18.96 +3.9% 19.69 perf-stat.overall.MPKI
1.45 +1.4% 1.47 perf-stat.overall.cpi
10050 ± 8% -29.4% 7096 ± 25% perf-stat.overall.cycles-between-cache-misses
83.62 +0.9 84.55 perf-stat.overall.iTLB-load-miss-rate%
0.69 -1.4% 0.68 perf-stat.overall.ipc
36166 -3.3% 34988 perf-stat.overall.path-length
2.512e+08 +1.0% 2.538e+08 perf-stat.ps.branch-misses
10702292 ± 8% +54.9% 16573593 ± 29% perf-stat.ps.cache-misses
1.396e+09 +4.4% 1.457e+09 perf-stat.ps.cache-references
8118168 +3.9% 8434896 perf-stat.ps.context-switches
1.069e+11 +1.9% 1.089e+11 perf-stat.ps.cpu-cycles
38273181 -6.7% 35723998 perf-stat.ps.iTLB-loads
1932777 ± 3% +11.7% 2159007 ± 6% perf-stat.ps.node-store-misses
14.95 -0.9 14.04 ± 6% perf-profile.calltrace.cycles-pp.__x64_sys_recvfrom.do_syscall_64.entry_SYSCALL_64_after_hwframe
14.81 -0.9 13.92 ± 6% perf-profile.calltrace.cycles-pp.__sys_recvfrom.__x64_sys_recvfrom.do_syscall_64.entry_SYSCALL_64_after_hwframe
13.33 -0.9 12.45 ± 6% perf-profile.calltrace.cycles-pp.inet_recvmsg.__sys_recvfrom.__x64_sys_recvfrom.do_syscall_64.entry_SYSCALL_64_after_hwframe
13.28 -0.9 12.40 ± 6% perf-profile.calltrace.cycles-pp.udp_recvmsg.inet_recvmsg.__sys_recvfrom.__x64_sys_recvfrom.do_syscall_64
0.72 ± 2% -0.1 0.67 ± 7% perf-profile.calltrace.cycles-pp.__kmalloc_node_track_caller.__kmalloc_reserve.__alloc_skb.alloc_skb_with_frags.sock_alloc_send_pskb
14.97 -0.9 14.06 ± 6% perf-profile.children.cycles-pp.__x64_sys_recvfrom
14.82 -0.9 13.93 ± 6% perf-profile.children.cycles-pp.__sys_recvfrom
13.34 -0.9 12.45 ± 6% perf-profile.children.cycles-pp.inet_recvmsg
13.29 -0.9 12.41 ± 6% perf-profile.children.cycles-pp.udp_recvmsg
1.01 -0.6 0.43 ± 9% perf-profile.children.cycles-pp._raw_spin_lock_bh
0.27 ± 4% -0.1 0.18 ± 5% perf-profile.children.cycles-pp.__netif_receive_skb_core
0.55 ± 2% -0.1 0.48 ± 7% perf-profile.children.cycles-pp.__slab_free
0.44 ± 2% -0.1 0.38 ± 9% perf-profile.children.cycles-pp.skb_set_owner_w
0.74 ± 2% -0.1 0.68 ± 7% perf-profile.children.cycles-pp.__kmalloc_node_track_caller
0.29 -0.1 0.24 ± 5% perf-profile.children.cycles-pp.___perf_sw_event
0.21 -0.0 0.16 ± 8% perf-profile.children.cycles-pp.migrate_enable
0.25 -0.0 0.20 ± 9% perf-profile.children.cycles-pp.__might_sleep
0.12 ± 5% -0.0 0.08 ± 10% perf-profile.children.cycles-pp._cond_resched
0.07 ± 5% -0.0 0.04 ± 71% perf-profile.children.cycles-pp.raw_local_deliver
0.24 ± 3% -0.0 0.20 ± 6% perf-profile.children.cycles-pp.security_socket_recvmsg
0.18 ± 4% -0.0 0.15 ± 7% perf-profile.children.cycles-pp.__ip_finish_output
0.12 ± 4% -0.0 0.09 ± 10% perf-profile.children.cycles-pp.rcu_read_unlock_strict
0.26 ± 2% -0.0 0.23 ± 7% perf-profile.children.cycles-pp.sock_recvmsg
0.24 ± 2% -0.0 0.22 ± 4% perf-profile.children.cycles-pp.ipv4_mtu
0.00 +0.1 0.05 ± 8% perf-profile.children.cycles-pp.rcu_note_context_switch
0.28 ± 2% +0.1 0.34 ± 4% perf-profile.children.cycles-pp.skb_release_data
0.39 ± 3% +0.1 0.53 ± 7% perf-profile.children.cycles-pp.__cgroup_bpf_run_filter_skb
0.27 ± 4% +0.2 0.43 ± 7% perf-profile.children.cycles-pp.ip_finish_output
0.99 -0.6 0.42 ± 9% perf-profile.self.cycles-pp._raw_spin_lock_bh
0.27 ± 3% -0.1 0.18 ± 7% perf-profile.self.cycles-pp.__netif_receive_skb_core
0.22 ± 4% -0.1 0.14 ± 8% perf-profile.self.cycles-pp.validate_xmit_skb
0.59 ± 2% -0.1 0.52 ± 5% perf-profile.self.cycles-pp.udp_sendmsg
0.22 ± 3% -0.1 0.15 ± 9% perf-profile.self.cycles-pp.__local_bh_enable_ip
0.54 ± 2% -0.1 0.47 ± 7% perf-profile.self.cycles-pp.__slab_free
0.43 ± 2% -0.1 0.37 ± 9% perf-profile.self.cycles-pp.skb_set_owner_w
0.67 ± 2% -0.1 0.61 ± 7% perf-profile.self.cycles-pp.__skb_wait_for_more_packets
0.29 ± 3% -0.0 0.25 ± 6% perf-profile.self.cycles-pp.__kmalloc_node_track_caller
0.19 ± 3% -0.0 0.15 ± 6% perf-profile.self.cycles-pp.migrate_enable
0.07 ± 11% -0.0 0.03 ± 70% perf-profile.self.cycles-pp.raw_local_deliver
0.21 ± 3% -0.0 0.18 ± 8% perf-profile.self.cycles-pp.__might_sleep
0.13 ± 11% -0.0 0.10 ± 10% perf-profile.self.cycles-pp.udp_unicast_rcv_skb
0.21 ± 4% -0.0 0.18 ± 6% perf-profile.self.cycles-pp.___perf_sw_event
0.09 ± 5% -0.0 0.07 ± 10% perf-profile.self.cycles-pp.rcu_read_unlock_strict
0.23 ± 2% -0.0 0.21 ± 4% perf-profile.self.cycles-pp.ipv4_mtu
0.12 ± 4% -0.0 0.10 ± 9% perf-profile.self.cycles-pp.migrate_disable
0.08 ± 6% -0.0 0.06 ± 11% perf-profile.self.cycles-pp.syscall_exit_to_user_mode
0.06 ± 7% +0.0 0.10 ± 8% perf-profile.self.cycles-pp.asm_call_sysvec_on_stack
0.27 ± 3% +0.0 0.31 ± 10% perf-profile.self.cycles-pp.__udp_enqueue_schedule_skb
0.28 ± 2% +0.1 0.34 ± 4% perf-profile.self.cycles-pp.skb_release_data
0.01 ±223% +0.1 0.07 ± 6% perf-profile.self.cycles-pp.__ip_select_ident
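The %change column can be recomputed from the two value columns. The sketch below is illustrative only: the helper name pct_change is hypothetical and the numbers are copied from a few rows of the table above. Rows whose metric is itself a percentage (for example turbostat.C1%) appear to report the plain difference between the two values rather than a relative change.

```python
def pct_change(base: float, new: float) -> float:
    """Relative change from base to new, in percent."""
    return (new - base) / base * 100.0

# (parent 20f2505fb4, patched a9ed15dae0) mean values copied from the table.
rows = {
    "netperf.Throughput_total_tps": (2049344, 2129386),
    "netperf.Throughput_tps": (93152, 96790),
    "vmstat.system.cs": (8089495, 8405213),
    "cpuidle.C1.time": (2.774e9, 2.492e9),
}

for name, (base, new) in rows.items():
    print(f"{pct_change(base, new):+6.1f}%  {name}")

# turbostat.C1% is already a percentage, so its row shows a plain difference.
c1_delta = 9.36 - 10.42
print(f"{c1_delta:+6.1f}   turbostat.C1%")
```

Rounded to one decimal place, these reproduce the +3.9%, +3.9%, +3.9%, -10.2% and -1.1 entries reported for those rows.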
[ASCII plots not preserved in this copy. Panels and y-axis ranges:]
netperf.Throughput_tps (92500 to 97500)
netperf.Throughput_total_tps (2.02e+06 to 2.16e+06)
netperf.workload (6.1e+08 to 6.45e+08)
netperf.time.voluntary_context_switches (6.1e+08 to 6.45e+08)
[*] bisect-good sample
[O] bisect-bad sample
Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.
---
0DAY/LKP+ Test Infrastructure                   Open Source Technology Center
https://lists.01.org/hyperkitty/list/lkp@lists.01.org       Intel Corporation
Thanks,
Oliver Sang
View attachment "config-5.11.0-rc4-00516-ga9ed15dae075" of type "text/plain" (172414 bytes)
View attachment "job-script" of type "text/plain" (8237 bytes)
View attachment "job.yaml" of type "text/plain" (5525 bytes)
View attachment "reproduce" of type "text/plain" (1375 bytes)