Message-ID: <20190620131815.GR7221@shao2-debian>
Date: Thu, 20 Jun 2019 21:18:15 +0800
From: kernel test robot <rong.a.chen@...el.com>
To: Eric Dumazet <edumazet@...gle.com>
Cc: "David S. Miller" <davem@...emloft.net>,
Feng Tang <feng.tang@...el.com>,
LKML <linux-kernel@...r.kernel.org>,
Linus Torvalds <torvalds@...ux-foundation.org>, lkp@...org
Subject: [tcp] 0b7d7f6b22: netperf.Throughput_Mbps 9.7% improvement
Greetings,
FYI, we noticed a 9.7% improvement of netperf.Throughput_Mbps due to commit:
commit: 0b7d7f6b22084a3156f267c85303908a8f4c9a08 ("tcp: add tcp_tx_skb_cache sysctl")
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master
in testcase: netperf
on test machine: 40 threads Skylake-SP with 64G memory
with the following parameters:
ip: ipv4
runtime: 900s
nr_threads: 25%
cluster: cs-localhost
test: TCP_STREAM
cpufreq_governor: performance
test-description: Netperf is a benchmark that can be used to measure various aspects of networking performance.
test-url: http://www.netperf.org/netperf/
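As a rough check of how these parameters relate to the figures reported further down, the sketch below assumes that nr_threads is a percentage of the machine's hardware threads and that netperf.Throughput_Mbps is a per-stream average. Both are assumptions; they are consistent with the reported total being ten times the per-stream value. The helper names are hypothetical.

```python
def stream_count(hw_threads: int, nr_threads_percent: int) -> int:
    """Number of concurrent netperf streams, assuming nr_threads is a
    percentage of the hardware thread count (an assumption, not stated
    in the report)."""
    return hw_threads * nr_threads_percent // 100


def total_mbps(per_stream_mbps: int, streams: int) -> int:
    """Aggregate throughput, assuming the per-stream figure is an average."""
    return per_stream_mbps * streams


streams = stream_count(40, 25)       # 40 threads, nr_threads: 25%
print(streams)                       # 10
print(total_mbps(26597, streams))    # 265970, parent commit
print(total_mbps(29165, streams))    # 291650, tested commit
```

The two totals match the netperf.Throughput_total_Mbps values in the comparison table.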
Details are as below:
-------------------------------------------------------------------------------------------------->
To reproduce:
git clone https://github.com/intel/lkp-tests.git
cd lkp-tests
bin/lkp install job.yaml # job file is attached in this email
bin/lkp run job.yaml
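Both commits under comparison name a sysctl in their subject lines, so when reproducing it may help to record their values on the kernel under test. A minimal sketch, assuming the knobs are exposed as net.ipv4.tcp_rx_skb_cache and net.ipv4.tcp_tx_skb_cache under /proc/sys on kernels that carry these commits; read_sysctl is a hypothetical helper, and it returns None where the file is absent.

```python
from pathlib import Path
from typing import Optional


def read_sysctl(name: str, root: str = "/proc/sys") -> Optional[str]:
    """Return the value of a dotted sysctl name, or None if it cannot be read."""
    path = Path(root) / name.replace(".", "/")
    try:
        return path.read_text().strip()
    except OSError:
        # Missing on this kernel, or not readable by this user.
        return None


# Sysctl names are assumed from the commit subjects.
for name in ("net.ipv4.tcp_rx_skb_cache", "net.ipv4.tcp_tx_skb_cache"):
    value = read_sysctl(name)
    print(name, "=", value if value is not None else "(not present)")
```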
=========================================================================================
cluster/compiler/cpufreq_governor/ip/kconfig/nr_threads/rootfs/runtime/tbox_group/test/testcase:
cs-localhost/gcc-7/performance/ipv4/x86_64-rhel-7.6/25%/debian-x86_64-2019-05-14.cgz/900s/lkp-skl-sp2/TCP_STREAM/netperf
commit:
ede61ca474 ("tcp: add tcp_rx_skb_cache sysctl")
0b7d7f6b22 ("tcp: add tcp_tx_skb_cache sysctl")
ede61ca474a0348b 0b7d7f6b22084a3156f267c8530
---------------- ---------------------------
fail:runs %reproduction fail:runs
| | |
:4 25% 1:4 dmesg.WARNING:at#for_ip_interrupt_entry/0x
%stddev %change %stddev
\ | \
26597 ± 2% +9.7% 29165 netperf.Throughput_Mbps
265970 ± 2% +9.7% 291650 netperf.Throughput_total_Mbps
534.72 ± 2% +8.3% 579.35 netperf.time.user_time
1.826e+09 ± 2% +9.7% 2.003e+09 netperf.workload
736.25 ± 2% -2.2% 720.06 boot-time.idle
1380 -2.8% 1341 turbostat.Avg_MHz
5.014e+08 ± 9% -63.0% 1.853e+08 ± 6% cpuidle.POLL.time
1.086e+08 ± 9% -59.6% 43889116 ± 9% cpuidle.POLL.usage
1467418 -7.3% 1359567 vmstat.system.cs
82541 -1.4% 81414 vmstat.system.in
1680378 ± 39% +56.8% 2634500 ± 24% sched_debug.cfs_rq:/.MIN_vruntime.max
1680378 ± 39% +56.8% 2634500 ± 24% sched_debug.cfs_rq:/.max_vruntime.max
48058835 ± 5% -14.2% 41219240 ± 8% sched_debug.cpu.nr_switches.max
9.136e+08 ± 2% +9.5% 1.001e+09 proc-vmstat.numa_hit
9.136e+08 ± 2% +9.5% 1.001e+09 proc-vmstat.numa_local
7.304e+09 ± 2% +9.6% 8.003e+09 proc-vmstat.pgalloc_normal
7.304e+09 ± 2% +9.6% 8.003e+09 proc-vmstat.pgfree
22975 ± 5% -14.0% 19757 ± 5% softirqs.CPU1.RCU
4136190 ± 91% +399.2% 20646386 ± 19% softirqs.CPU12.NET_RX
19174130 ± 27% -50.9% 9409003 ± 42% softirqs.CPU16.NET_RX
8338169 ± 80% -70.6% 2453306 ± 73% softirqs.CPU19.NET_RX
29346983 ± 13% -71.5% 8362822 ± 43% softirqs.CPU32.NET_RX
58369 ± 88% +143.5% 142132 ± 6% softirqs.CPU32.SCHED
20633958 ± 28% -55.8% 9117560 ± 82% softirqs.CPU34.NET_RX
49.70 +5.8% 52.60 perf-stat.i.MPKI
1.081e+09 ± 2% +4.1% 1.126e+09 perf-stat.i.cache-references
1471912 -7.4% 1363353 perf-stat.i.context-switches
5.582e+10 -2.8% 5.425e+10 perf-stat.i.cpu-cycles
0.04 +0.0 0.05 perf-stat.i.dTLB-load-miss-rate%
2906127 ± 2% +5.3% 3061023 perf-stat.i.dTLB-load-misses
9690231 -1.6% 9530679 perf-stat.i.iTLB-loads
49.74 +5.8% 52.62 perf-stat.overall.MPKI
0.04 +0.0 0.05 perf-stat.overall.dTLB-load-miss-rate%
10716 -10.2% 9618 perf-stat.overall.path-length
1.08e+09 ± 2% +4.1% 1.124e+09 perf-stat.ps.cache-references
1470272 -7.4% 1361828 perf-stat.ps.context-switches
5.576e+10 -2.8% 5.419e+10 perf-stat.ps.cpu-cycles
2902892 ± 2% +5.3% 3057581 perf-stat.ps.dTLB-load-misses
9679435 -1.6% 9519990 perf-stat.ps.iTLB-loads
3.50 ± 82% +4000.0% 143.50 ±156% interrupts.42:PCI-MSI.54001672-edge.i40e-eth0-TxRx-7
7.50 ± 50% +5220.0% 399.00 ± 63% interrupts.43:PCI-MSI.54001673-edge.i40e-eth0-TxRx-8
44.25 ±164% +609.6% 314.00 ± 57% interrupts.47:PCI-MSI.54001677-edge.i40e-eth0-TxRx-12
414.25 ± 56% -97.0% 12.25 ± 72% interrupts.60:PCI-MSI.54001690-edge.i40e-eth0-TxRx-25
32425 ± 33% -52.5% 15391 ± 48% interrupts.CPU0.RES:Rescheduling_interrupts
30129 ± 16% -44.0% 16880 ± 22% interrupts.CPU1.RES:Rescheduling_interrupts
44.25 ±164% +608.5% 313.50 ± 57% interrupts.CPU12.47:PCI-MSI.54001677-edge.i40e-eth0-TxRx-12
3605 ± 40% +57.9% 5693 ± 29% interrupts.CPU12.NMI:Non-maskable_interrupts
3605 ± 40% +57.9% 5693 ± 29% interrupts.CPU12.PMI:Performance_monitoring_interrupts
6197 ± 30% -38.8% 3789 ± 25% interrupts.CPU16.NMI:Non-maskable_interrupts
6197 ± 30% -38.8% 3789 ± 25% interrupts.CPU16.PMI:Performance_monitoring_interrupts
39902 ± 64% -83.6% 6547 ± 67% interrupts.CPU19.RES:Rescheduling_interrupts
35730 ± 30% -54.5% 16242 ± 35% interrupts.CPU21.RES:Rescheduling_interrupts
49398 ± 34% -54.0% 22743 ± 81% interrupts.CPU22.RES:Rescheduling_interrupts
64829 ± 70% -75.0% 16198 ± 86% interrupts.CPU24.RES:Rescheduling_interrupts
413.75 ± 56% -97.3% 11.25 ± 79% interrupts.CPU25.60:PCI-MSI.54001690-edge.i40e-eth0-TxRx-25
6245 ± 25% -45.1% 3426 ± 30% interrupts.CPU26.NMI:Non-maskable_interrupts
6245 ± 25% -45.1% 3426 ± 30% interrupts.CPU26.PMI:Performance_monitoring_interrupts
40671 ± 45% -71.7% 11505 ± 44% interrupts.CPU33.RES:Rescheduling_interrupts
44964 ± 44% -82.6% 7818 ± 58% interrupts.CPU35.RES:Rescheduling_interrupts
71571 ± 60% -78.1% 15666 ± 54% interrupts.CPU37.RES:Rescheduling_interrupts
6.75 ± 53% +5807.4% 398.75 ± 63% interrupts.CPU8.43:PCI-MSI.54001673-edge.i40e-eth0-TxRx-8
41736 ± 60% -60.4% 16544 ± 30% interrupts.CPU8.RES:Rescheduling_interrupts
1722165 ± 6% -58.9% 707890 ± 6% interrupts.RES:Rescheduling_interrupts
50.26 ± 38% -41.5 8.72 ±103% perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe
50.17 ± 38% -41.5 8.71 ±103% perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe
30.70 ± 40% -26.2 4.50 ±107% perf-profile.calltrace.cycles-pp.__x64_sys_sendto.do_syscall_64.entry_SYSCALL_64_after_hwframe
30.65 ± 40% -26.2 4.48 ±107% perf-profile.calltrace.cycles-pp.__sys_sendto.__x64_sys_sendto.do_syscall_64.entry_SYSCALL_64_after_hwframe
4.53 ± 13% -4.5 0.00 perf-profile.calltrace.cycles-pp.sk_stream_alloc_skb.tcp_sendmsg_locked.tcp_sendmsg.sock_sendmsg.__sys_sendto
2.17 ± 15% -1.3 0.88 ± 9% perf-profile.calltrace.cycles-pp.poll_idle.cpuidle_enter_state.cpuidle_enter.do_idle.cpu_startup_entry
0.00 +0.6 0.61 ± 10% perf-profile.calltrace.cycles-pp.syscall_return_via_sysret.send.send_omni_inner.send_tcp_stream.main
3.41 ± 9% +0.7 4.07 ± 8% perf-profile.calltrace.cycles-pp.ip_output.__ip_queue_xmit.__tcp_transmit_skb.tcp_recvmsg.inet_recvmsg
3.11 ± 9% +0.7 3.81 ± 8% perf-profile.calltrace.cycles-pp.ip_finish_output2.ip_output.__ip_queue_xmit.__tcp_transmit_skb.tcp_recvmsg
0.00 +0.8 0.76 ± 12% perf-profile.calltrace.cycles-pp.entry_SYSCALL_64.send.send_omni_inner.send_tcp_stream.main
0.83 ± 19% +0.9 1.69 ± 10% perf-profile.calltrace.cycles-pp._raw_spin_lock_bh.lock_sock_nested.tcp_sendmsg.sock_sendmsg.__sys_sendto
0.00 +0.9 0.87 ± 17% perf-profile.calltrace.cycles-pp.tcp_clean_rtx_queue.tcp_ack.tcp_rcv_established.tcp_v4_do_rcv.tcp_v4_rcv
0.95 ± 19% +0.9 1.83 ± 10% perf-profile.calltrace.cycles-pp.lock_sock_nested.tcp_sendmsg.sock_sendmsg.__sys_sendto.__x64_sys_sendto
0.00 +0.9 0.92 ± 17% perf-profile.calltrace.cycles-pp.tcp_ack.tcp_rcv_established.tcp_v4_do_rcv.tcp_v4_rcv.ip_protocol_deliver_rcu
0.61 ± 57% +1.0 1.58 ± 10% perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock_bh.lock_sock_nested.tcp_sendmsg.sock_sendmsg
0.12 ±173% +2.2 2.37 ± 17% perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock.free_one_page.__free_pages_ok.skb_release_data
0.28 ±100% +2.4 2.69 ± 16% perf-profile.calltrace.cycles-pp._raw_spin_lock.free_one_page.__free_pages_ok.skb_release_data.__kfree_skb
7.70 ± 8% +2.8 10.49 ± 8% perf-profile.calltrace.cycles-pp.__release_sock.release_sock.tcp_sendmsg.sock_sendmsg.__sys_sendto
7.87 ± 8% +2.8 10.67 ± 8% perf-profile.calltrace.cycles-pp.release_sock.tcp_sendmsg.sock_sendmsg.__sys_sendto.__x64_sys_sendto
0.32 ±100% +3.0 3.34 ± 9% perf-profile.calltrace.cycles-pp.free_one_page.__free_pages_ok.skb_release_data.__kfree_skb.tcp_clean_rtx_queue
4.74 ± 8% +3.1 7.83 ± 9% perf-profile.calltrace.cycles-pp.tcp_v4_do_rcv.__release_sock.release_sock.tcp_sendmsg.sock_sendmsg
4.64 ± 8% +3.1 7.75 ± 9% perf-profile.calltrace.cycles-pp.tcp_rcv_established.tcp_v4_do_rcv.__release_sock.release_sock.tcp_sendmsg
0.46 ± 59% +3.3 3.77 ± 10% perf-profile.calltrace.cycles-pp.__free_pages_ok.skb_release_data.__kfree_skb.tcp_clean_rtx_queue.tcp_ack
1.35 ± 22% +3.5 4.83 ± 8% perf-profile.calltrace.cycles-pp.tcp_ack.tcp_rcv_established.tcp_v4_do_rcv.__release_sock.release_sock
1.16 ± 21% +3.5 4.64 ± 8% perf-profile.calltrace.cycles-pp.tcp_clean_rtx_queue.tcp_ack.tcp_rcv_established.tcp_v4_do_rcv.__release_sock
0.63 ± 58% +4.3 4.90 ± 8% perf-profile.calltrace.cycles-pp.skb_release_data.__kfree_skb.tcp_clean_rtx_queue.tcp_ack.tcp_rcv_established
0.63 ± 58% +4.4 5.03 ± 8% perf-profile.calltrace.cycles-pp.__kfree_skb.tcp_clean_rtx_queue.tcp_ack.tcp_rcv_established.tcp_v4_do_rcv
2.58 ±173% +14.0 16.61 ± 19% perf-profile.calltrace.cycles-pp.__sys_recvfrom.__x64_sys_recvfrom.do_syscall_64.entry_SYSCALL_64_after_hwframe.recv
2.59 ±173% +14.0 16.63 ± 19% perf-profile.calltrace.cycles-pp.__x64_sys_recvfrom.do_syscall_64.entry_SYSCALL_64_after_hwframe.recv.recv_omni
2.67 ±173% +14.4 17.12 ± 19% perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.recv.recv_omni.process_requests
2.68 ±173% +14.5 17.15 ± 19% perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.recv.recv_omni.process_requests.spawn_child
2.81 ±173% +15.1 17.93 ± 19% perf-profile.calltrace.cycles-pp.recv.recv_omni.process_requests.spawn_child.accept_connection
2.85 ±173% +15.3 18.16 ± 19% perf-profile.calltrace.cycles-pp.recv_omni.process_requests.spawn_child.accept_connection.accept_connections
2.85 ±173% +15.3 18.16 ± 19% perf-profile.calltrace.cycles-pp.accept_connections.main.__libc_start_main
2.85 ±173% +15.3 18.16 ± 19% perf-profile.calltrace.cycles-pp.accept_connection.accept_connections.main.__libc_start_main
2.85 ±173% +15.3 18.16 ± 19% perf-profile.calltrace.cycles-pp.spawn_child.accept_connection.accept_connections.main.__libc_start_main
2.85 ±173% +15.3 18.16 ± 19% perf-profile.calltrace.cycles-pp.process_requests.spawn_child.accept_connection.accept_connections.main
5.40 ±173% +26.5 31.88 ± 12% perf-profile.calltrace.cycles-pp.__sys_sendto.__x64_sys_sendto.do_syscall_64.entry_SYSCALL_64_after_hwframe.send
5.41 ±173% +26.5 31.95 ± 12% perf-profile.calltrace.cycles-pp.__x64_sys_sendto.do_syscall_64.entry_SYSCALL_64_after_hwframe.send.send_omni_inner
5.62 ±173% +27.6 33.25 ± 12% perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.send.send_omni_inner.send_tcp_stream
5.63 ±173% +27.7 33.31 ± 12% perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.send.send_omni_inner.send_tcp_stream.main
5.86 ±173% +28.9 34.77 ± 12% perf-profile.calltrace.cycles-pp.send.send_omni_inner.send_tcp_stream.main.__libc_start_main
5.92 ±173% +29.2 35.17 ± 12% perf-profile.calltrace.cycles-pp.send_omni_inner.send_tcp_stream.main.__libc_start_main
5.92 ±173% +29.3 35.18 ± 12% perf-profile.calltrace.cycles-pp.send_tcp_stream.main.__libc_start_main
8.78 ±173% +44.6 53.34 ± 14% perf-profile.calltrace.cycles-pp.__libc_start_main
8.78 ±173% +44.6 53.34 ± 14% perf-profile.calltrace.cycles-pp.main.__libc_start_main
4.53 ± 13% -4.1 0.39 ± 9% perf-profile.children.cycles-pp.sk_stream_alloc_skb
2.23 ± 14% -1.3 0.91 ± 10% perf-profile.children.cycles-pp.poll_idle
0.27 ± 11% -0.1 0.19 ± 11% perf-profile.children.cycles-pp.__skb_clone
0.17 ± 21% -0.0 0.12 ± 6% perf-profile.children.cycles-pp.finish_task_switch
0.07 ± 5% -0.0 0.06 perf-profile.children.cycles-pp.__sock_wfree
0.07 ± 6% +0.0 0.10 ± 11% perf-profile.children.cycles-pp.kmem_cache_free
0.11 ± 13% +0.1 0.16 ± 6% perf-profile.children.cycles-pp.__might_fault
0.16 ± 11% +0.1 0.21 ± 6% perf-profile.children.cycles-pp.___might_sleep
0.08 ± 5% +0.1 0.14 ± 11% perf-profile.children.cycles-pp.kmem_cache_alloc_node
0.09 ± 8% +0.1 0.16 ± 9% perf-profile.children.cycles-pp.__kmalloc_node_track_caller
0.00 +0.1 0.07 ± 16% perf-profile.children.cycles-pp.kfree_skbmem
0.10 ± 8% +0.1 0.17 ± 6% perf-profile.children.cycles-pp.__kmalloc_reserve
0.01 ±173% +0.1 0.09 ± 13% perf-profile.children.cycles-pp.recv_data
0.00 +0.1 0.08 ± 10% perf-profile.children.cycles-pp.raw_local_deliver
0.00 +0.1 0.09 ± 8% perf-profile.children.cycles-pp.kfree
0.01 ±173% +0.1 0.11 ± 14% perf-profile.children.cycles-pp.send_data
0.46 ± 7% +0.2 0.68 ± 8% perf-profile.children.cycles-pp.__alloc_skb
1.41 ± 7% +0.8 2.25 ± 8% perf-profile.children.cycles-pp._raw_spin_lock_bh
1.38 ± 7% +0.9 2.25 ± 9% perf-profile.children.cycles-pp.lock_sock_nested
8.37 ± 8% +2.8 11.21 ± 8% perf-profile.children.cycles-pp.release_sock
8.00 ± 8% +2.8 10.84 ± 8% perf-profile.children.cycles-pp.__release_sock
9.68 ± 8% +3.6 13.33 ± 9% perf-profile.children.cycles-pp.tcp_v4_do_rcv
9.53 ± 8% +3.7 13.19 ± 9% perf-profile.children.cycles-pp.tcp_rcv_established
1.22 ± 12% +4.1 5.33 ± 9% perf-profile.children.cycles-pp.skb_release_data
1.84 ± 11% +4.2 6.00 ± 9% perf-profile.children.cycles-pp.__kfree_skb
1.81 ± 9% +4.2 6.02 ± 9% perf-profile.children.cycles-pp.tcp_ack
1.53 ± 10% +4.2 5.77 ± 9% perf-profile.children.cycles-pp.tcp_clean_rtx_queue
2.82 ±173% +15.2 17.99 ± 19% perf-profile.children.cycles-pp.recv
2.85 ±173% +15.3 18.16 ± 19% perf-profile.children.cycles-pp.accept_connections
2.85 ±173% +15.3 18.16 ± 19% perf-profile.children.cycles-pp.accept_connection
2.85 ±173% +15.3 18.16 ± 19% perf-profile.children.cycles-pp.spawn_child
2.85 ±173% +15.3 18.16 ± 19% perf-profile.children.cycles-pp.process_requests
2.85 ±173% +15.3 18.16 ± 19% perf-profile.children.cycles-pp.recv_omni
5.89 ±173% +29.0 34.91 ± 12% perf-profile.children.cycles-pp.send
5.92 ±173% +29.2 35.17 ± 12% perf-profile.children.cycles-pp.send_omni_inner
5.92 ±173% +29.3 35.18 ± 12% perf-profile.children.cycles-pp.send_tcp_stream
8.78 ±173% +44.6 53.34 ± 14% perf-profile.children.cycles-pp.__libc_start_main
8.78 ±173% +44.6 53.36 ± 14% perf-profile.children.cycles-pp.main
2.20 ± 15% -1.3 0.89 ± 10% perf-profile.self.cycles-pp.poll_idle
0.36 ± 5% -0.1 0.27 ± 9% perf-profile.self.cycles-pp.tcp_clean_rtx_queue
0.24 ± 11% -0.1 0.17 ± 10% perf-profile.self.cycles-pp.__skb_clone
0.06 ± 13% +0.0 0.10 ± 13% perf-profile.self.cycles-pp.__kmalloc_node_track_caller
0.06 ± 7% +0.0 0.10 ± 11% perf-profile.self.cycles-pp.kmem_cache_alloc_node
0.16 ± 11% +0.0 0.21 ± 7% perf-profile.self.cycles-pp.___might_sleep
0.01 ±173% +0.1 0.07 ± 12% perf-profile.self.cycles-pp.recv_data
0.00 +0.1 0.06 ± 17% perf-profile.self.cycles-pp.send_data
0.00 +0.1 0.07 ± 13% perf-profile.self.cycles-pp.kfree_skbmem
0.01 ±173% +0.1 0.08 ± 19% perf-profile.self.cycles-pp.recv
0.07 ± 15% +0.1 0.14 ± 5% perf-profile.self.cycles-pp.__alloc_skb
0.00 +0.1 0.08 ± 10% perf-profile.self.cycles-pp.raw_local_deliver
0.02 ±173% +0.1 0.11 ± 20% perf-profile.self.cycles-pp.recv_omni
0.00 +0.1 0.09 ± 11% perf-profile.self.cycles-pp.kfree
0.03 ±173% +0.1 0.17 ± 14% perf-profile.self.cycles-pp.send
0.03 ±173% +0.2 0.19 ± 12% perf-profile.self.cycles-pp.send_omni_inner
0.49 ± 10% +0.7 1.21 ± 10% perf-profile.self.cycles-pp.skb_release_data
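The %change column above can be recomputed from the two value columns. The sketch below assumes, based only on the numbers in the table, that ordinary metrics report a relative change in percent, while metrics that are already percentages (the perf-profile rows) report an absolute difference in percentage points. The function names are hypothetical.

```python
def relative_change_pct(base: float, new: float) -> float:
    """Relative change from base to new, in percent, rounded to one decimal."""
    return round((new - base) / base * 100, 1)


def point_change(base_pct: float, new_pct: float) -> float:
    """Absolute difference between two percentages, rounded to one decimal."""
    return round(new_pct - base_pct, 1)


print(relative_change_pct(26597, 29165))      # netperf.Throughput_Mbps: 9.7
print(relative_change_pct(1467418, 1359567))  # vmstat.system.cs: -7.3
print(relative_change_pct(1722165, 707890))   # interrupts.RES: -58.9
print(point_change(4.53, 0.39))               # children sk_stream_alloc_skb: -4.1
```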
netperf.Throughput_Mbps
[Plot: netperf.Throughput_Mbps per bisect sample, y-axis from 0 to 30000]
[*] bisect-good sample
[O] bisect-bad sample
Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.
Thanks,
Rong Chen
View attachment "config-5.2.0-rc3-00260-g0b7d7f6" of type "text/plain" (196402 bytes)
View attachment "job-script" of type "text/plain" (7832 bytes)
View attachment "job.yaml" of type "text/plain" (5350 bytes)
View attachment "reproduce" of type "text/plain" (815 bytes)