Message-ID: <20171208025651.GK21779@yexl-desktop>
Date: Fri, 8 Dec 2017 10:56:51 +0800
From: kernel test robot <xiaolong.ye@...el.com>
To: Omer Peleg <omer@...technion.ac.il>
Cc: David Woodhouse <David.Woodhouse@...el.com>,
Adam Morrison <mad@...technion.ac.il>,
Shaohua Li <shli@...com>, Ben Serebrin <serebrin@...gle.com>,
LKML <linux-kernel@...r.kernel.org>,
Linus Torvalds <torvalds@...ux-foundation.org>, lkp@...org
Subject: [lkp-robot] [iommu/vt] 22e2f9fa63: netperf.Throughput_tps 302.5%
improvement
Greetings,
FYI, we noticed a 302.5% improvement of netperf.Throughput_tps due to commit:
commit: 22e2f9fa63b092923873fc8a52955151f4d83274 ("iommu/vt-d: Use per-cpu IOVA caching")
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master
in testcase: netperf
on test machine: 56 threads Intel(R) Xeon(R) CPU E5-2695 v3 @ 2.30GHz with 256G memory
with following parameters:
ip: ipv4
runtime: 310s
nr_threads: 25%
cluster: cs-lkp-hsw-ep5
test: TCP_CRR
cpufreq_governor: performance
test-description: Netperf is a benchmark that can be used to measure various aspects of networking performance.
test-url: http://www.netperf.org/netperf/
Details are as below:
-------------------------------------------------------------------------------------------------->
To reproduce:
git clone https://github.com/intel/lkp-tests.git
cd lkp-tests
bin/lkp install job.yaml # job file is attached in this email
bin/lkp run job.yaml
=========================================================================================
cluster/compiler/cpufreq_governor/ip/kconfig/nr_threads/rootfs/runtime/tbox_group/test/testcase:
cs-lkp-hsw-ep5/gcc-5/performance/ipv4/x86_64-rhel-7.2/25%/debian-x86_64-2016-08-31.cgz/310s/lkp-hsw-ep6/TCP_CRR/netperf
commit:
9257b4a206 ("iommu/iova: introduce per-cpu caching to iova allocation")
22e2f9fa63 ("iommu/vt-d: Use per-cpu IOVA caching")
9257b4a206fc0229 22e2f9fa63b092923873fc8a52
---------------- --------------------------
value ± %stddev   %change   value ± %stddev   metric
1137 ± 7% +302.5% 4577 ± 3% netperf.Throughput_tps
84839 ± 14% -86.2% 11734 ± 2% netperf.time.involuntary_context_switches
542.00 ± 13% -29.6% 381.50 ± 3% netperf.time.percent_of_cpu_this_job_got
1692 ± 13% -32.9% 1135 ± 2% netperf.time.system_time
29.02 ± 6% +152.7% 73.31 netperf.time.user_time
11362386 ± 6% +266.1% 41594215 ± 3% netperf.time.voluntary_context_switches
11892 ± 25% -50.0% 5944 ± 62% numa-meminfo.node0.Shmem
211004 ± 8% +16.8% 246387 ± 9% meminfo.Committed_AS
16896 ± 6% -14.6% 14426 meminfo.Shmem
1.68 ± 13% -0.8 0.89 mpstat.cpu.soft%
0.14 ± 13% +0.2 0.37 mpstat.cpu.usr%
69000 ± 14% +279.2% 261654 ± 2% vmstat.system.cs
188028 ± 10% +206.9% 577018 ± 2% vmstat.system.in
23198861 ± 7% +262.7% 84148028 ± 3% softirqs.NET_RX
563581 +11.1% 625948 ± 2% softirqs.RCU
646307 ± 7% -25.7% 480046 ± 3% softirqs.SCHED
419484 ± 11% +39.1% 583707 ± 7% numa-numastat.node0.local_node
423206 ± 11% +38.8% 587394 ± 7% numa-numastat.node0.numa_hit
411160 ± 13% +47.2% 605296 ± 11% numa-numastat.node1.local_node
418501 ± 13% +46.4% 612696 ± 11% numa-numastat.node1.numa_hit
946.00 ± 8% +18.6% 1122 ± 4% slabinfo.blkdev_requests.active_objs
946.00 ± 8% +18.6% 1122 ± 4% slabinfo.blkdev_requests.num_objs
1334 ± 8% +17.4% 1566 ± 8% slabinfo.mnt_cache.active_objs
1334 ± 8% +17.4% 1566 ± 8% slabinfo.mnt_cache.num_objs
3197 ± 14% -19.7% 2566 ± 19% numa-vmstat.node0.nr_mapped
2974 ± 25% -50.0% 1486 ± 62% numa-vmstat.node0.nr_shmem
578437 ± 7% +14.0% 659152 ± 4% numa-vmstat.node0.numa_local
714274 ± 4% +13.8% 813152 ± 5% numa-vmstat.node1.numa_hit
652358 ± 5% +15.1% 751050 ± 5% numa-vmstat.node1.numa_local
4221 ± 6% -14.6% 3605 proc-vmstat.nr_shmem
841526 ± 11% +42.8% 1201746 ± 2% proc-vmstat.numa_hit
830464 ± 11% +43.4% 1190659 ± 2% proc-vmstat.numa_local
9170 ± 8% +209.7% 28402 ± 2% proc-vmstat.pgalloc_dma32
1128920 ± 9% +218.7% 3598182 ± 2% proc-vmstat.pgalloc_normal
1127154 ± 9% +220.8% 3615619 ± 2% proc-vmstat.pgfree
8.641e+08 ± 6% +175.9% 2.384e+09 cpuidle.C1-HSW.time
19744589 ± 10% +398.7% 98458401 ± 4% cpuidle.C1-HSW.usage
2.447e+08 ± 17% -75.4% 60171127 ± 10% cpuidle.C1E-HSW.time
3830735 ± 8% -58.2% 1601529 ± 9% cpuidle.C1E-HSW.usage
9.043e+08 ± 19% -69.2% 2.782e+08 ± 22% cpuidle.C3-HSW.time
4039781 ± 19% -62.7% 1508151 ± 11% cpuidle.C3-HSW.usage
1.484e+10 ± 15% -14.2% 1.273e+10 ± 2% cpuidle.C6-HSW.time
16023790 ± 14% -14.2% 13752412 ± 2% cpuidle.C6-HSW.usage
80503 ± 6% +208.3% 248204 ± 3% cpuidle.POLL.usage
19744447 ± 10% +398.7% 98458224 ± 4% turbostat.C1
4.38 ± 16% +8.8 13.18 ± 3% turbostat.C1%
3830618 ± 8% -58.2% 1601511 ± 9% turbostat.C1E
1.24 ± 22% -0.9 0.33 ± 12% turbostat.C1E%
4039779 ± 19% -62.7% 1508141 ± 11% turbostat.C3
4.52 ± 18% -3.0 1.54 ± 23% turbostat.C3%
16023643 ± 14% -14.2% 13752253 ± 2% turbostat.C6
41.38 ± 4% +21.0% 50.05 turbostat.CPU%c1
4.54 ± 17% -70.3% 1.34 ± 15% turbostat.CPU%c3
37.02 ± 7% -10.8% 33.02 ± 2% turbostat.CPU%c6
66960526 ± 6% +179.6% 1.872e+08 ± 3% turbostat.IRQ
2.94 ± 65% -71.9% 0.83 ± 39% turbostat.Pkg%pc2
0.94 ± 94% -99.2% 0.01 ±173% turbostat.Pkg%pc3
63.26 +5.0% 66.43 turbostat.PkgWatt
20.78 +1.0% 21.00 turbostat.RAMWatt
3.445e+11 ± 12% +32.2% 4.555e+11 ± 4% perf-stat.branch-instructions
4.52e+09 ± 5% +27.9% 5.78e+09 ± 3% perf-stat.branch-misses
2.99 ± 4% -1.1 1.86 ± 7% perf-stat.cache-miss-rate%
9.85e+08 ± 6% +28.0% 1.261e+09 ± 6% perf-stat.cache-misses
3.292e+10 ± 4% +106.3% 6.791e+10 ± 2% perf-stat.cache-references
24452174 ± 6% +247.3% 84913954 ± 3% perf-stat.context-switches
2.85 ± 2% -43.9% 1.60 perf-stat.cpi
4.408e+12 ± 13% -10.7% 3.936e+12 ± 4% perf-stat.cpu-cycles
321823 ± 8% +452.3% 1777352 ± 10% perf-stat.cpu-migrations
0.25 ± 9% -0.1 0.11 ± 6% perf-stat.dTLB-load-miss-rate%
1.031e+09 ± 7% -41.4% 6.036e+08 ± 3% perf-stat.dTLB-load-misses
4.191e+11 ± 9% +33.7% 5.605e+11 ± 8% perf-stat.dTLB-loads
0.07 ± 11% -0.0 0.02 ± 4% perf-stat.dTLB-store-miss-rate%
1.278e+08 ± 14% -23.1% 98246252 ± 2% perf-stat.dTLB-store-misses
1.77e+11 ± 6% +142.9% 4.3e+11 ± 4% perf-stat.dTLB-stores
27.87 ± 7% -15.1 12.82 ± 3% perf-stat.iTLB-load-miss-rate%
2.071e+08 ± 9% +23.3% 2.553e+08 perf-stat.iTLB-load-misses
5.358e+08 ± 6% +224.5% 1.739e+09 ± 4% perf-stat.iTLB-loads
1.544e+12 ± 11% +59.6% 2.465e+12 ± 3% perf-stat.instructions
7495 ± 10% +28.8% 9656 ± 4% perf-stat.instructions-per-iTLB-miss
0.35 ± 2% +78.1% 0.63 perf-stat.ipc
60.31 +12.5 72.81 perf-stat.node-store-miss-rate%
1.269e+08 ± 6% -34.4% 83308475 ± 5% perf-stat.node-stores
26.90 ± 16% -38.5% 16.55 ± 10% sched_debug.cfs_rq:/.load.avg
459.88 ± 4% -42.5% 264.50 ± 26% sched_debug.cfs_rq:/.load.max
86.25 ± 12% -37.1% 54.26 ± 15% sched_debug.cfs_rq:/.load.stddev
16.38 ± 11% -26.6% 12.03 ± 12% sched_debug.cfs_rq:/.load_avg.avg
219.34 ± 10% -15.1% 186.29 ± 15% sched_debug.cfs_rq:/.load_avg.max
134067 ± 26% -33.7% 88834 ± 4% sched_debug.cfs_rq:/.min_vruntime.avg
11122 ± 32% -69.2% 3431 ± 31% sched_debug.cfs_rq:/.min_vruntime.min
0.16 ± 17% -25.6% 0.12 ± 6% sched_debug.cfs_rq:/.nr_running.avg
0.36 ± 8% -12.6% 0.32 ± 3% sched_debug.cfs_rq:/.nr_running.stddev
2.88 ± 18% -58.4% 1.20 ± 8% sched_debug.cfs_rq:/.runnable_load_avg.avg
7.29 ± 11% -41.2% 4.28 ± 9% sched_debug.cfs_rq:/.runnable_load_avg.stddev
-42705 +81.7% -77607 sched_debug.cfs_rq:/.spread0.avg
118101 ± 32% -28.0% 84998 ± 5% sched_debug.cfs_rq:/.spread0.max
759056 ± 3% -6.9% 706698 sched_debug.cpu.avg_idle.avg
325096 ± 10% +19.3% 387735 sched_debug.cpu.avg_idle.stddev
2.88 ± 14% -43.1% 1.64 ± 39% sched_debug.cpu.cpu_load[1].avg
2.65 ± 9% -50.5% 1.31 ± 23% sched_debug.cpu.cpu_load[2].avg
2.31 ± 11% -55.9% 1.02 ± 11% sched_debug.cpu.cpu_load[3].avg
5.17 ± 12% -26.3% 3.81 ± 21% sched_debug.cpu.cpu_load[3].stddev
1.83 ± 15% -54.9% 0.83 ± 10% sched_debug.cpu.cpu_load[4].avg
4.25 ± 22% -35.5% 2.74 ± 10% sched_debug.cpu.cpu_load[4].stddev
878.23 ± 13% -21.6% 688.59 ± 4% sched_debug.cpu.curr->pid.avg
2204 ± 3% -8.9% 2007 ± 2% sched_debug.cpu.curr->pid.stddev
25.25 ± 15% -37.9% 15.67 ± 20% sched_debug.cpu.load.avg
82.80 ± 19% -39.4% 50.18 ± 23% sched_debug.cpu.load.stddev
99043 ± 9% +30.0% 128766 sched_debug.cpu.nr_load_updates.max
26577 ± 9% +35.9% 36127 sched_debug.cpu.nr_load_updates.stddev
0.16 ± 17% -26.4% 0.12 ± 7% sched_debug.cpu.nr_running.avg
0.36 ± 7% -12.5% 0.32 ± 3% sched_debug.cpu.nr_running.stddev
228614 ± 11% +214.9% 719814 ± 4% sched_debug.cpu.nr_switches.avg
522526 ± 12% +295.7% 2067603 ± 3% sched_debug.cpu.nr_switches.max
12788 ± 20% -51.1% 6250 ± 36% sched_debug.cpu.nr_switches.min
213191 ± 11% +273.5% 796205 ± 3% sched_debug.cpu.nr_switches.stddev
43.52 ± 24% +448.0% 238.50 ± 7% sched_debug.cpu.nr_uninterruptible.max
-33.78 +297.6% -134.29 sched_debug.cpu.nr_uninterruptible.min
9.43 ± 12% +306.9% 38.38 ± 7% sched_debug.cpu.nr_uninterruptible.stddev
22.61 ± 14% -19.2 3.45 ± 13% perf-profile.calltrace.cycles-pp.ip_finish_output2.ip_finish_output.ip_output.ip_local_out.ip_queue_xmit
22.64 ± 14% -19.1 3.54 ± 13% perf-profile.calltrace.cycles-pp.ip_finish_output.ip_output.ip_local_out.ip_queue_xmit.tcp_transmit_skb
18.80 ± 12% -18.7 0.14 ±173% perf-profile.calltrace.cycles-pp.dev_hard_start_xmit.sch_direct_xmit.__dev_queue_xmit.dev_queue_xmit.ip_finish_output2
18.78 ± 12% -18.6 0.14 ±173% perf-profile.calltrace.cycles-pp.ixgbe_xmit_frame.dev_hard_start_xmit.sch_direct_xmit.__dev_queue_xmit.dev_queue_xmit
18.62 ± 13% -18.6 0.00 perf-profile.calltrace.cycles-pp.intel_map_page.ixgbe_xmit_frame_ring.ixgbe_xmit_frame.dev_hard_start_xmit.sch_direct_xmit
18.75 ± 12% -18.6 0.13 ±173% perf-profile.calltrace.cycles-pp.ixgbe_xmit_frame_ring.ixgbe_xmit_frame.dev_hard_start_xmit.sch_direct_xmit.__dev_queue_xmit
18.59 ± 13% -18.6 0.00 perf-profile.calltrace.cycles-pp.__intel_map_single.intel_map_page.ixgbe_xmit_frame_ring.ixgbe_xmit_frame.dev_hard_start_xmit
18.87 ± 12% -18.6 0.31 ±100% perf-profile.calltrace.cycles-pp.sch_direct_xmit.__dev_queue_xmit.dev_queue_xmit.ip_finish_output2.ip_finish_output
18.43 ± 13% -18.4 0.00 perf-profile.calltrace.cycles-pp.intel_alloc_iova.__intel_map_single.intel_map_page.ixgbe_xmit_frame_ring.ixgbe_xmit_frame
18.25 ± 13% -18.2 0.00 perf-profile.calltrace.cycles-pp.alloc_iova.intel_alloc_iova.__intel_map_single.intel_map_page.ixgbe_xmit_frame_ring
19.12 ± 12% -16.4 2.69 ± 23% perf-profile.calltrace.cycles-pp.__dev_queue_xmit.dev_queue_xmit.ip_finish_output2.ip_finish_output.ip_output
19.14 ± 12% -16.2 2.89 ± 17% perf-profile.calltrace.cycles-pp.dev_queue_xmit.ip_finish_output2.ip_finish_output.ip_output.ip_local_out
15.41 ± 15% -15.4 0.00 perf-profile.calltrace.cycles-pp._raw_spin_lock_irqsave.alloc_iova.intel_alloc_iova.__intel_map_single.intel_map_page
15.32 ± 15% -15.3 0.00 perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock_irqsave.alloc_iova.intel_alloc_iova.__intel_map_single
11.68 ± 21% -11.7 0.00 perf-profile.calltrace.cycles-pp.free_iova.flush_unmaps_timeout.call_timer_fn.run_timer_softirq.__do_softirq
7.51 ± 7% -7.5 0.00 perf-profile.calltrace.cycles-pp.flush_unmaps_timeout.call_timer_fn.run_timer_softirq.__do_softirq.irq_exit
9.15 ± 16% -7.4 1.73 ± 8% perf-profile.calltrace.cycles-pp.ip_output.ip_local_out.ip_queue_xmit.tcp_transmit_skb.tcp_write_xmit
9.20 ± 16% -7.3 1.88 ± 7% perf-profile.calltrace.cycles-pp.ip_local_out.ip_queue_xmit.tcp_transmit_skb.tcp_write_xmit.__tcp_push_pending_frames
7.66 ± 5% -7.2 0.45 ± 58% perf-profile.calltrace.cycles-pp.irq_exit.smp_apic_timer_interrupt.apic_timer_interrupt.cpuidle_enter.call_cpuidle
7.12 ± 4% -7.1 0.00 perf-profile.calltrace.cycles-pp.__do_softirq.irq_exit.smp_apic_timer_interrupt.apic_timer_interrupt.cpuidle_enter
7.07 ± 5% -7.1 0.00 perf-profile.calltrace.cycles-pp.run_timer_softirq.__do_softirq.irq_exit.smp_apic_timer_interrupt.apic_timer_interrupt
7.06 ± 5% -7.1 0.00 perf-profile.calltrace.cycles-pp.call_timer_fn.run_timer_softirq.__do_softirq.irq_exit.smp_apic_timer_interrupt
8.85 ± 4% -6.8 2.00 ± 13% perf-profile.calltrace.cycles-pp.smp_apic_timer_interrupt.apic_timer_interrupt.cpuidle_enter.call_cpuidle.cpu_startup_entry
8.38 ± 11% -6.8 1.62 ± 9% perf-profile.calltrace.cycles-pp.ip_output.ip_local_out.ip_queue_xmit.tcp_transmit_skb.tcp_send_ack
8.78 ± 4% -6.7 2.07 ± 13% perf-profile.calltrace.cycles-pp.apic_timer_interrupt.cpuidle_enter.call_cpuidle.cpu_startup_entry.start_secondary
27.42 ± 12% -5.5 21.88 ± 8% perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_fastpath
5.01 ± 20% -5.0 0.00 perf-profile.calltrace.cycles-pp.find_iova.free_iova.flush_unmaps_timeout.call_timer_fn.run_timer_softirq
4.69 ± 20% -4.7 0.00 perf-profile.calltrace.cycles-pp._raw_spin_lock_irqsave.find_iova.free_iova.flush_unmaps_timeout.call_timer_fn
5.27 ± 16% -4.3 0.97 ± 9% perf-profile.calltrace.cycles-pp.ip_output.ip_local_out.ip_queue_xmit.tcp_transmit_skb.tcp_connect
5.28 ± 16% -4.3 1.02 ± 9% perf-profile.calltrace.cycles-pp.ip_local_out.ip_queue_xmit.tcp_transmit_skb.tcp_connect.tcp_v4_connect
5.29 ± 16% -4.2 1.06 ± 8% perf-profile.calltrace.cycles-pp.ip_queue_xmit.tcp_transmit_skb.tcp_connect.tcp_v4_connect.__inet_stream_connect
5.36 ± 16% -4.1 1.25 ± 9% perf-profile.calltrace.cycles-pp.tcp_transmit_skb.tcp_connect.tcp_v4_connect.__inet_stream_connect.inet_stream_connect
4.79 ± 16% -3.9 0.90 ± 5% perf-profile.calltrace.cycles-pp.ip_queue_xmit.tcp_transmit_skb.tcp_write_xmit.__tcp_push_pending_frames.tcp_send_fin
4.88 ± 16% -3.8 1.05 ± 5% perf-profile.calltrace.cycles-pp.tcp_transmit_skb.tcp_write_xmit.__tcp_push_pending_frames.tcp_send_fin.tcp_close
4.99 ± 15% -3.7 1.34 ± 5% perf-profile.calltrace.cycles-pp.tcp_write_xmit.__tcp_push_pending_frames.tcp_send_fin.tcp_close.inet_release
5.02 ± 15% -3.6 1.38 ± 5% perf-profile.calltrace.cycles-pp.__tcp_push_pending_frames.tcp_send_fin.tcp_close.inet_release.sock_release
5.17 ± 15% -3.6 1.53 ± 5% perf-profile.calltrace.cycles-pp.tcp_send_fin.tcp_close.inet_release.sock_release.sock_close
5.56 ± 15% -3.6 1.94 ± 6% perf-profile.calltrace.cycles-pp.tcp_connect.tcp_v4_connect.__inet_stream_connect.inet_stream_connect.SYSC_connect
5.39 ± 15% -3.5 1.88 ± 4% perf-profile.calltrace.cycles-pp.tcp_close.inet_release.sock_release.sock_close.__fput
5.47 ± 15% -3.5 1.99 ± 4% perf-profile.calltrace.cycles-pp.inet_release.sock_release.sock_close.__fput.____fput
5.50 ± 15% -3.5 2.04 ± 5% perf-profile.calltrace.cycles-pp.sock_release.sock_close.__fput.____fput.task_work_run
5.53 ± 15% -3.5 2.08 ± 6% perf-profile.calltrace.cycles-pp.sock_close.__fput.____fput.task_work_run.exit_to_usermode_loop
6.03 ± 13% -3.4 2.61 ± 8% perf-profile.calltrace.cycles-pp.tcp_prequeue_process.tcp_recvmsg.inet_recvmsg.sock_recvmsg.SYSC_recvfrom
5.89 ± 13% -2.8 3.08 ± 6% perf-profile.calltrace.cycles-pp.tcp_v4_connect.__inet_stream_connect.inet_stream_connect.SYSC_connect.sys_connect
6.65 ± 12% -2.7 4.00 ± 10% perf-profile.calltrace.cycles-pp.tcp_recvmsg.inet_recvmsg.sock_recvmsg.SYSC_recvfrom.sys_recvfrom
6.73 ± 12% -2.6 4.12 ± 10% perf-profile.calltrace.cycles-pp.inet_recvmsg.sock_recvmsg.SYSC_recvfrom.sys_recvfrom.entry_SYSCALL_64_fastpath
4.75 ± 10% -2.6 2.19 ± 9% perf-profile.calltrace.cycles-pp.tcp_rcv_established.tcp_v4_do_rcv.tcp_prequeue_process.tcp_recvmsg.inet_recvmsg
6.07 ± 15% -2.5 3.56 ± 8% perf-profile.calltrace.cycles-pp.__fput.____fput.task_work_run.exit_to_usermode_loop.syscall_return_slowpath
6.19 ± 15% -2.5 3.68 ± 8% perf-profile.calltrace.cycles-pp.____fput.task_work_run.exit_to_usermode_loop.syscall_return_slowpath.entry_SYSCALL_64_fastpath
6.82 ± 12% -2.5 4.31 ± 9% perf-profile.calltrace.cycles-pp.sock_recvmsg.SYSC_recvfrom.sys_recvfrom.entry_SYSCALL_64_fastpath
6.22 ± 15% -2.5 3.75 ± 8% perf-profile.calltrace.cycles-pp.task_work_run.exit_to_usermode_loop.syscall_return_slowpath.entry_SYSCALL_64_fastpath
6.29 ± 15% -2.5 3.83 ± 8% perf-profile.calltrace.cycles-pp.exit_to_usermode_loop.syscall_return_slowpath.entry_SYSCALL_64_fastpath
6.96 ± 12% -2.4 4.52 ± 9% perf-profile.calltrace.cycles-pp.sys_recvfrom.entry_SYSCALL_64_fastpath
4.78 ± 10% -2.4 2.34 ± 8% perf-profile.calltrace.cycles-pp.tcp_v4_do_rcv.tcp_prequeue_process.tcp_recvmsg.inet_recvmsg.sock_recvmsg
6.71 ± 11% -2.4 4.27 ± 7% perf-profile.calltrace.cycles-pp.__inet_stream_connect.inet_stream_connect.SYSC_connect.sys_connect.entry_SYSCALL_64_fastpath
6.92 ± 12% -2.4 4.49 ± 9% perf-profile.calltrace.cycles-pp.SYSC_recvfrom.sys_recvfrom.entry_SYSCALL_64_fastpath
6.34 ± 15% -2.4 3.91 ± 8% perf-profile.calltrace.cycles-pp.syscall_return_slowpath.entry_SYSCALL_64_fastpath
6.80 ± 11% -2.4 4.38 ± 6% perf-profile.calltrace.cycles-pp.inet_stream_connect.SYSC_connect.sys_connect.entry_SYSCALL_64_fastpath
5.02 ± 14% -2.3 2.69 ± 9% perf-profile.calltrace.cycles-pp.tcp_sendmsg.inet_sendmsg.sock_sendmsg.SYSC_sendto.sys_sendto
5.17 ± 15% -2.3 2.91 ± 9% perf-profile.calltrace.cycles-pp.inet_sendmsg.sock_sendmsg.SYSC_sendto.sys_sendto.entry_SYSCALL_64_fastpath
5.23 ± 15% -2.2 3.04 ± 9% perf-profile.calltrace.cycles-pp.sock_sendmsg.SYSC_sendto.sys_sendto.entry_SYSCALL_64_fastpath
6.95 ± 11% -2.2 4.77 ± 7% perf-profile.calltrace.cycles-pp.sys_connect.entry_SYSCALL_64_fastpath
6.93 ± 11% -2.2 4.75 ± 7% perf-profile.calltrace.cycles-pp.SYSC_connect.sys_connect.entry_SYSCALL_64_fastpath
5.33 ± 15% -2.1 3.19 ± 9% perf-profile.calltrace.cycles-pp.SYSC_sendto.sys_sendto.entry_SYSCALL_64_fastpath
5.35 ± 15% -2.1 3.21 ± 9% perf-profile.calltrace.cycles-pp.sys_sendto.entry_SYSCALL_64_fastpath
4.91 ± 5% -0.4 4.56 ± 2% perf-profile.calltrace.cycles-pp.tcp_v4_do_rcv.tcp_v4_rcv.ip_local_deliver_finish.ip_local_deliver.ip_rcv_finish
5.26 ± 3% +0.8 6.09 ± 3% perf-profile.calltrace.cycles-pp.tcp_v4_rcv.ip_local_deliver_finish.ip_local_deliver.ip_rcv_finish.ip_rcv
5.30 ± 3% +1.0 6.27 ± 3% perf-profile.calltrace.cycles-pp.ip_local_deliver_finish.ip_local_deliver.ip_rcv_finish.ip_rcv.__netif_receive_skb_core
5.30 ± 3% +1.0 6.30 ± 3% perf-profile.calltrace.cycles-pp.ip_local_deliver.ip_rcv_finish.ip_rcv.__netif_receive_skb_core.__netif_receive_skb
3.68 ± 90% +1.3 4.97 ± 64% perf-profile.calltrace.cycles-pp.poll_idle.cpuidle_enter_state.cpuidle_enter.call_cpuidle.cpu_startup_entry
5.49 ± 3% +1.6 7.11 ± 4% perf-profile.calltrace.cycles-pp.ip_rcv_finish.ip_rcv.__netif_receive_skb_core.__netif_receive_skb.netif_receive_skb_internal
5.53 ± 2% +1.7 7.22 ± 4% perf-profile.calltrace.cycles-pp.ip_rcv.__netif_receive_skb_core.__netif_receive_skb.netif_receive_skb_internal.napi_gro_receive
5.58 ± 2% +1.9 7.50 ± 4% perf-profile.calltrace.cycles-pp.__netif_receive_skb_core.__netif_receive_skb.netif_receive_skb_internal.napi_gro_receive.ixgbe_clean_rx_irq
5.58 ± 2% +2.0 7.54 ± 4% perf-profile.calltrace.cycles-pp.__netif_receive_skb.netif_receive_skb_internal.napi_gro_receive.ixgbe_clean_rx_irq.ixgbe_poll
5.61 ± 2% +2.0 7.61 ± 3% perf-profile.calltrace.cycles-pp.netif_receive_skb_internal.napi_gro_receive.ixgbe_clean_rx_irq.ixgbe_poll.net_rx_action
5.74 +2.3 8.06 ± 3% perf-profile.calltrace.cycles-pp.napi_gro_receive.ixgbe_clean_rx_irq.ixgbe_poll.net_rx_action.__do_softirq
44.61 +2.8 47.41 perf-profile.calltrace.cycles-pp.intel_idle.cpuidle_enter_state.cpuidle_enter.call_cpuidle.cpu_startup_entry
7.31 ± 11% +2.8 10.16 ± 4% perf-profile.calltrace.cycles-pp.ixgbe_poll.net_rx_action.__do_softirq.irq_exit.__irqentry_text_start
7.38 ± 10% +2.9 10.30 ± 4% perf-profile.calltrace.cycles-pp.net_rx_action.__do_softirq.irq_exit.__irqentry_text_start.ret_from_intr
5.93 +3.0 8.93 ± 3% perf-profile.calltrace.cycles-pp.ixgbe_clean_rx_irq.ixgbe_poll.net_rx_action.__do_softirq.irq_exit
6.77 +3.8 10.54 ± 4% perf-profile.calltrace.cycles-pp.__do_softirq.irq_exit.__irqentry_text_start.ret_from_intr.cpuidle_enter
65.64 ± 5% +4.3 69.90 ± 3% perf-profile.calltrace.cycles-pp.cpuidle_enter.call_cpuidle.cpu_startup_entry.start_secondary
65.66 ± 5% +4.3 69.99 ± 3% perf-profile.calltrace.cycles-pp.call_cpuidle.cpu_startup_entry.start_secondary
7.05 +4.7 11.73 ± 4% perf-profile.calltrace.cycles-pp.irq_exit.__irqentry_text_start.ret_from_intr.cpuidle_enter.call_cpuidle
49.21 ± 7% +4.8 54.03 ± 5% perf-profile.calltrace.cycles-pp.cpuidle_enter_state.cpuidle_enter.call_cpuidle.cpu_startup_entry.start_secondary
7.35 +5.5 12.86 ± 4% perf-profile.calltrace.cycles-pp.__irqentry_text_start.ret_from_intr.cpuidle_enter.call_cpuidle.cpu_startup_entry
7.36 +5.6 12.94 ± 4% perf-profile.calltrace.cycles-pp.ret_from_intr.cpuidle_enter.call_cpuidle.cpu_startup_entry.start_secondary
68.19 ± 5% +6.4 74.55 ± 2% perf-profile.calltrace.cycles-pp.cpu_startup_entry.start_secondary
68.47 ± 5% +6.6 75.05 ± 2% perf-profile.calltrace.cycles-pp.start_secondary
33.14 ± 17% -32.8 0.34 ± 40% perf-profile.children.cycles-pp.native_queued_spin_lock_slowpath
33.38 ± 17% -32.4 0.97 ± 5% perf-profile.children.cycles-pp._raw_spin_lock_irqsave
22.90 ± 14% -19.3 3.62 ± 9% perf-profile.children.cycles-pp.ip_finish_output2
22.92 ± 14% -19.2 3.71 ± 8% perf-profile.children.cycles-pp.ip_finish_output
23.09 ± 14% -18.7 4.38 ± 7% perf-profile.children.cycles-pp.ip_output
23.21 ± 14% -18.5 4.67 ± 7% perf-profile.children.cycles-pp.ip_local_out
18.52 ± 13% -18.5 0.00 perf-profile.children.cycles-pp.alloc_iova
18.71 ± 13% -18.4 0.30 ± 9% perf-profile.children.cycles-pp.intel_alloc_iova
23.27 ± 14% -18.4 4.89 ± 7% perf-profile.children.cycles-pp.ip_queue_xmit
23.61 ± 13% -17.9 5.72 ± 7% perf-profile.children.cycles-pp.tcp_transmit_skb
18.87 ± 12% -17.8 1.07 ± 7% perf-profile.children.cycles-pp.__intel_map_single
18.90 ± 12% -17.8 1.14 ± 6% perf-profile.children.cycles-pp.intel_map_page
19.02 ± 12% -17.4 1.62 ± 7% perf-profile.children.cycles-pp.ixgbe_xmit_frame_ring
19.06 ± 12% -17.3 1.71 ± 6% perf-profile.children.cycles-pp.ixgbe_xmit_frame
19.07 ± 12% -17.3 1.79 ± 6% perf-profile.children.cycles-pp.dev_hard_start_xmit
19.15 ± 12% -17.1 2.07 ± 6% perf-profile.children.cycles-pp.sch_direct_xmit
19.41 ± 12% -16.3 3.09 ± 8% perf-profile.children.cycles-pp.__dev_queue_xmit
19.42 ± 12% -16.3 3.17 ± 8% perf-profile.children.cycles-pp.dev_queue_xmit
14.20 ± 12% -14.2 0.00 perf-profile.children.cycles-pp.free_iova
14.33 ± 12% -14.2 0.15 ± 34% perf-profile.children.cycles-pp.run_timer_softirq
14.32 ± 12% -14.2 0.14 ± 37% perf-profile.children.cycles-pp.call_timer_fn
26.05 ± 12% -12.0 14.05 ± 6% perf-profile.children.cycles-pp.__do_softirq
10.48 ± 2% -7.9 2.61 ± 11% perf-profile.children.cycles-pp.apic_timer_interrupt
10.43 ± 2% -7.9 2.56 ± 11% perf-profile.children.cycles-pp.smp_apic_timer_interrupt
7.32 ± 13% -7.3 0.00 perf-profile.children.cycles-pp.find_iova
6.75 ± 11% -6.7 0.00 perf-profile.children.cycles-pp.__free_iova
9.62 ± 14% -6.5 3.15 ± 6% perf-profile.children.cycles-pp.tcp_write_xmit
8.92 ± 10% -6.4 2.48 ± 7% perf-profile.children.cycles-pp.tcp_send_ack
9.68 ± 14% -6.4 3.30 ± 7% perf-profile.children.cycles-pp.__tcp_push_pending_frames
21.12 ± 8% -5.7 15.38 ± 5% perf-profile.children.cycles-pp.irq_exit
27.72 ± 11% -5.5 22.19 ± 7% perf-profile.children.cycles-pp.entry_SYSCALL_64_fastpath
4.85 ± 27% -4.4 0.49 ± 16% perf-profile.children.cycles-pp.do_softirq
4.86 ± 27% -4.4 0.50 ± 13% perf-profile.children.cycles-pp.do_softirq_own_stack
4.95 ± 26% -4.1 0.87 ± 14% perf-profile.children.cycles-pp.__local_bh_enable_ip
5.17 ± 15% -3.6 1.53 ± 5% perf-profile.children.cycles-pp.tcp_send_fin
5.56 ± 15% -3.6 1.94 ± 7% perf-profile.children.cycles-pp.tcp_connect
5.39 ± 15% -3.5 1.88 ± 4% perf-profile.children.cycles-pp.tcp_close
5.47 ± 15% -3.5 1.99 ± 4% perf-profile.children.cycles-pp.inet_release
5.51 ± 15% -3.5 2.06 ± 5% perf-profile.children.cycles-pp.sock_release
5.53 ± 15% -3.4 2.08 ± 5% perf-profile.children.cycles-pp.sock_close
6.03 ± 13% -3.4 2.64 ± 8% perf-profile.children.cycles-pp.tcp_prequeue_process
4.23 ± 41% -3.3 0.98 ± 13% perf-profile.children.cycles-pp.intel_unmap
4.24 ± 41% -3.3 0.99 ± 13% perf-profile.children.cycles-pp.intel_unmap_page
4.70 ± 10% -3.2 1.51 ± 10% perf-profile.children.cycles-pp.__tcp_ack_snd_check
5.89 ± 13% -2.8 3.08 ± 6% perf-profile.children.cycles-pp.tcp_v4_connect
10.25 ± 7% -2.6 7.62 ± 6% perf-profile.children.cycles-pp.tcp_v4_do_rcv
6.65 ± 12% -2.6 4.03 ± 10% perf-profile.children.cycles-pp.tcp_recvmsg
6.73 ± 12% -2.6 4.13 ± 10% perf-profile.children.cycles-pp.inet_recvmsg
6.82 ± 12% -2.5 4.31 ± 9% perf-profile.children.cycles-pp.sock_recvmsg
6.08 ± 15% -2.5 3.57 ± 8% perf-profile.children.cycles-pp.__fput
6.20 ± 15% -2.5 3.69 ± 8% perf-profile.children.cycles-pp.____fput
6.29 ± 15% -2.4 3.85 ± 7% perf-profile.children.cycles-pp.exit_to_usermode_loop
6.23 ± 15% -2.4 3.78 ± 8% perf-profile.children.cycles-pp.task_work_run
6.71 ± 11% -2.4 4.27 ± 7% perf-profile.children.cycles-pp.__inet_stream_connect
6.96 ± 12% -2.4 4.53 ± 9% perf-profile.children.cycles-pp.sys_recvfrom
6.80 ± 11% -2.4 4.38 ± 6% perf-profile.children.cycles-pp.inet_stream_connect
6.34 ± 15% -2.4 3.92 ± 8% perf-profile.children.cycles-pp.syscall_return_slowpath
6.92 ± 12% -2.4 4.50 ± 9% perf-profile.children.cycles-pp.SYSC_recvfrom
5.02 ± 14% -2.3 2.69 ± 9% perf-profile.children.cycles-pp.tcp_sendmsg
5.17 ± 15% -2.2 2.92 ± 9% perf-profile.children.cycles-pp.inet_sendmsg
5.23 ± 15% -2.2 3.04 ± 9% perf-profile.children.cycles-pp.sock_sendmsg
5.06 ± 8% -2.2 2.88 ± 9% perf-profile.children.cycles-pp.tcp_rcv_established
6.95 ± 11% -2.2 4.79 ± 8% perf-profile.children.cycles-pp.sys_connect
6.93 ± 11% -2.2 4.77 ± 8% perf-profile.children.cycles-pp.SYSC_connect
5.36 ± 15% -2.1 3.22 ± 9% perf-profile.children.cycles-pp.sys_sendto
5.33 ± 15% -2.1 3.20 ± 9% perf-profile.children.cycles-pp.SYSC_sendto
5.09 ± 6% -0.7 4.41 ± 5% perf-profile.children.cycles-pp.tcp_rcv_state_process
5.80 ± 3% +1.2 7.01 ± 6% perf-profile.children.cycles-pp.tcp_v4_rcv
3.69 ± 90% +1.3 4.98 ± 64% perf-profile.children.cycles-pp.poll_idle
5.85 ± 3% +1.4 7.21 ± 6% perf-profile.children.cycles-pp.ip_local_deliver_finish
5.86 ± 3% +1.4 7.25 ± 6% perf-profile.children.cycles-pp.ip_local_deliver
11.29 ± 15% +1.6 12.93 ± 5% perf-profile.children.cycles-pp.ixgbe_poll
11.38 ± 14% +1.8 13.17 ± 6% perf-profile.children.cycles-pp.net_rx_action
6.08 ± 2% +2.1 8.14 ± 6% perf-profile.children.cycles-pp.ip_rcv_finish
6.12 ± 2% +2.2 8.27 ± 6% perf-profile.children.cycles-pp.ip_rcv
6.19 ± 2% +2.4 8.61 ± 5% perf-profile.children.cycles-pp.__netif_receive_skb_core
6.19 ± 2% +2.5 8.64 ± 5% perf-profile.children.cycles-pp.__netif_receive_skb
6.22 ± 2% +2.5 8.72 ± 5% perf-profile.children.cycles-pp.netif_receive_skb_internal
6.38 +2.9 9.23 ± 5% perf-profile.children.cycles-pp.napi_gro_receive
44.79 +2.9 47.67 perf-profile.children.cycles-pp.intel_idle
6.63 +3.6 10.27 ± 5% perf-profile.children.cycles-pp.ixgbe_clean_rx_irq
12.32 ± 11% +3.7 16.06 ± 5% perf-profile.children.cycles-pp.__irqentry_text_start
12.35 ± 11% +3.9 16.20 ± 5% perf-profile.children.cycles-pp.ret_from_intr
66.57 ± 5% +4.3 70.83 ± 3% perf-profile.children.cycles-pp.cpuidle_enter
66.60 ± 5% +4.3 70.93 ± 2% perf-profile.children.cycles-pp.call_cpuidle
49.42 ± 7% +4.9 54.32 ± 4% perf-profile.children.cycles-pp.cpuidle_enter_state
69.17 ± 5% +6.4 75.57 ± 2% perf-profile.children.cycles-pp.cpu_startup_entry
68.47 ± 5% +6.6 75.05 ± 2% perf-profile.children.cycles-pp.start_secondary
33.14 ± 17% -32.8 0.34 ± 40% perf-profile.self.cycles-pp.native_queued_spin_lock_slowpath
3.69 ± 90% +1.3 4.98 ± 64% perf-profile.self.cycles-pp.poll_idle
44.78 +2.9 47.66 perf-profile.self.cycles-pp.intel_idle
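The %change column can be re-derived from the two value columns. Below is a minimal sketch in Python. It assumes (the report does not state this) that changes printed with a "%" suffix are relative changes and that changes printed without one, such as mpstat.cpu.soft%, are absolute differences between two percentages. The helper names are hypothetical, and because the inputs are the rounded values printed in the table, results may differ from the report in the last digit.

```python
def relative_change_pct(base: float, new: float) -> float:
    """Relative change from base to new, expressed in percent."""
    return (new - base) / base * 100.0


def absolute_delta(base: float, new: float) -> float:
    """Plain difference, assumed here for metrics that are already percentages."""
    return new - base


# Values copied from the comparison table (rounded as printed there).
tps = relative_change_pct(1137, 4577)       # netperf.Throughput_tps
invol = relative_change_pct(84839, 11734)   # netperf.time.involuntary_context_switches
soft = absolute_delta(1.68, 0.89)           # mpstat.cpu.soft%

print(f"netperf.Throughput_tps: {tps:+.1f}%")
print(f"involuntary_context_switches: {invol:+.1f}%")
print(f"mpstat.cpu.soft%: {soft:+.1f} points")
```

Treating the two kinds of rows separately matters when reading the perf-profile section, where every change is a difference in cycle share rather than a ratio.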
[ASCII plots of per-sample values for the following metrics; plot layout not preserved]
netperf.Throughput_tps
perf-stat.instructions
perf-stat.cache-references
perf-stat.dTLB-stores
perf-stat.iTLB-loads
perf-stat.context-switches
perf-stat.cpu-migrations
perf-stat.iTLB-load-miss-rate_
perf-stat.ipc
perf-stat.cpi
netperf.time.user_time
netperf.time.voluntary_context_switches
[*] bisect-good sample
[O] bisect-bad sample
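The perf-stat.ipc and perf-stat.cpi rows in the comparison table can be cross-checked against perf-stat.instructions and perf-stat.cpu-cycles. The sketch below assumes the usual definitions (ipc is instructions divided by cycles, cpi is the reciprocal), which the report does not spell out. The function and dictionary names are illustrative only.

```python
def ipc(instructions: float, cycles: float) -> float:
    """Instructions retired per CPU cycle."""
    return instructions / cycles


def cpi(instructions: float, cycles: float) -> float:
    """CPU cycles per instruction retired (reciprocal of ipc)."""
    return cycles / instructions


# perf-stat.instructions and perf-stat.cpu-cycles as printed in the table.
samples = {
    "9257b4a206": {"instructions": 1.544e12, "cycles": 4.408e12},
    "22e2f9fa63": {"instructions": 2.465e12, "cycles": 3.936e12},
}

for commit, s in samples.items():
    print(
        f"{commit}: ipc={ipc(s['instructions'], s['cycles']):.2f} "
        f"cpi={cpi(s['instructions'], s['cycles']):.2f}"
    )
```

Read this way, the table shows the tested commit retiring about 60% more instructions in about 11% fewer cycles.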
Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.
Thanks,
Xiaolong
View attachment "config-4.6.0-rc4-00008-g22e2f9f" of type "text/plain" (151095 bytes)
View attachment "job-script" of type "text/plain" (7273 bytes)
View attachment "job.yaml" of type "text/plain" (4855 bytes)
View attachment "reproduce" of type "text/plain" (987 bytes)