[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20160819052941.GA1179@aaronlu.sh.intel.com>
Date: Fri, 19 Aug 2016 13:29:41 +0800
From: Aaron Lu <aaron.lu@...el.com>
To: Xin Long <lucien.xin@...il.com>
Cc: Marcelo Ricardo Leitner <marcelo.leitner@...il.com>,
kernel test robot <xiaolong.ye@...el.com>,
Stephen Rothwell <sfr@...b.auug.org.au>, lkp@...org,
"David S. Miller" <davem@...emloft.net>,
LKML <linux-kernel@...r.kernel.org>,
"Chen, Tim C" <tim.c.chen@...el.com>,
Huang Ying <ying.huang@...el.com>
Subject: Re: [LKP] [lkp] [sctp] a6c2f79287: netperf.Throughput_Mbps -37.2%
regression
On Thu, Aug 18, 2016 at 08:45:42PM +0800, Xin Long wrote:
> >> Hi, Aaron
> >>
> >> 1)
> >> I talked with Marcelo about this one.
> >> He said it might be related with cacheline. the new field distroyed
> >> the prior cacheline. So on top of commit 826d253d57b1, pls only add
> >> + unsigned long prsctp_param;
> >>
> >> to the end of struct sctp_chunk, then try.
> >
> > This doesn't work.
> >
>
> If it's because of cache lines changed, I'm not sure this, either.
> Maybe 2) is a good way to fix it.
A comparison of the good commit 826d253d57b1 and the bad a6c2f792873a:
tests: 8
testcase/path_params/tbox_group/run: netperf/ipv4-300s-200%-cs-localhost-10K-SCTP_STREAM_MANY-performance/lkp-ivb-d02
826d253d57b11f69 a6c2f792873aff332a4689717c
---------------- --------------------------
%stddev change %stddev
\ | \
3923 -37% 2461 netperf.Throughput_Mbps
9 -78% 2 vmstat.procs.r
112616 19% 133981 vmstat.system.cs
4053 7% 4350 vmstat.system.in
8598 ± 4% 957% 90912 softirqs.SCHED
16466114 -37% 10305467 softirqs.NET_RX
605899 -46% 329262 softirqs.TIMER
72067 ± 10% -63% 26356 ± 3% softirqs.RCU
4785 ± 7% -9% 4352 slabinfo.anon_vma_chain.num_objs
642 ± 7% 14% 731 ± 6% slabinfo.kmalloc-512.active_objs
4993 15% 5735 slabinfo.kmalloc-64.active_objs
4993 15% 5735 slabinfo.kmalloc-64.num_objs
2529 ± 4% -15% 2150 proc-vmstat.nr_alloc_batch
4.733e+08 -37% 2.999e+08 proc-vmstat.pgalloc_normal
8.476e+08 -37% 5.36e+08 proc-vmstat.pgfree
3.742e+08 -37% 2.361e+08 proc-vmstat.pgalloc_dma32
1.48e+08 -37% 93033641 proc-vmstat.numa_hit
1.48e+08 -37% 93033640 proc-vmstat.numa_local
0.05 ± 17% 52102% 24.80 turbostat.CPU%c1
0.64 3065% 20.10 ± 3% turbostat.CPU%c6
0.12 ± 39% 1900% 2.35 ± 3% turbostat.Pkg%pc2
0.46 ± 10% 1686% 8.22 ± 6% turbostat.Pkg%pc6
37.54 -14% 32.11 turbostat.PkgWatt
20.20 -25% 15.22 turbostat.CorWatt
99.31 -45% 54.97 turbostat.%Busy
3269 -45% 1803 turbostat.Avg_MHz
76510 ± 46% 3e+05% 1.954e+08 cpuidle.C1-IVB.time
19769 ± 17% 5534% 1113742 ± 5% cpuidle.C1E-IVB.time
151 ± 11% 4175% 6454 ± 7% cpuidle.C1E-IVB.usage
114 ± 14% 6216% 7232 ± 5% cpuidle.C3-IVB.usage
33074 ± 14% 5159% 1739419 ± 3% cpuidle.C3-IVB.time
8874 4203% 381901 cpuidle.C6-IVB.usage
8006184 4072% 3.34e+08 cpuidle.C6-IVB.time
12019 ± 35% 303% 48398 perf-stat.cpu-migrations
34232822 19% 40780053 perf-stat.context-switches
339045 5% 354573 perf-stat.minor-faults
339041 5% 354568 perf-stat.page-faults
2.776e+11 -28% 2.003e+11 perf-stat.branch-instructions
1.505e+12 -29% 1.065e+12 perf-stat.instructions
6.421e+11 -30% 4.473e+11 perf-stat.dTLB-loads
5.32e+11 -34% 3.536e+11 perf-stat.dTLB-stores
1.173e+11 -38% 7.271e+10 perf-stat.cache-references
3.735e+08 ± 5% -48% 1.959e+08 ± 4% perf-stat.iTLB-load-misses
3.864e+09 -51% 1.9e+09 perf-stat.branch-misses
4.069e+09 ± 20% -56% 1.798e+09 ± 40% perf-stat.dTLB-load-misses
5.285e+08 ± 22% -70% 1.585e+08 ± 16% perf-stat.dTLB-store-misses
7.126e+09 ± 16% -97% 2.27e+08 ± 4% perf-stat.cache-misses
The obvious change is:
1 the bad commit has a much fewer runnable process - vmstat.procs.r
2 the context switches are much higher in the bad commit - vmstat.system.cs
It all suggests the netperf processes go to sleep for some reason in the bad
commit.
I used "perf record -p one_netperf_pid -e probe:pick_next_task_idle" as
suggested by Tim to see where it went to sleep:
Samples: 78 of event 'probe:pick_next_task_idle', Event count(approx.): 78
Children Self Trace output
■- 100.00% 100.00% (ffffffff810fc750)
▒ __sendmsg_nocancel
▒ entry_SYSCALL_64_fastpath
▒ sys_sendmsg
▒ __sys_sendmsg
▒ ___sys_sendmsg
▒ inet_sendmsg
▒ sctp_sendmsg
▒ sctp_wait_for_sndbuf
▒ schedule_timeout
▒ schedule
▒ pick_next_task_idle
It doesn't look insane and sctp_wait_for_sndbuf may actually have
something to do with a larger sctp_chunk I suppose?
The same perf record doesn't capture any sample for the good commit,
which suggests the nerperf process doesn't sleep in sctp_wait_for_sndbuf.
Regards,
Aaron
Powered by blists - more mailing lists