Message-ID: <CADvbK_eQUmb942vC+bG+NRzM1ki1LiCydEDR1AezZ35Jvsdfnw@mail.gmail.com>
Date: Thu, 23 Jun 2022 18:50:07 -0400
From: Xin Long <lucien.xin@...il.com>
To: Jakub Kicinski <kuba@...nel.org>
Cc: Marcelo Ricardo Leitner <marcelo.leitner@...il.com>,
kernel test robot <oliver.sang@...el.com>,
Eric Dumazet <edumazet@...gle.com>,
Shakeel Butt <shakeelb@...gle.com>,
Soheil Hassas Yeganeh <soheil@...gle.com>,
LKML <linux-kernel@...r.kernel.org>,
Linux Memory Management List <linux-mm@...ck.org>,
network dev <netdev@...r.kernel.org>,
linux-s390@...r.kernel.org, mptcp@...ts.linux.dev,
"linux-sctp @ vger . kernel . org" <linux-sctp@...r.kernel.org>,
lkp@...ts.01.org, kbuild test robot <lkp@...el.com>,
Huang Ying <ying.huang@...el.com>, feng.tang@...el.com,
zhengjun.xing@...ux.intel.com, fengwei.yin@...el.com,
Ying Xu <yinxu@...hat.com>
Subject: Re: [net] 4890b686f4: netperf.Throughput_Mbps -69.4% regression
On Wed, Jun 22, 2022 at 11:08 PM Xin Long <lucien.xin@...il.com> wrote:
>
> Yes, I'm working on it. I couldn't see the regression in my env with
> the 'reproduce' script attached.
> I will try with lkp tomorrow.
>
> Thanks.
>
> On Wed, Jun 22, 2022 at 8:29 PM Jakub Kicinski <kuba@...nel.org> wrote:
> >
> > Could someone working on SCTP double check this is a real regression?
> > Feels like the regression reports are flowing at such a rate it's hard
> > to keep up.
> >
> > >
> > > commit:
> > > 7c80b038d2 ("net: fix sk_wmem_schedule() and sk_rmem_schedule() errors")
> > > 4890b686f4 ("net: keep sk->sk_forward_alloc as small as possible")
> > >
> > > 7c80b038d23e1f4c 4890b686f4088c90432149bd6de
> > > ---------------- ---------------------------
> > > %stddev %change %stddev
> > > \ | \
> > > 15855 -69.4% 4854 netperf.Throughput_Mbps
> > > 570788 -69.4% 174773 netperf.Throughput_total_Mbps
...
> > > 0.00 +5.1 5.10 ± 5% perf-profile.calltrace.cycles-pp.__sk_mem_reduce_allocated.sctp_wfree.skb_release_head_state.consume_skb.sctp_chunk_put
> > > 0.17 ±141% +5.3 5.42 ± 6% perf-profile.calltrace.cycles-pp.skb_release_head_state.consume_skb.sctp_chunk_put.sctp_outq_sack.sctp_cmd_interpreter
> > > 0.00 +5.3 5.35 ± 6% perf-profile.calltrace.cycles-pp.sctp_wfree.skb_release_head_state.consume_skb.sctp_chunk_put.sctp_outq_sack
> > > 0.00 +5.5 5.51 ± 6% perf-profile.calltrace.cycles-pp.__sk_mem_reduce_allocated.skb_release_head_state.kfree_skb_reason.sctp_recvmsg.inet_recvmsg
> > > 0.00 +5.7 5.65 ± 6% perf-profile.calltrace.cycles-pp.skb_release_head_state.kfree_skb_reason.sctp_recvmsg.inet_recvmsg.____sys_recvmsg
...
> > > 0.00 +4.0 4.04 ± 6% perf-profile.children.cycles-pp.mem_cgroup_charge_skmem
> > > 2.92 ± 6% +4.2 7.16 ± 6% perf-profile.children.cycles-pp.sctp_outq_sack
> > > 0.00 +4.3 4.29 ± 6% perf-profile.children.cycles-pp.__sk_mem_raise_allocated
> > > 0.00 +4.3 4.32 ± 6% perf-profile.children.cycles-pp.__sk_mem_schedule
> > > 1.99 ± 6% +4.4 6.40 ± 6% perf-profile.children.cycles-pp.consume_skb
> > > 1.78 ± 6% +4.6 6.42 ± 6% perf-profile.children.cycles-pp.kfree_skb_reason
> > > 0.37 ± 8% +5.0 5.40 ± 6% perf-profile.children.cycles-pp.sctp_wfree
> > > 0.87 ± 9% +10.3 11.20 ± 6% perf-profile.children.cycles-pp.skb_release_head_state
> > > 0.00 +10.7 10.66 ± 6% perf-profile.children.cycles-pp.__sk_mem_reduce_allocated
...
> > > 0.00 +1.2 1.19 ± 7% perf-profile.self.cycles-pp.try_charge_memcg
> > > 0.00 +2.0 1.96 ± 6% perf-profile.self.cycles-pp.page_counter_uncharge
> > > 0.00 +2.1 2.07 ± 5% perf-profile.self.cycles-pp.page_counter_try_charge
> > > 1.09 ± 8% +2.8 3.92 ± 6% perf-profile.self.cycles-pp.native_queued_spin_lock_slowpath
> > > 0.29 ± 6% +3.5 3.81 ± 6% perf-profile.self.cycles-pp.sctp_eat_data
> > > 0.00 +7.8 7.76 ± 6% perf-profile.self.cycles-pp.__sk_mem_reduce_allocated
From the perf data, we can see that __sk_mem_reduce_allocated() shows the
largest increase in CPU usage, and that the mem_cgroup APIs are also
called from this function. That means mem cgroup must be enabled in
the test env, which may explain why I couldn't reproduce it.

Commit 4890b686f4 ("net: keep sk->sk_forward_alloc as small as
possible") uses sk_mem_reclaim() (which checks reclaimable >= PAGE_SIZE)
to reclaim the memory, so it calls __sk_mem_reduce_allocated() *more
frequently* than before (when the check was reclaimable >=
SK_RECLAIM_THRESHOLD). That might be cheap when
mem_cgroup_sockets_enabled is false, but I'm not sure it's still
cheap when mem_cgroup_sockets_enabled is true.

I think SCTP netperf could trigger this because the CPU is the
bottleneck in SCTP netperf testing, which makes it more sensitive to
extra function calls than TCP.

Can we re-run this test without mem cgroup enabled?

Thanks.