[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <VI1PR0501MB2784F97B00E24CD598B2E96FBFD80@VI1PR0501MB2784.eurprd05.prod.outlook.com>
Date: Wed, 7 Mar 2018 13:51:02 +0000
From: Yuval Mintz <yuvalm@...lanox.com>
To: Kirill Tkhai <ktkhai@...tuozzo.com>,
Eric Dumazet <eric.dumazet@...il.com>
CC: "davem@...emloft.net" <davem@...emloft.net>,
"yoshfuji@...ux-ipv6.org" <yoshfuji@...ux-ipv6.org>,
"netdev@...r.kernel.org" <netdev@...r.kernel.org>
Subject: RE: [PATCH RESEND net-next] net: Do synchronize_rcu() in
ip6mr_sk_done() only if this is needed
> >>> After unshare test kworker hangs for ages:
> >>>
> >>> $ while :; do unshare -n true; done &
> >>>
> >>> $ perf report <kworker>
> >>> - 88,82% 0,00% kworker/u16:0 [kernel.vmlinux] [k]
> >>> cleanup_net
> >>> cleanup_net
> >>> - ops_exit_list.isra.9
> >>> - 85,68% igmp6_net_exit
> >>> - 53,31% sock_release
> >>> - inet_release
> >>> - 25,33% rawv6_close
> >>> - ip6mr_sk_done
> >>> + 23,38% synchronize_rcu
> >>>
> >>> Keep in mind, this perf report shows the time a function was
> >>> executing, and
> >>> it does not show the time, it was sleeping. But it's easy to imagine,
> >>> how
> >>> much it is sleeping, if synchronize_rcu() execution takes the most
> >>> time.
> >>> Top shows the kworker R time is less then 1%.
> >>>
> >>> This happen, because of in ip6mr_sk_done() we do too many
> >>> synchronize_rcu(),
> >>> even for the sockets, that are not referenced in mr_table, and which
> >>> are not
> >>> need it. So, the progress of kworker becomes very slow.
> >>>
> >>> The patch introduces apparent solution, and it makes ip6mr_sk_done()
> >>> to skip
> >>> synchronize_rcu() for sockets, that are not need that. After the
> >>> patch,
> >>> kworker becomes able to warm the cpu up as expected.
> >>>
> >>> Signed-off-by: Kirill Tkhai <ktkhai@...tuozzo.com>
> >>> ---
> >>> net/ipv6/ip6mr.c | 4 +++-
> >>> 1 file changed, 3 insertions(+), 1 deletion(-)
> >>>
> >>> diff --git a/net/ipv6/ip6mr.c b/net/ipv6/ip6mr.c
> >>> index 2a38f9de45d3..290a8d0d5eac 100644
> >>> --- a/net/ipv6/ip6mr.c
> >>> +++ b/net/ipv6/ip6mr.c
> >>> @@ -1485,7 +1485,9 @@ int ip6mr_sk_done(struct sock *sk)
> >>> }
> >>> }
> >>> rtnl_unlock();
> >>> - synchronize_rcu();
> >>> +
> >>> + if (!err)
> >>> + synchronize_rcu();
> >>>
> >>
> >>
> >> But... what is this synchronize_rcu() doing exactly ?
> >>
> >> This was added in 8571ab479a6e1ef46ead5ebee567e128a422767c
> >>
> >> ("ip6mr: Make mroute_sk rcu-based")
> >>
> >> Typically on a delete, the synchronize_rcu() would be needed before
> >> freeing the deleted object.
> >>
> >> But nowadays we have better way : SOCK_RCU_FREE
> >
> > Hm. I'm agree with you. This is hot path, and mroute sockets created from
> userspace
> > will delay userspace tasks close() and exit(). Since there may be many such
> sockets,
> > we may get a zombie task, which can't be reaped for ages. This slows down
> the system
> > directly.
> >
> > Fix for pernet_operations works, but we need generic solution instead.
> >
> > The commit "8571ab479a6e1ef46ead5ebee567e128a422767c" says:
> >
> > ip6mr: Make mroute_sk rcu-based
> >
> > In ipmr the mr_table socket is handled under RCU. Introduce the same
> > for ip6mr.
> >
> > There is no pointing to improvements it invents, or to the problem it solves.
> The description
> > looks like a cleanup. It's too expensive cleanup, if it worsens the
> performance a hundred
> > times.
> >
> > Can't we simply revert it?!
> >
> > Yuval, do you have ideas to fix that (maybe, via SOCK_RCU_FREE suggested
> by Eric)?
Sorry, failed to notice ip6mr_sk_done() is called unconditionally from
rawv6_close(). But even with your suggested fix it should be ~resolved
[as only sockets used for ip6mr would reach the sync].
Or are you claiming there's still some performance hit even with your
suggested change?
> >
> > We actually use rcu_dereference() in ip6mr_cache_report() only. The only
> user of dereference
> > is sock_queue_rcv_skb(). Superficial analysis shows we purge the queue in
> inet_sock_destruct().
>
> + So this change should be safe.
I might have misunderstood the comment from
commit 4c9687098f24 ("ipmr: RCU conversion of mroute_sk") when writing
this; Thought comment regarding ip_ra_destroy() meant that for the IPv6 case
we DO have to make sure there's a grace-period before destroying the socket.
>
> > Thanks,
> > Kirill
> >
Powered by blists - more mailing lists