lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite for Android: free password hash cracker in your pocket
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Wed, 7 Mar 2018 13:51:02 +0000
From:   Yuval Mintz <yuvalm@...lanox.com>
To:     Kirill Tkhai <ktkhai@...tuozzo.com>,
        Eric Dumazet <eric.dumazet@...il.com>
CC:     "davem@...emloft.net" <davem@...emloft.net>,
        "yoshfuji@...ux-ipv6.org" <yoshfuji@...ux-ipv6.org>,
        "netdev@...r.kernel.org" <netdev@...r.kernel.org>
Subject: RE: [PATCH RESEND net-next] net: Do synchronize_rcu() in
 ip6mr_sk_done() only if this is needed

> >>> After unshare test kworker hangs for ages:
> >>>
> >>>     $ while :; do unshare -n true; done &
> >>>
> >>>     $ perf report <kworker>
> >>>     -   88,82%     0,00%  kworker/u16:0  [kernel.vmlinux]  [k]
> >>> cleanup_net
> >>>          cleanup_net
> >>>        - ops_exit_list.isra.9
> >>>           - 85,68% igmp6_net_exit
> >>>              - 53,31% sock_release
> >>>                 - inet_release
> >>>                    - 25,33% rawv6_close
> >>>                       - ip6mr_sk_done
> >>>                          + 23,38% synchronize_rcu
> >>>
> >>> Keep in mind, this perf report shows the time a function was
> >>> executing, and
> >>> it does not show the time, it was sleeping. But it's easy to imagine,
> >>> how
> >>> much it is sleeping, if synchronize_rcu() execution takes the most
> >>> time.
> >>> Top shows the kworker R time is less then 1%.
> >>>
> >>> This happen, because of in ip6mr_sk_done() we do too many
> >>> synchronize_rcu(),
> >>> even for the sockets, that are not referenced in mr_table, and which
> >>> are not
> >>> need it. So, the progress of kworker becomes very slow.
> >>>
> >>> The patch introduces apparent solution, and it makes ip6mr_sk_done()
> >>> to skip
> >>> synchronize_rcu() for sockets, that are not need that. After the
> >>> patch,
> >>> kworker becomes able to warm the cpu up as expected.
> >>>
> >>> Signed-off-by: Kirill Tkhai <ktkhai@...tuozzo.com>
> >>> ---
> >>>  net/ipv6/ip6mr.c |    4 +++-
> >>>  1 file changed, 3 insertions(+), 1 deletion(-)
> >>>
> >>> diff --git a/net/ipv6/ip6mr.c b/net/ipv6/ip6mr.c
> >>> index 2a38f9de45d3..290a8d0d5eac 100644
> >>> --- a/net/ipv6/ip6mr.c
> >>> +++ b/net/ipv6/ip6mr.c
> >>> @@ -1485,7 +1485,9 @@ int ip6mr_sk_done(struct sock *sk)
> >>>  		}
> >>>  	}
> >>>  	rtnl_unlock();
> >>> -	synchronize_rcu();
> >>> +
> >>> +	if (!err)
> >>> +		synchronize_rcu();
> >>>
> >>
> >>
> >> But... what is this synchronize_rcu() doing exactly ?
> >>
> >> This was added in 8571ab479a6e1ef46ead5ebee567e128a422767c
> >>
> >> ("ip6mr: Make mroute_sk rcu-based")
> >>
> >> Typically on a delete, the synchronize_rcu() would be needed before
> >> freeing the deleted object.
> >>
> >> But nowadays we have better way : SOCK_RCU_FREE
> >
> > Hm. I'm agree with you. This is hot path, and mroute sockets created from
> userspace
> > will delay userspace tasks close() and exit(). Since there may be many such
> sockets,
> > we may get a zombie task, which can't be reaped for ages. This slows down
> the system
> > directly.
> >
> > Fix for pernet_operations works, but we need generic solution instead.
> >
> > The commit "8571ab479a6e1ef46ead5ebee567e128a422767c" says:
> >
> >     ip6mr: Make mroute_sk rcu-based
> >
> >     In ipmr the mr_table socket is handled under RCU. Introduce the same
> >     for ip6mr.
> >
> > There is no pointing to improvements it invents, or to the problem it solves.
> The description
> > looks like a cleanup. It's too expensive cleanup, if it worsens the
> performance a hundred
> > times.
> >
> > Can't we simply revert it?!
> >
> > Yuval, do you have ideas to fix that (maybe, via SOCK_RCU_FREE suggested
> by Eric)?

Sorry, failed to notice ip6mr_sk_done() is called unconditionally from
rawv6_close(). But even with your suggested fix it should be ~resolved
[as only sockets used for ip6mr would reach the sync].
Or are you claiming there's still some performance hit even with your
suggested change?

> >
> > We actually use rcu_dereference() in ip6mr_cache_report() only. The only
> user of dereference
> > is sock_queue_rcv_skb(). Superficial analysis shows we purge the queue in
> inet_sock_destruct().
> 
> + So this change should be safe.

I might have misunderstood the comment from
commit 4c9687098f24 ("ipmr: RCU conversion of mroute_sk") when writing
this; Thought comment regarding ip_ra_destroy() meant that for the IPv6 case
we DO have to make sure there's a grace-period before destroying the socket.

> 
> > Thanks,
> > Kirill
> >

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ