[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <b44fe837-3003-2547-afd4-52f3213a6642@virtuozzo.com>
Date: Wed, 7 Mar 2018 12:32:49 +0300
From: Kirill Tkhai <ktkhai@...tuozzo.com>
To: Eric Dumazet <eric.dumazet@...il.com>, davem@...emloft.net,
yoshfuji@...ux-ipv6.org, netdev@...r.kernel.org,
Yuval Mintz <yuvalm@...lanox.com>
Subject: Re: [PATCH RESEND net-next] net: Do synchronize_rcu() in
ip6mr_sk_done() only if this is needed
On 07.03.2018 12:22, Kirill Tkhai wrote:
> On 06.03.2018 19:50, Eric Dumazet wrote:
>> On Tue, 2018-03-06 at 19:24 +0300, Kirill Tkhai wrote:
>>> After unshare test kworker hangs for ages:
>>>
>>> $ while :; do unshare -n true; done &
>>>
>>> $ perf report <kworker>
>>> - 88,82% 0,00% kworker/u16:0 [kernel.vmlinux] [k]
>>> cleanup_net
>>> cleanup_net
>>> - ops_exit_list.isra.9
>>> - 85,68% igmp6_net_exit
>>> - 53,31% sock_release
>>> - inet_release
>>> - 25,33% rawv6_close
>>> - ip6mr_sk_done
>>> + 23,38% synchronize_rcu
>>>
>>> Keep in mind, this perf report shows the time a function was
>>> executing, and
>>> it does not show the time, it was sleeping. But it's easy to imagine,
>>> how
>>> much it is sleeping, if synchronize_rcu() execution takes the most
>>> time.
>>> Top shows the kworker R time is less then 1%.
>>>
>>> This happen, because of in ip6mr_sk_done() we do too many
>>> synchronize_rcu(),
>>> even for the sockets, that are not referenced in mr_table, and which
>>> are not
>>> need it. So, the progress of kworker becomes very slow.
>>>
>>> The patch introduces apparent solution, and it makes ip6mr_sk_done()
>>> to skip
>>> synchronize_rcu() for sockets, that are not need that. After the
>>> patch,
>>> kworker becomes able to warm the cpu up as expected.
>>>
>>> Signed-off-by: Kirill Tkhai <ktkhai@...tuozzo.com>
>>> ---
>>> net/ipv6/ip6mr.c | 4 +++-
>>> 1 file changed, 3 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/net/ipv6/ip6mr.c b/net/ipv6/ip6mr.c
>>> index 2a38f9de45d3..290a8d0d5eac 100644
>>> --- a/net/ipv6/ip6mr.c
>>> +++ b/net/ipv6/ip6mr.c
>>> @@ -1485,7 +1485,9 @@ int ip6mr_sk_done(struct sock *sk)
>>> }
>>> }
>>> rtnl_unlock();
>>> - synchronize_rcu();
>>> +
>>> + if (!err)
>>> + synchronize_rcu();
>>>
>>
>>
>> But... what is this synchronize_rcu() doing exactly ?
>>
>> This was added in 8571ab479a6e1ef46ead5ebee567e128a422767c
>>
>> ("ip6mr: Make mroute_sk rcu-based")
>>
>> Typically on a delete, the synchronize_rcu() would be needed before
>> freeing the deleted object.
>>
>> But nowadays we have better way : SOCK_RCU_FREE
>
> Hm. I'm agree with you. This is hot path, and mroute sockets created from userspace
> will delay userspace tasks close() and exit(). Since there may be many such sockets,
> we may get a zombie task, which can't be reaped for ages. This slows down the system
> directly.
>
> Fix for pernet_operations works, but we need generic solution instead.
>
> The commit "8571ab479a6e1ef46ead5ebee567e128a422767c" says:
>
> ip6mr: Make mroute_sk rcu-based
>
> In ipmr the mr_table socket is handled under RCU. Introduce the same
> for ip6mr.
>
> There is no pointing to improvements it invents, or to the problem it solves. The description
> looks like a cleanup. It's too expensive cleanup, if it worsens the performance a hundred
> times.
>
> Can't we simply revert it?!
>
> Yuval, do you have ideas to fix that (maybe, via SOCK_RCU_FREE suggested by Eric)?
>
> We actually use rcu_dereference() in ip6mr_cache_report() only. The only user of dereference
> is sock_queue_rcv_skb(). Superficial analysis shows we purge the queue in inet_sock_destruct().
+ So this change should be safe.
> Thanks,
> Kirill
>
Powered by blists - more mailing lists