[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <71411e21-0efa-0e41-836d-0301b1c1dbcb@virtuozzo.com>
Date: Wed, 7 Mar 2018 12:22:08 +0300
From: Kirill Tkhai <ktkhai@...tuozzo.com>
To: Eric Dumazet <eric.dumazet@...il.com>, davem@...emloft.net,
yoshfuji@...ux-ipv6.org, netdev@...r.kernel.org,
Yuval Mintz <yuvalm@...lanox.com>
Subject: Re: [PATCH RESEND net-next] net: Do synchronize_rcu() in
ip6mr_sk_done() only if this is needed
On 06.03.2018 19:50, Eric Dumazet wrote:
> On Tue, 2018-03-06 at 19:24 +0300, Kirill Tkhai wrote:
>> After unshare test kworker hangs for ages:
>>
>> $ while :; do unshare -n true; done &
>>
>> $ perf report <kworker>
>> - 88,82% 0,00% kworker/u16:0 [kernel.vmlinux] [k]
>> cleanup_net
>> cleanup_net
>> - ops_exit_list.isra.9
>> - 85,68% igmp6_net_exit
>> - 53,31% sock_release
>> - inet_release
>> - 25,33% rawv6_close
>> - ip6mr_sk_done
>> + 23,38% synchronize_rcu
>>
>> Keep in mind, this perf report shows the time a function was
>> executing, and
>> it does not show the time, it was sleeping. But it's easy to imagine,
>> how
>> much it is sleeping, if synchronize_rcu() execution takes the most
>> time.
>> Top shows the kworker R time is less then 1%.
>>
>> This happen, because of in ip6mr_sk_done() we do too many
>> synchronize_rcu(),
>> even for the sockets, that are not referenced in mr_table, and which
>> are not
>> need it. So, the progress of kworker becomes very slow.
>>
>> The patch introduces apparent solution, and it makes ip6mr_sk_done()
>> to skip
>> synchronize_rcu() for sockets, that are not need that. After the
>> patch,
>> kworker becomes able to warm the cpu up as expected.
>>
>> Signed-off-by: Kirill Tkhai <ktkhai@...tuozzo.com>
>> ---
>> net/ipv6/ip6mr.c | 4 +++-
>> 1 file changed, 3 insertions(+), 1 deletion(-)
>>
>> diff --git a/net/ipv6/ip6mr.c b/net/ipv6/ip6mr.c
>> index 2a38f9de45d3..290a8d0d5eac 100644
>> --- a/net/ipv6/ip6mr.c
>> +++ b/net/ipv6/ip6mr.c
>> @@ -1485,7 +1485,9 @@ int ip6mr_sk_done(struct sock *sk)
>> }
>> }
>> rtnl_unlock();
>> - synchronize_rcu();
>> +
>> + if (!err)
>> + synchronize_rcu();
>>
>
>
> But... what is this synchronize_rcu() doing exactly ?
>
> This was added in 8571ab479a6e1ef46ead5ebee567e128a422767c
>
> ("ip6mr: Make mroute_sk rcu-based")
>
> Typically on a delete, the synchronize_rcu() would be needed before
> freeing the deleted object.
>
> But nowadays we have better way : SOCK_RCU_FREE
Hm. I'm agree with you. This is hot path, and mroute sockets created from userspace
will delay userspace tasks close() and exit(). Since there may be many such sockets,
we may get a zombie task, which can't be reaped for ages. This slows down the system
directly.
Fix for pernet_operations works, but we need generic solution instead.
The commit "8571ab479a6e1ef46ead5ebee567e128a422767c" says:
ip6mr: Make mroute_sk rcu-based
In ipmr the mr_table socket is handled under RCU. Introduce the same
for ip6mr.
There is no pointing to improvements it invents, or to the problem it solves. The description
looks like a cleanup. It's too expensive cleanup, if it worsens the performance a hundred
times.
Can't we simply revert it?!
Yuval, do you have ideas to fix that (maybe, via SOCK_RCU_FREE suggested by Eric)?
We actually use rcu_dereference() in ip6mr_cache_report() only. The only user of dereference
is sock_queue_rcv_skb(). Superficial analysis shows we purge the queue in inet_sock_destruct().
Thanks,
Kirill
Powered by blists - more mailing lists