lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Wed, 7 Mar 2018 12:22:08 +0300
From:   Kirill Tkhai <ktkhai@...tuozzo.com>
To:     Eric Dumazet <eric.dumazet@...il.com>, davem@...emloft.net,
        yoshfuji@...ux-ipv6.org, netdev@...r.kernel.org,
        Yuval Mintz <yuvalm@...lanox.com>
Subject: Re: [PATCH RESEND net-next] net: Do synchronize_rcu() in
 ip6mr_sk_done() only if this is needed

On 06.03.2018 19:50, Eric Dumazet wrote:
> On Tue, 2018-03-06 at 19:24 +0300, Kirill Tkhai wrote:
>> After unshare test kworker hangs for ages:
>>
>>     $ while :; do unshare -n true; done &
>>
>>     $ perf report <kworker>
>>     -   88,82%     0,00%  kworker/u16:0  [kernel.vmlinux]  [k]
>> cleanup_net
>>          cleanup_net
>>        - ops_exit_list.isra.9
>>           - 85,68% igmp6_net_exit
>>              - 53,31% sock_release
>>                 - inet_release
>>                    - 25,33% rawv6_close
>>                       - ip6mr_sk_done
>>                          + 23,38% synchronize_rcu
>>
>> Keep in mind, this perf report shows the time a function was
>> executing, and
>> it does not show the time, it was sleeping. But it's easy to imagine,
>> how
>> much it is sleeping, if synchronize_rcu() execution takes the most
>> time.
>> Top shows the kworker R time is less then 1%.
>>
>> This happen, because of in ip6mr_sk_done() we do too many
>> synchronize_rcu(),
>> even for the sockets, that are not referenced in mr_table, and which
>> are not
>> need it. So, the progress of kworker becomes very slow.
>>
>> The patch introduces apparent solution, and it makes ip6mr_sk_done()
>> to skip
>> synchronize_rcu() for sockets, that are not need that. After the
>> patch,
>> kworker becomes able to warm the cpu up as expected.
>>
>> Signed-off-by: Kirill Tkhai <ktkhai@...tuozzo.com>
>> ---
>>  net/ipv6/ip6mr.c |    4 +++-
>>  1 file changed, 3 insertions(+), 1 deletion(-)
>>
>> diff --git a/net/ipv6/ip6mr.c b/net/ipv6/ip6mr.c
>> index 2a38f9de45d3..290a8d0d5eac 100644
>> --- a/net/ipv6/ip6mr.c
>> +++ b/net/ipv6/ip6mr.c
>> @@ -1485,7 +1485,9 @@ int ip6mr_sk_done(struct sock *sk)
>>  		}
>>  	}
>>  	rtnl_unlock();
>> -	synchronize_rcu();
>> +
>> +	if (!err)
>> +		synchronize_rcu();
>>  
> 
> 
> But... what is this synchronize_rcu() doing exactly ?
> 
> This was added in 8571ab479a6e1ef46ead5ebee567e128a422767c
> 
> ("ip6mr: Make mroute_sk rcu-based")
> 
> Typically on a delete, the synchronize_rcu() would be needed before
> freeing the deleted object.
> 
> But nowadays we have better way : SOCK_RCU_FREE

Hm. I'm agree with you. This is hot path, and mroute sockets created from userspace
will delay userspace tasks close() and exit(). Since there may be many such sockets,
we may get a zombie task, which can't be reaped for ages. This slows down the system
directly.

Fix for pernet_operations works, but we need generic solution instead.

The commit "8571ab479a6e1ef46ead5ebee567e128a422767c" says:

    ip6mr: Make mroute_sk rcu-based
    
    In ipmr the mr_table socket is handled under RCU. Introduce the same
    for ip6mr.

There is no pointing to improvements it invents, or to the problem it solves. The description
looks like a cleanup. It's too expensive cleanup, if it worsens the performance a hundred
times.

Can't we simply revert it?!

Yuval, do you have ideas to fix that (maybe, via SOCK_RCU_FREE suggested by Eric)?

We actually use rcu_dereference() in ip6mr_cache_report() only. The only user of dereference
is sock_queue_rcv_skb(). Superficial analysis shows we purge the queue in inet_sock_destruct().

Thanks,
Kirill

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ