Message-ID: <3230b95a-1ce0-b569-3d00-f7063ae9f1d9@gmail.com>
Date:   Tue, 23 Jun 2020 01:30:29 +0200
From:   Oliver Herms <oliver.peter.herms@...il.com>
To:     Eric Dumazet <eric.dumazet@...il.com>, netdev@...r.kernel.org
Cc:     davem@...emloft.net, kuznet@....inr.ac.ru, yoshfuji@...ux-ipv6.org,
        kuba@...nel.org
Subject: Re: [PATCH] IPv6: Fix CPU contention on FIB6 GC

On 23.06.20 00:55, Eric Dumazet wrote:
> 
> 
> On 6/22/20 1:53 PM, Oliver Herms wrote:
>> When fib6_run_gc is called with parameter force=true, the spinlock in
>> /net/ipv6/ip6_fib.c:2310 can lock all CPUs in softirq when
>> net.ipv6.route.max_size is exceeded (seen this multiple times).
>> One softirq/CPU gets the lock. All others spin to get it. It takes
>> substantial time until all are done. Effectively it's a DoS vector.
>>
>> As the spinlock is only enforcing that there is at most one GC running
>> at a time, it should IMHO be safe to use force=false here, resulting
>> in spin_trylock_bh instead of spin_lock_bh, thus avoiding the lock
>> contention.
>>
>> Finding a locked spinlock means some GC is going on already, so it is
>> safe to just skip another execution of the GC.
>>
>> Signed-off-by: Oliver Herms <oliver.peter.herms@...il.com>
>> ---
>>  net/ipv6/route.c | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/net/ipv6/route.c b/net/ipv6/route.c
>> index 82cbb46a2a4f..7e6fbaf43549 100644
>> --- a/net/ipv6/route.c
>> +++ b/net/ipv6/route.c
>> @@ -3205,7 +3205,7 @@ static int ip6_dst_gc(struct dst_ops *ops)
>>  		goto out;
>>  
>>  	net->ipv6.ip6_rt_gc_expire++;
>> -	fib6_run_gc(net->ipv6.ip6_rt_gc_expire, net, true);
>> +	fib6_run_gc(net->ipv6.ip6_rt_gc_expire, net, false);
>>  	entries = dst_entries_get_slow(ops);
>>  	if (entries < ops->gc_thresh)
>>  		net->ipv6.ip6_rt_gc_expire = rt_gc_timeout>>1;
> 
> 
Hi Eric,

> On which kernel have you seen a contention ?
I've freshly checked out from here:
staging-testing@git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging.git 

I reproduced the issue I had encountered with 4.15 (from Ubuntu) in prod,
applied the patch, and checked that it solves my problem.

I'm running into the issue because of cache entries that are created by
tnl_update_pmtu. However, I'm going to address that in a separate thread
and patch.

As cache entries can be created in many ways, this should be fixed at the GC
level.
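
To illustrate the difference between the two modes, here is a rough
userspace sketch. It is not the kernel code: a C11 atomic_flag stands in
for the GC spinlock, and all names (gc_lock, gc_trylock, run_gc, gc_runs)
are made up for the example. With force=true a caller spins until it gets
the lock; with force=false it gives up at once if another GC holds it.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the GC spinlock. */
static atomic_flag gc_lock = ATOMIC_FLAG_INIT;
/* Counts how often the (placeholder) GC body actually ran. */
static int gc_runs;

/* Analogous to spin_trylock_bh(): returns true only if we took the lock. */
static bool gc_trylock(void)
{
	return !atomic_flag_test_and_set_explicit(&gc_lock,
						  memory_order_acquire);
}

/* Analogous to spin_lock_bh(): busy-wait until the lock is free. */
static void gc_lock_spin(void)
{
	while (!gc_trylock())
		;
}

static void gc_unlock(void)
{
	atomic_flag_clear_explicit(&gc_lock, memory_order_release);
}

/* Returns true if this caller ran the GC, false if it skipped it. */
static bool run_gc(bool force)
{
	if (force)
		gc_lock_spin();		/* every caller waits its turn */
	else if (!gc_trylock())
		return false;		/* a GC is already running: skip */

	gc_runs++;			/* placeholder for the real GC work */
	gc_unlock();
	return true;
}
```

With many CPUs calling run_gc(true) at once, all but one spin in
gc_lock_spin(); with run_gc(false) they return immediately instead.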

> 
> I am asking this because I recently pushed a patch that basically should have
> been enough to take care of the problem.
> 
> commit d8882935fcae28bceb5f6f56f09cded8d36d85e6
> Author: Eric Dumazet <edumazet@...gle.com>
> Date:   Fri May 8 07:34:14 2020 -0700
> 
>     ipv6: use DST_NOCOUNT in ip6_rt_pcpu_alloc()
> 
I've checked: your patch was already in the tree when I tested in the lab
today.

Kind Regards
Oliver
