netdev - Re: [PATCH] IPv6: Fix CPU contention on FIB6 GC


Message-ID: <3230b95a-1ce0-b569-3d00-f7063ae9f1d9@gmail.com>
Date:   Tue, 23 Jun 2020 01:30:29 +0200
From:   Oliver Herms <oliver.peter.herms@...il.com>
To:     Eric Dumazet <eric.dumazet@...il.com>, netdev@...r.kernel.org
Cc:     davem@...emloft.net, kuznet@....inr.ac.ru, yoshfuji@...ux-ipv6.org,
        kuba@...nel.org
Subject: Re: [PATCH] IPv6: Fix CPU contention on FIB6 GC

On 23.06.20 00:55, Eric Dumazet wrote:
> 
> 
> On 6/22/20 1:53 PM, Oliver Herms wrote:
>> When fib6_run_gc is called with parameter force=true the spinlock in
>> /net/ipv6/ip6_fib.c:2310 can lock all CPUs in softirq when
>> net.ipv6.route.max_size is exceeded (seen this multiple times).
>> One softirq/CPU gets the lock. All others spin to get it. It takes
>> substantial time until all are done. Effectively it's a DoS vector.
>>
>> As the spinlock is only enforcing that there is at most one GC running
>> at a time, it should IMHO be safe to use force=false here resulting
>> in spin_trylock_bh instead of spin_lock_bh, thus avoiding the lock
>> contention.
>>
>> Finding a locked spinlock means some GC is going on already so it is
>> safe to just skip another execution of the GC.
>>
>> Signed-off-by: Oliver Herms <oliver.peter.herms@...il.com>
>> ---
>>  net/ipv6/route.c | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/net/ipv6/route.c b/net/ipv6/route.c
>> index 82cbb46a2a4f..7e6fbaf43549 100644
>> --- a/net/ipv6/route.c
>> +++ b/net/ipv6/route.c
>> @@ -3205,7 +3205,7 @@ static int ip6_dst_gc(struct dst_ops *ops)
>>  		goto out;
>>  
>>  	net->ipv6.ip6_rt_gc_expire++;
>> -	fib6_run_gc(net->ipv6.ip6_rt_gc_expire, net, true);
>> +	fib6_run_gc(net->ipv6.ip6_rt_gc_expire, net, false);
>>  	entries = dst_entries_get_slow(ops);
>>  	if (entries < ops->gc_thresh)
>>  		net->ipv6.ip6_rt_gc_expire = rt_gc_timeout>>1;
> 
> 
Hi Eric,

> On which kernel have you seen a contention ?
I've freshly checked out from here:
staging-testing@git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging.git 

I reproduced the issue I had encountered with 4.15 (from Ubuntu) in prod,
applied the patch, and verified that it solves my problem.

I'm encountering the issues due to cache entries that are created by 
tnl_update_pmtu. However, I'm going to address that issue in another thread
and patch.

As entries in the cache can be created in many ways, this should be fixed at
the GC level.
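
To illustrate the locking change the patch makes, here is a minimal
userspace sketch, not kernel code. It assumes a pthread mutex as a stand-in
for the GC spinlock; run_gc, gc_lock and gc_runs are hypothetical names
chosen for this example only. With force=true every caller waits for the
lock; with force=false a caller that finds the lock held simply skips the
run.

```c
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical userspace stand-in for the lock that serializes GC runs. */
static pthread_mutex_t gc_lock = PTHREAD_MUTEX_INITIALIZER;

/* Counts completed GC runs; stands in for the actual cleanup work. */
static unsigned long gc_runs;

/*
 * Returns true if this call performed a GC run, false if it was skipped
 * because another run already held the lock.
 */
static bool run_gc(bool force)
{
	if (force) {
		/* Blocking acquire: every caller waits its turn. */
		pthread_mutex_lock(&gc_lock);
	} else if (pthread_mutex_trylock(&gc_lock) != 0) {
		/* Lock is busy, so a GC is already in progress: skip. */
		return false;
	}

	gc_runs++;	/* placeholder for the real garbage collection */

	pthread_mutex_unlock(&gc_lock);
	return true;
}
```

Calling run_gc(false) while another thread holds gc_lock returns
immediately instead of queueing behind it, which is the behaviour the
patch relies on to avoid the contention.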

> 
> I am asking this because I recently pushed a patch that basically should have
> been enough to take care of the problem.
> 
> commit d8882935fcae28bceb5f6f56f09cded8d36d85e6
> Author: Eric Dumazet <edumazet@...gle.com>
> Date:   Fri May 8 07:34:14 2020 -0700
> 
>     ipv6: use DST_NOCOUNT in ip6_rt_pcpu_alloc()
> 
I've checked: your patch was already included in the tree I tested in the
lab today.

Kind Regards
Oliver