Message-ID: <04671504-5ee4-7cfa-7b28-9e7f15c9f607@gmail.com>
Date: Wed, 24 Jun 2020 09:42:23 -0700
From: Eric Dumazet <eric.dumazet@...il.com>
To: Oliver Herms <oliver.peter.herms@...il.com>,
Michal Kubecek <mkubecek@...e.cz>
Cc: Eric Dumazet <eric.dumazet@...il.com>, netdev@...r.kernel.org,
davem@...emloft.net, kuznet@....inr.ac.ru, yoshfuji@...ux-ipv6.org,
kuba@...nel.org
Subject: Re: [PATCH] IPv6: Fix CPU contention on FIB6 GC
On 6/24/20 3:34 AM, Oliver Herms wrote:
> On 24.06.20 00:06, Michal Kubecek wrote:
>> On Tue, Jun 23, 2020 at 01:30:29AM +0200, Oliver Herms wrote:
>>>
>>> I'm encountering the issues due to cache entries that are created by
>>> tnl_update_pmtu. However, I'm going to address that issue in another thread
>>> and patch.
>>>
>>> As cache entries can be created in many ways, this should be fixed at the
>>> GC level.
>>
>> Actually, not that many: starting with (IIRC) 4.2, the IPv6 routing cache is
>> only used for exceptions (like PMTU), not for regular lookup results.
>>
>
> Hi Michal,
>
> Right. That is the intention. But the reality is that when sending IPv6
> packets with an MPLS encap route into a SIT/FOU tunnel, a cache entry is
> created for every single destination on the tunnel. Now that IS a bug by
> itself, and I'll shortly submit a patch that should fix it.
Thanks !
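
For context, the path being described is, roughly (hand-condensed, not
verbatim kernel code; details vary between versions): in
net/ipv4/ip_tunnel.c, tnl_update_pmtu() compares the inner packet against
the tunnel MTU and, when the packet does not fit, essentially does

	if (skb->protocol == htons(ETH_P_IPV6) && mtu < pkt_size)
		skb_dst_update_pmtu(skb, mtu);	/* updates the *inner* dst */

For an IPv6 inner flow that lands in net/ipv6/route.c:ip6_rt_update_pmtu(),
which caches the lowered MTU by inserting a per-destination exception route
(rt6_insert_exception()): one cached entry per destination sent through the
tunnel, which is exactly what fills the cache here.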
>
> However, when a tunnel uses PMTU discovery and the tunnel source receives an
> ICMP Packet Too Big message for the tunnel destination, that triggers the
> creation of IPv6 route cache entries (and, for IPv4, entries in the
> corresponding data structure) for every destination to which packets are
> sent through the tunnel.
>
> Both of these factors,
> 1. the presence or absence of (possibly spoofed) ICMP Packet Too Big messages for the tunnel,
> 2. the number of flows through a tunnel (attackers can simply create more flows),
> are not fully under an operator's control.
>
> Thus the assumption that a sufficiently large max_size would solve the
> problem is not correct.
Our intention is to get rid of the IPv6 garbage collection, so we need to make sure
we do not rely on it ;)
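
To make the max_size interaction concrete, here is the IPv6 dst GC entry
point, condensed from net/ipv6/route.c at roughly v5.7 (details vary between
versions). Note that once the cache is over max_size, every CPU that tries
to allocate a new entry ends up forcing a GC run:

static int ip6_dst_gc(struct dst_ops *ops)
{
	struct net *net = container_of(ops, struct net, ipv6.ip6_dst_ops);
	int rt_min_interval = net->ipv6.sysctl.ip6_rt_gc_min_interval;
	int rt_max_size = net->ipv6.sysctl.ip6_rt_max_size;
	int rt_elasticity = net->ipv6.sysctl.ip6_rt_gc_elasticity;
	int rt_gc_timeout = net->ipv6.sysctl.ip6_rt_gc_timeout;
	unsigned long rt_last_gc = net->ipv6.ip6_rt_last_gc;
	int entries;

	entries = dst_entries_get_fast(ops);
	/* Under max_size and within the min interval: skip the GC. */
	if (time_after(rt_last_gc + rt_min_interval, jiffies) &&
	    entries <= rt_max_size)
		goto out;

	/* Over max_size: every CPU reaching this point forces a GC run. */
	net->ipv6.ip6_rt_gc_expire++;
	fib6_run_gc(net->ipv6.ip6_rt_gc_expire, net, true);
	entries = dst_entries_get_slow(ops);
	if (entries < ops->gc_thresh)
		net->ipv6.ip6_rt_gc_expire = rt_gc_timeout >> 1;
out:
	net->ipv6.ip6_rt_gc_expire -= net->ipv6.ip6_rt_gc_expire >> rt_elasticity;
	return entries > rt_max_size;
}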
>
> Regarding your argument about making the limit "softer":
> There are only two functions that take fib6_gc_lock:
> 1. fib6_net_init
> 2. fib6_run_gc
>
> fib6_net_init is only called when the network namespace is initialized.
>
> fib6_run_gc clears all entries that are subject to garbage collection.
> There is no gain in doing that N times (with N = the number of CPUs) while
> spinlocking all CPUs. Cleaning once is enough. A GC run is so short that by
> the time it finishes, there is most probably no new entry in the cache that
> is ready to be removed. And even if there is, the next call to ip6_dst_gc
> will happen when a new entry is added to the cache. Thus I can't see how my
> patch makes the time limit softer.
>
These are really minor implementation details; we need to find the root cause.
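
For reference, the function whose lock is being discussed, condensed from
net/ipv6/ip6_fib.c at roughly v5.7. When force is false it already degrades
to a trylock, so only one CPU performs the sweep; the contention described
in this thread comes from ip6_dst_gc() calling it with force == true, which
makes every CPU block on fib6_gc_lock:

void fib6_run_gc(unsigned long expires, struct net *net, bool force)
{
	struct fib6_gc_args gc_args;
	unsigned long now;

	if (force) {
		/* Every caller blocks until the sweep is done. */
		spin_lock_bh(&net->ipv6.fib6_gc_lock);
	} else if (!spin_trylock_bh(&net->ipv6.fib6_gc_lock)) {
		/* Another CPU is already sweeping: re-arm the timer and bail. */
		mod_timer(&net->ipv6.ip6_fib_timer, jiffies + HZ);
		return;
	}
	gc_args.timeout = expires ? (int)expires :
			  net->ipv6.sysctl.ip6_rt_gc_interval;
	gc_args.more = 0;

	/* Walk the FIB and age out expired routes/exceptions. */
	fib6_clean_all(net, fib6_age, &gc_args);
	now = jiffies;
	net->ipv6.ip6_rt_last_gc = now;

	if (gc_args.more)
		mod_timer(&net->ipv6.ip6_fib_timer,
			  round_jiffies(now +
					net->ipv6.sysctl.ip6_rt_gc_interval));
	else
		del_timer(&net->ipv6.ip6_fib_timer);
	spin_unlock_bh(&net->ipv6.fib6_gc_lock);
}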