[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <5be2063d-2433-54af-194d-fd4628974f29@gmail.com>
Date: Wed, 24 Jun 2020 12:34:50 +0200
From: Oliver Herms <oliver.peter.herms@...il.com>
To: Michal Kubecek <mkubecek@...e.cz>
Cc: Eric Dumazet <eric.dumazet@...il.com>, netdev@...r.kernel.org,
davem@...emloft.net, kuznet@....inr.ac.ru, yoshfuji@...ux-ipv6.org,
kuba@...nel.org
Subject: Re: [PATCH] IPv6: Fix CPU contention on FIB6 GC
On 24.06.20 00:06, Michal Kubecek wrote:
> On Tue, Jun 23, 2020 at 01:30:29AM +0200, Oliver Herms wrote:
>>
>> I'm encountering the issues due to cache entries that are created by
>> tnl_update_pmtu. However, I'm going to address that issue in another thread
>> and patch.
>>
>> As entries in the cache can be caused on many ways this should be fixed on the GC
>> level.
>
> Actually, not so many as starting with (IIRC) 4.2, IPv6 routing cache is
> only used for exceptions (like PMTU), not for regular lookup results.
>
Hi Michal,
Right. That is the intention. But reality is, that when sending IPv6 with an
MPLS encap route into a SIT/FOU tunnel, a cache entry is being created for every single
destination on the tunnel. Now that IS a bug by itself and I'll shortly submit a
patch that should fix that issue.
However, when a tunnel uses PMTU, and a tunnel source received an ICMP packet too big
for the tunnel destination, that triggers creation of IPv6 route cache entries
(and for IPv4 entries in the corresponding data structure) for every destination for which
packets are sent through the tunnel.
Both these attributes,
1. the presence or absence of, maybe spoofed, ICMP packet too big messages for the tunnel
2. the number of flows through a tunnel (attackers could just create more flows)
are not fully under control by an operator.
Thus the assumption that if only max_size would be big enough, it would solve the problem,
it not correct.
Regarding your argument making the limit "softer":
There is only 2 functions that use/lock fib6_gc_lock:
1. fib6_net_init
2. fib6_run_gc
fib6_net_init is only called when the network namespace is initialized.
fib6_run_gc clears all entries that are subject to garbage collection.
There is no gain in doing that N times (with N = amount of CPUs) and spinlocking all CPUs.
Cleaning once is enough. A GC run is so short, that by the time the GC run is finished,
there's most probably no new entry in the cache that is ready to be removed.
And even if there is. The next call to ip6_dst_gc will happen when a new entry
is added to the cache. Thus I can't see how my patch makes time limit softer.
Kind Regards
Oliver
Powered by blists - more mailing lists