Date:   Thu, 25 Jun 2020 10:02:43 +0200
From:   Oliver Herms <oliver.peter.herms@...il.com>
To:     Eric Dumazet <eric.dumazet@...il.com>,
        Michal Kubecek <mkubecek@...e.cz>
Cc:     netdev@...r.kernel.org, davem@...emloft.net, kuznet@....inr.ac.ru,
        yoshfuji@...ux-ipv6.org, kuba@...nel.org
Subject: Re: [PATCH] IPv6: Fix CPU contention on FIB6 GC

On 24.06.20 18:42, Eric Dumazet wrote:
> 
> Our intention is to get rid of the IPv6 garbage collection, so we need to make sure
> we do not rely on it ;)

Hi Eric,

I can't quite follow. Do you mean getting rid of the GC, or of the route cache?
And what is the plan? Separate structures for routes, ND and MTU exceptions, as in IPv4?

Independent of that, I have a few questions regarding the FIB6.
After a few days of crawling through all this code, it looks to me like there are three types
of objects in the FIB6: regular routes, neighbor discovery entries, and MTU exceptions.
The latter two should then be subject to GC?

And regarding calling the GC from dst_alloc(): what is dst_entries_get_fast(ops) supposed to
return there? The number of entries in the whole FIB? Only MTU exceptions? MTU exceptions plus
ND entries?
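
For context, the call path I mean looks roughly like this (paraphrased from net/core/dst.c as I
read it; details such as the DST_NOCOUNT check may differ between kernel versions):

	/* Sketch of the dst_alloc() entry into the GC: the per-family entry
	 * counter is compared against gc_thresh, and only when it is exceeded
	 * does ops->gc() (ip6_dst_gc for IPv6) get invoked.
	 */
	void *dst_alloc(struct dst_ops *ops, struct net_device *dev,
			int initial_ref, int initial_obsolete, unsigned short flags)
	{
		struct dst_entry *dst;

		if (ops->gc && !(flags & DST_NOCOUNT) &&
		    dst_entries_get_fast(ops) > ops->gc_thresh) {
			if (ops->gc(ops))	/* ip6_dst_gc() for IPv6 */
				return NULL;	/* allocation refused */
		}

		/* ... allocate and initialise dst, bump the dst_entries counter ... */
		return dst;
	}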

> 
>>
>> Regarding your argument making the limit "softer":
>> There are only two functions that take fib6_gc_lock:
>> 1. fib6_net_init
>> 2. fib6_run_gc 
>>
>> fib6_net_init is only called when the network namespace is initialized.
>>
>> fib6_run_gc clears all entries that are subject to garbage collection.
>> There is no gain in doing that N times (with N = the number of CPUs) while spinlocking all CPUs.
>> Cleaning once is enough. A GC run is so short that by the time it finishes,
>> there is most probably no new entry in the cache that is ready to be removed.
>> And even if there is, the next call to ip6_dst_gc will happen when a new entry
>> is added to the cache. Thus I can't see how my patch makes the time limit softer.
>>
> 
> These are really minor implementation details; we need to find the root cause.
> 

Which root cause do you mean? The reason the entries exist in the first place? Sure, that is
done and solved (see my other patch).

However, I still think it makes sense to ensure the GC can never block all CPUs, because doing
so never buys us anything. There is no real gain, only the huge risk of heavy CPU lock contention
(freezing multiple CPUs for substantial amounts of time) when things go wrong.
And they do go wrong (Murphy).

There are two risks we have to weigh:
1. The risk of heavy CPU lockups in softirq for substantial amounts of time (and the more CPUs
the machine has, the worse it gets).
2. The risk of temporarily consuming a little more RAM.

To me, option 2 is much better: locked-up CPUs always kill production services, whereas using a
little more RAM could in principle lead to the same result but most likely won't, since there
will usually be a few kB/MB of free RAM available to hold a few unneeded entries for a few
microseconds.
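
Concretely, the pattern I am arguing for is roughly the following (a sketch only, based on how
fib6_run_gc() takes fib6_gc_lock today; the exact placement in my patch may differ):

	/* Sketch: let concurrent GC invocations back off instead of spinning.
	 * Only one CPU actually walks the FIB; the others return immediately,
	 * and the pending work is picked up by the next timer run or the next
	 * allocation that triggers the GC.
	 */
	void fib6_run_gc(unsigned long expires, struct net *net, bool force)
	{
		if (force) {
			spin_lock_bh(&net->ipv6.fib6_gc_lock);
		} else if (!spin_trylock_bh(&net->ipv6.fib6_gc_lock)) {
			/* someone else is collecting right now; don't block this CPU */
			mod_timer(&net->ipv6.ip6_fib_timer, jiffies + HZ);
			return;
		}

		/* ... walk the FIB and expire GC-eligible entries ... */

		spin_unlock_bh(&net->ipv6.fib6_gc_lock);
	}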

I'm curious what reason there could be to stick to option 1.

Kind Regards
Oliver
