netdev - Re: [RFC PATCH net-next v2 0/2] Mitigate the Issue of Expired Routes in Linux IPv6 Routing Tables


Message-ID: <1284f846-f8ed-d95a-4476-40e2de26c092@gmail.com>
Date: Thu, 18 May 2023 11:51:47 -0700
From: Kui-Feng Lee <sinquersw@...il.com>
To: David Ahern <dsahern@...nel.org>, Kui-Feng Lee <thinker.li@...il.com>,
 netdev@...r.kernel.org, ast@...nel.org, martin.lau@...ux.dev,
 kernel-team@...a.com, davem@...emloft.net, edumazet@...gle.com,
 kuba@...nel.org, pabeni@...hat.com
Cc: Kui-Feng Lee <kuifeng@...a.com>, Ido Schimmel <idosch@...sch.org>
Subject: Re: [RFC PATCH net-next v2 0/2] Mitigate the Issue of Expired Routes
 in Linux IPv6 Routing Tables



On 5/18/23 08:28, David Ahern wrote:
> On 5/17/23 11:40 PM, Kui-Feng Lee wrote:
>>
>>
>> On 5/17/23 20:21, David Ahern wrote:
>>> On 5/17/23 12:33 PM, Kui-Feng Lee wrote:
>>>> This RFC is resent to make sure the maintainers are aware of it.  It
>>>> also removes some forward declarations that we no longer use.
>>>>
>>>> The size of a Linux IPv6 routing table can become a big problem if not
>>>> managed appropriately.  Now, Linux has a garbage collector to remove
>>>> expired routes periodically.  However, this may lead to a situation in
>>>> which the routing path is blocked for a long period due to an
>>>> excessive number of routes.
>>>
>>> I take it this problem was seen internally to your org? Can you give
>>> representative numbers on how many routes, stats on the blocked time,
>>> and reason for the large time block (I am guessing the notifier)?
>>
>> We haven't had any incidents so far.  Consider it a
>> potential issue.
> 
> So no data to compare how the system was operating before and after.

I can generate traffic to test it.

> 
> ...
> 
>>
>> In contrast, the current GC has to walk every tree even if only one
>> route has expired.
> 
> As I recall the largest overhead is systems (e.g., switchdev) handling
> the notifier. The tree walk scaling problem can be addressed with a much
> simpler change -- e.g., add a list_head per fib6_table for fib6_info
> entries that can expire and make the list time sorted. Then the gc only
> needs to walk those lists up to the expired point.

This is one of the solutions I considered at the beginning.
With this approach, we can cap the number of entries, as the
neighbor tables do, and remove entries only when the list reaches
that maximum, without running a GC timer.  However, inserting a new
entry in sorted order can be very inefficient, since it needs a
linear scan of the list.  Stephen mentioned 3 million routes on a
backbone router in another message.  We may need something more
complicated, like an RB-tree or a heap, to reduce the overhead.
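
To make the trade-off concrete, here is a rough userspace sketch of the
heap variant.  It is not kernel code and not part of the patch set: the
names (struct exp_heap, exp_heap_push, exp_heap_pop, exp_heap_gc), the
fixed capacity, and the integer route_id standing in for a fib6_info
pointer are all made up for illustration.  The point is only that an
insert costs O(log n) instead of the O(n) scan of a sorted list, and
that a GC pass touches expired entries only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define EXP_HEAP_CAP 1024	/* hypothetical fixed capacity for this sketch */

struct exp_entry {
	uint64_t expires;	/* expiry time, e.g. in jiffies */
	int route_id;		/* stand-in for a fib6_info pointer */
};

struct exp_heap {
	struct exp_entry slot[EXP_HEAP_CAP];
	size_t len;
};

static void exp_swap(struct exp_entry *a, struct exp_entry *b)
{
	struct exp_entry tmp = *a;

	*a = *b;
	*b = tmp;
}

/* O(log n) insert: append at the end, then sift up.  Returns -1 if full. */
static int exp_heap_push(struct exp_heap *h, uint64_t expires, int route_id)
{
	size_t i;

	if (h->len == EXP_HEAP_CAP)
		return -1;
	i = h->len++;
	h->slot[i].expires = expires;
	h->slot[i].route_id = route_id;
	while (i > 0) {
		size_t parent = (i - 1) / 2;

		if (h->slot[parent].expires <= h->slot[i].expires)
			break;
		exp_swap(&h->slot[parent], &h->slot[i]);
		i = parent;
	}
	return 0;
}

/* Remove and return the earliest-expiring entry; heap must be non-empty. */
static struct exp_entry exp_heap_pop(struct exp_heap *h)
{
	struct exp_entry top;
	size_t i = 0;

	assert(h->len > 0);
	top = h->slot[0];
	h->slot[0] = h->slot[--h->len];
	for (;;) {
		size_t l = 2 * i + 1, r = l + 1, min = i;

		if (l < h->len && h->slot[l].expires < h->slot[min].expires)
			min = l;
		if (r < h->len && h->slot[r].expires < h->slot[min].expires)
			min = r;
		if (min == i)
			break;
		exp_swap(&h->slot[i], &h->slot[min]);
		i = min;
	}
	return top;
}

/* GC pass: pop only entries with expires <= now; returns how many. */
static size_t exp_heap_gc(struct exp_heap *h, uint64_t now)
{
	size_t removed = 0;

	while (h->len > 0 && h->slot[0].expires <= now) {
		(void)exp_heap_pop(h);
		removed++;
	}
	return removed;
}
```

An array-backed heap like this cannot cheaply delete an arbitrary entry
(for example when a route is removed before it expires) unless each
entry also tracks its own index, which is one reason an RB-tree might
still be the better fit.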