Message-ID: <c739f928-86a2-46f8-b92e-86366758bb82@orange.com>
Date: Tue, 24 Sep 2024 16:06:36 +0200
From: Alexandre Ferrieux <alexandre.ferrieux@...il.com>
To: Eric Dumazet <edumazet@...gle.com>,
Alexandre Ferrieux <alexandre.ferrieux@...il.com>
Cc: Simon Horman <horms@...nel.org>,
Przemek Kitszel <przemyslaw.kitszel@...el.com>, netdev@...r.kernel.org
Subject: Massive hash collisions on FIB
On 17/09/2024 08:59, Eric Dumazet wrote:
>
>> What do you think ?
>
> I do not see any blocker for making things more scalable.
>
> It is only a matter of time and interest. I think that 99.99 % of
> linux hosts around the world
> have less than 10 netns.
>
> RTNL removal is a little bit harder (and we hit RTNL contention even
> with less than 10 netns around)
Given this encouragement, I'm proceeding towards the "million-tunnel baby".
And knowing where the current road bumps are, workarounds are possible: instead
of a direct 1M fanout of (netns+interface), I'm doing 10k netns with 100
interfaces each, which works like a charm.
But in doing so I hit an entirely new kind of bottleneck: the single FIB
hashtable, shared by all netns, suffers massive collisions when many netns
contain the same local address.
Indeed, in this situation, the fib_inetaddr_notifier ends up inserting a local
route for the address, and the only "moving part" in the hash input is the
address itself.
As an example, after creating 7000 veth pairs and moving their "right half" to
7000 namespaces, an "ip addr add 192.168.1.2/32 dev $D" on one of them hits a
bucket of depth 7000.
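To make the effect concrete, here is a minimal user-space sketch of it. It is
only a model: toy_hash32() is a stand-in multiplicative hash, not the kernel's
actual FIB hash function, and the 256-bucket table size is an arbitrary
assumption. The point is simply that when the address is the only hash input,
every netns lands in the same bucket.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_HASH_BITS 8                     /* assumed table size: 256 buckets */
#define TOY_BUCKETS   (1u << TOY_HASH_BITS)

/* Stand-in multiplicative hash (hypothetical, not the kernel's FIB hash). */
static uint32_t toy_hash32(uint32_t val)
{
	return (val * 0x61C88647u) >> (32 - TOY_HASH_BITS);
}

/*
 * Insert the same local address once per netns, hashing on the address
 * alone, and return the depth of the deepest bucket.
 */
static unsigned int toy_max_depth_addr_only(uint32_t addr, unsigned int nr_netns)
{
	unsigned int depth[TOY_BUCKETS] = { 0 };
	unsigned int i, max = 0;

	for (i = 0; i < nr_netns; i++) {
		/* The netns index i never enters the hash input. */
		unsigned int d = ++depth[toy_hash32(addr)];

		if (d > max)
			max = d;
	}
	assert(max <= nr_netns);
	return max;
}
```

In this model, toy_max_depth_addr_only(0xC0A80102, 7000), i.e. 192.168.1.2 in
7000 netns, returns 7000: a single bucket holds every entry.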
To solve this, I'd naively inject a few bits of entropy from the netns itself
(inode number, middle bits of the (struct net *) address, etc.), by XORing them
into the hash value. Unless I'm mistaken, the netns is always unambiguous when a
FIB decision is made, be it for a packet or for some interface configuration
task.
Would that be acceptable?
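A rough user-space sketch of the idea, under loud assumptions: toy_hash32() is
a stand-in hash rather than the kernel's FIB hash, the table has 256 buckets,
and the per-netns "salt" is just a sequential index standing in for whatever
bits would really be taken from the netns (inode number, pointer bits). It is
meant to show the shape of the change, not an actual patch.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_HASH_BITS 8                     /* assumed table size: 256 buckets */
#define TOY_BUCKETS   (1u << TOY_HASH_BITS)

/* Stand-in multiplicative hash (hypothetical, not the kernel's FIB hash). */
static uint32_t toy_hash32(uint32_t val)
{
	return (val * 0x61C88647u) >> (32 - TOY_HASH_BITS);
}

/* XOR a few netns-derived bits into the hash value, then mask to the table. */
static uint32_t toy_hash_addr_netns(uint32_t addr, uint32_t netns_salt)
{
	return (toy_hash32(addr) ^ netns_salt) & (TOY_BUCKETS - 1);
}

/*
 * Insert the same local address once per netns, salting the hash with the
 * netns index (a hypothetical stand-in for real netns entropy), and return
 * the depth of the deepest bucket.
 */
static unsigned int toy_max_depth_salted(uint32_t addr, unsigned int nr_netns)
{
	unsigned int depth[TOY_BUCKETS] = { 0 };
	unsigned int i, max = 0;

	for (i = 0; i < nr_netns; i++) {
		unsigned int d = ++depth[toy_hash_addr_netns(addr, i)];

		if (d > max)
			max = d;
	}
	assert(max <= nr_netns);
	return max;
}
```

With this idealized sequential salt, 7000 netns sharing 192.168.1.2 spread
evenly over the 256 buckets, and the deepest bucket holds 28 entries instead of
7000. Real netns-derived bits would be less regular, so the spread would be
statistical rather than perfectly even.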