netdev - Re: RFC: Should net namespaces scale up (>10k) ?


Message-ID: <db6ecdc4-8053-42d6-89cc-39c70b199bde@intel.com>
Date: Mon, 16 Sep 2024 12:13:35 +0200
From: Przemek Kitszel <przemyslaw.kitszel@...el.com>
To: Alexandre Ferrieux <alexandre.ferrieux@...il.com>
CC: Alexandre Ferrieux <alexandre.ferrieux@...nge.com>, <horms@...nel.org>,
	Eric Dumazet <edumazet@...gle.com>, <netdev@...r.kernel.org>
Subject: Re: RFC: Should net namespaces scale up (>10k) ?

On 9/15/24 22:49, Alexandre Ferrieux wrote:
> (thanks Simon, reposting with another account to avoid the offending disclaimer)
> 
> Hi,
> 
> Currently, netns don't really scale beyond a few thousands, for
> mundane reasons (see below). But should they ? Is there, in the
> design, an assumption that tens of thousands of network namespaces are
> considered "unreasonable" ?
> 
> A typical use case for such ridiculous numbers is a tester for
> firewalls or carrier-grade NATs. In these, you typically want tens of
> thousands of tunnels, each of which is perfectly instantiated as an
> interface. And, to avoid an explosion in source routing rules, you
> want them in separate namespaces.
> 
> Now why don't they scale *today* ? For two independent, seemingly
> accidental, O(N) scans of the netns list.
> 
> 1. The "netdevice notifier" from the Wireless Extensions subsystem
> insists on scanning the whole list regardless of the nature of the
> change, nor wondering whether all these namespaces hold any wireless
> interface, nor even whether the system has _any_ wireless hardware...
> 
>          for_each_net(net) {
>                  while ((skb = skb_dequeue(&net->wext_nlevents)))
>                          rtnl_notify(skb, net, 0, RTNLGRP_LINK, NULL,
>                                      GFP_KERNEL);
>          }
> 
> 2. When moving an interface (eg an IPVLAN slave) to another netns,
> __dev_change_net_namespace() calls peernet2id_alloc() in order to get
> an ID for the target namespace. This again incurs a full scan of the
> netns list:
> 
>          int id = idr_for_each(&net->netns_ids, net_eq_idr, peer);

this piece is inside __peernet2id(), which is itself called from a
for_each_net() loop, making the whole thing O(n^2):

        for_each_net(tmp) {
                int id;

                spin_lock_bh(&tmp->nsid_lock);
                id = __peernet2id(tmp, net);
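
To make the quadratic cost concrete, here is a minimal userspace sketch. It
is a toy model and not kernel code: struct toy_net, toy_peernet2id() and
toy_worst_case_probes() are hypothetical names, and it assumes the worst
case where every namespace holds an id for every other namespace and the
peer being searched for is found in none of them.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Toy stand-in for a namespace: a table indexed by id that yields the peer. */
struct toy_net {
	const struct toy_net **peers;	/* index = id, value = peer namespace */
	size_t nr_ids;
};

/* Reverse lookup (peer -> id) by linear scan, in the spirit of
 * idr_for_each() with a pointer-equality callback. */
static int toy_peernet2id(const struct toy_net *net,
			  const struct toy_net *peer, unsigned long *probes)
{
	for (size_t id = 0; id < net->nr_ids; id++) {
		(*probes)++;
		if (net->peers[id] == peer)
			return (int)id;
	}
	return -1;
}

/* Build n namespaces that each know n ids, then search every one of them
 * for a peer that is registered nowhere. Returns the table slots visited. */
static unsigned long toy_worst_case_probes(size_t n)
{
	struct toy_net stranger = { NULL, 0 };
	const struct toy_net **table;
	struct toy_net *nets;
	unsigned long probes = 0;

	assert(n > 0);
	nets = calloc(n, sizeof(*nets));
	table = calloc(n, sizeof(*table));
	assert(nets && table);

	for (size_t i = 0; i < n; i++)
		table[i] = &nets[i];
	for (size_t i = 0; i < n; i++) {
		nets[i].peers = table;	/* simplification: one shared table */
		nets[i].nr_ids = n;
	}

	/* Outer walk over all namespaces, inner scan inside each one. */
	for (size_t i = 0; i < n; i++)
		(void)toy_peernet2id(&nets[i], &stranger, &probes);

	free(table);
	free(nets);
	return probes;
}
```

With these assumptions, toy_worst_case_probes(n) visits n * n slots, so
going from 1k to 10k namespaces multiplies the work by 100.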

> 
> Note that, while IDR is very fast when going from ID to pointer, the
> reverse path is awfully slow... But why are IDs needed in the first
> place, instead of the simple netns pointers ?
> 
> Any insight on the (possibly very good) reasons those two apparent
> warts stand in the way of netns scaling up ?
> 
> -Alex
> 

I guess the reason is more pragmatic: net namespaces are a decade
older than xarray, hence the list-based implementation.
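
For illustration only, one way to avoid the slow pointer-to-id direction is
to keep a reverse index next to the forward table. The userspace sketch
below is an assumption about what such an index could look like, not a
proposal for the actual kernel data structure: struct toy_rev_index,
toy_rev_insert() and toy_rev_lookup() are hypothetical names, the table has
a fixed size, and removal is deliberately left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_SLOTS 1024	/* power of two; must exceed the number of peers */

/* Open-addressing hash table keyed by peer pointer, storing its id. */
struct toy_rev_index {
	const void *peer[TOY_SLOTS];	/* NULL marks an empty slot */
	int id[TOY_SLOTS];
};

/* Mix the pointer bits and reduce them to a slot number. */
static size_t toy_hash(const void *p)
{
	uintptr_t x = (uintptr_t)p;

	x ^= x >> 16;
	x *= (uintptr_t)0x9E3779B1u;
	x ^= x >> 13;
	return (size_t)x & (TOY_SLOTS - 1);
}

/* Record peer -> id (or update it). Returns 0, or -1 if the table is full. */
static int toy_rev_insert(struct toy_rev_index *idx, const void *peer, int id)
{
	size_t h = toy_hash(peer);

	assert(peer != NULL);
	for (size_t i = 0; i < TOY_SLOTS; i++) {
		size_t s = (h + i) & (TOY_SLOTS - 1);

		if (idx->peer[s] == NULL || idx->peer[s] == peer) {
			idx->peer[s] = peer;
			idx->id[s] = id;
			return 0;
		}
	}
	return -1;
}

/* Return the id recorded for peer, or -1 if there is none. */
static int toy_rev_lookup(const struct toy_rev_index *idx, const void *peer)
{
	size_t h = toy_hash(peer);

	for (size_t i = 0; i < TOY_SLOTS; i++) {
		size_t s = (h + i) & (TOY_SLOTS - 1);

		if (idx->peer[s] == NULL)
			return -1;	/* hit an empty slot: peer is absent */
		if (idx->peer[s] == peer)
			return idx->id[s];
	}
	return -1;
}
```

A zero-initialized struct toy_rev_index is an empty index. Lookups then
take expected constant time as long as the table stays sparse, at the price
of keeping the reverse index in sync with the forward table on every id
allocation and release.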