netdev - VRFs and the scalability of namespaces

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [thread-next>] [day] [month] [year] [list]

Message-ID: <5425EAA6.7040302@gmail.com>
Date:	Fri, 26 Sep 2014 16:37:26 -0600
From:	David Ahern <dsahern@...il.com>
To:	"Eric W. Biederman" <ebiederm@...ssion.com>
CC:	nicolas.dichtel@...nd.com,
	"netdev@...r.kernel.org" <netdev@...r.kernel.org>
Subject: VRFs and the scalability of namespaces

Hi Eric:

As you suggested [1] I am starting a new thread to discuss scalability 
problems using namespaces for VRFs.

Background
----------
Consider a single system that wants to provide VRF-based features with 
support for N VRFs. N could easily be 2048 (e.g., 6Wind, [2]), 4000 
(e.g., Cisco, [3]) or even higher.

The single system with support for N VRFs runs M services (e.g., quagga, 
cdp, lldp, stp, strongswan, some homegrown routing protocol) and 
includes standard system services like sshd. Furthermore, a system also 
includes monitoring programs like snmpd and tcollector. In short, M is 
easily 20 processes that need to have a presence across all VRFs.

Network Namespaces for VRFs
---------------------------
For the past 4 years or so the response to VRF questions is a drum beat 
of "use network namespaces". But namespaces are not a good match for VRFs.

1. Network namespaces are a complete separation of the networking stack 
from network devices up. VRFs are an L3 concept. Using namespaces forces 
an L3 separation concept onto L2 apps -- lldp, cdp, etc.

There are use cases when you want device level separation, use cases 
where you want only L3 and up separation, and cases where you want both 
(e.g., divy up the netdevices in a system across some small number of 
namespaces and then provide VRF based features within a namespace).

2. Scalability of apps providing service as namespaces are created. How 
do you create the presence for each service in a network namespace?

a. Spawn a new process for each namespace? brute force approach and 
extremely resource intensive. e.g., the quagga example [4]

b. spawn a thread for each namespace? Better than a full process but 
still a heavyweight solution

c. create a socket per namespace. Better but still this is a resource 
intensive solution -- N listen sockets per service and each service 
needs to be modified for namespace support. For opensource software that 
means each project has to agree that namespace awareness is relevant and 
agree to take the patches.

3. Just creating a network namespace consumes non-negligible amount of 
memory -- ~200kB for the 3.10 kernel. I believe the /proc entries are 
the bulk of that memory usage. 200kB/namespace is again a lot of wasted 
memory and overhead.

4. For a single process to straddle multiple namespaces it has to run 
with full root privileges -- CAP_SYS_ADMIN -- to use setns. Using 
network sockets does not require a process to run as root at all unless 
it wants privileged ports in which case CAP_NET_BIND_SERVICE is 
sufficient, not full root.

The Linux kernel needs proper VRF support -- as an L3 concept. A 
capability to run a process in a "VRF any" context provides a resource 
efficient solution where a single process with a single listen socket 
works across all VRFs in a namespace and then connected sockets have a 
specific VRF context.

Before droning on even more, does the above provide better context on 
the general problem?

Thanks,
David

[1] https://lkml.org/lkml/2014/9/26/840

[2] http://www.6wind.com/6windgate-performance/ip-forwarding

[3] 
http://www.cisco.com/c/en/us/td/docs/switches/datacenter/sw/verified_scalability/b_Cisco_Nexus_7000_Series_NX-OS_Verified_Scalability_Guide.html

[4] 
https://lists.quagga.net/pipermail/quagga-users/2010-February/011351.html

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html