netdev - Re: [PATCH net-next 0/5] rtnetlink: add IFA_IF_NETNSID for RTM

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20180901013427.tj3t2mlik4t7hlt5@gmail.com>
Date:   Sat, 1 Sep 2018 03:34:27 +0200
From:   Christian Brauner <christian.brauner@...onical.com>
To:     Kirill Tkhai <ktkhai@...tuozzo.com>
Cc:     netdev@...r.kernel.org, linux-kernel@...r.kernel.org,
        davem@...emloft.net, kuznet@....inr.ac.ru, yoshfuji@...ux-ipv6.org,
        pombredanne@...b.com, kstewart@...uxfoundation.org,
        gregkh@...uxfoundation.org, dsahern@...il.com, fw@...len.de,
        lucien.xin@...il.com, jakub.kicinski@...ronome.com,
        jbenc@...hat.com, nicolas.dichtel@...nd.com
Subject: Re: [PATCH net-next 0/5] rtnetlink: add IFA_IF_NETNSID for
 RTM_GETADDR

On Thu, Aug 30, 2018 at 04:45:45PM +0200, Christian Brauner wrote:
> On Thu, Aug 30, 2018 at 11:49:31AM +0300, Kirill Tkhai wrote:
> > On 29.08.2018 21:13, Christian Brauner wrote:
> > > Hi Kirill,
> > > 
> > > Thanks for the question!
> > > 
> > > On Wed, Aug 29, 2018 at 11:30:37AM +0300, Kirill Tkhai wrote:
> > >> Hi, Christian,
> > >>
> > >> On 29.08.2018 02:18, Christian Brauner wrote:
> > >>> From: Christian Brauner <christian@...uner.io>
> > >>>
> > >>> Hey,
> > >>>
> > >>> A while back we introduced and enabled IFLA_IF_NETNSID in
> > >>> RTM_{DEL,GET,NEW}LINK requests (cf. [1], [2], [3], [4], [5]). This has led
> > >>> to signficant performance increases since it allows userspace to avoid
> > >>> taking the hit of a setns(netns_fd, CLONE_NEWNET), then getting the
> > >>> interfaces from the netns associated with the netns_fd. Especially when a
> > >>> lot of network namespaces are in use, using setns() becomes increasingly
> > >>> problematic when performance matters.
> > >>
> > >> could you please give a real example, when setns()+socket(AF_NETLINK) cause
> > >> problems with the performance? You should do this only once on application
> > >> startup, and then you have created netlink sockets in any net namespaces you
> > >> need. What is the problem here?
> > > 
> > > So we have a daemon (LXD) that is often running thousands of containers.
> > > When users issue a lxc list request against the daemon it returns a list
> > > of all containers including all of the interfaces and addresses for each
> > > container. To retrieve those addresses we currently rely on setns() +
> > > getifaddrs() for each of those containers. That has horrible
> > > performance.
> > 
> > Could you please provide some numbers showing that setns()
> > introduces signify performance decrease in the application?
> 
> Sure, might take a few days++ though since I'm traveling.

Hey Kirill,

As promised here's some code [1] that compares performance. I basically
did a setns() to the network namespace and called getifaddrs() and
compared this to the scenario where I use the newly introduced property.
I did this 1 million times and calculated the mean getifaddrs()
retrieval time based on that.
My patch cuts the time in half.

brauner@...tgenstein:~/netns_getifaddrs$ ./getifaddrs_perf 0 1178
Mean time in microseconds (netnsid): 81
Mean time in microseconds (setns): 162

Christian

I'm only appending the main file since the netsns_getifaddrs() code I
used is pretty long:

[1]:

#define _GNU_SOURCE
#define __STDC_FORMAT_MACROS
#include <fcntl.h>
#include <inttypes.h>
#include <linux/types.h>
#include <sched.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/time.h>
#include <sys/types.h>
#include <unistd.h>

#include "netns_getifaddrs.h"
#include "print_getifaddrs.h"

#define ITERATIONS 1000000
#define SEC_TO_MICROSEC(x) (1000000 * (x))

int main(int argc, char *argv[])
{
	int i, ret;
	__s32 netns_id;
	pid_t netns_pid;
	char path[1024];
	intmax_t times[ITERATIONS];
	struct timeval t1, t2;
	intmax_t time_in_mcs;
	int fret = EXIT_FAILURE;
	intmax_t sum = 0;
	int host_netns_fd = -1, netns_fd = -1;

	struct ifaddrs *ifaddrs = NULL;

	if (argc != 3)
		goto on_error;

	netns_id = atoi(argv[1]);
	netns_pid = atoi(argv[2]);
	printf("%d\n", netns_id);
	printf("%d\n", netns_pid);

	for (i = 0; i < ITERATIONS; i++) {
		ret = gettimeofday(&t1, NULL);
		if (ret < 0)
			goto on_error;

		ret = netns_getifaddrs(&ifaddrs, netns_id);
		freeifaddrs(ifaddrs);
		if (ret < 0)
			goto on_error;

		ret = gettimeofday(&t2, NULL);
		if (ret < 0)
			goto on_error;

		time_in_mcs = (SEC_TO_MICROSEC(t2.tv_sec) + t2.tv_usec) -
			      (SEC_TO_MICROSEC(t1.tv_sec) + t1.tv_usec);
		times[i] = time_in_mcs;
	}

	for (i = 0; i < ITERATIONS; i++)
		sum += times[i];

	printf("Mean time in microseconds (netnsid): %ju\n",
	       sum / ITERATIONS);

	ret = snprintf(path, sizeof(path), "/proc/%d/ns/net", netns_pid);
	if (ret < 0 || (size_t)ret >= sizeof(path))
		goto on_error;

	netns_fd = open(path, O_RDONLY | O_CLOEXEC);
	if (netns_fd < 0)
		goto on_error;

	host_netns_fd = open("/proc/self/ns/net", O_RDONLY | O_CLOEXEC);
	if (host_netns_fd < 0)
		goto on_error;

	memset(times, 0, sizeof(times));
	for (i = 0; i < ITERATIONS; i++) {
		ret = gettimeofday(&t1, NULL);
		if (ret < 0)
			goto on_error;

		ret = setns(netns_fd, CLONE_NEWNET);
		if (ret < 0)
			goto on_error;

		ret = getifaddrs(&ifaddrs);
		freeifaddrs(ifaddrs);
		if (ret < 0)
			goto on_error;

		ret = gettimeofday(&t2, NULL);
		if (ret < 0)
			goto on_error;

		ret = setns(host_netns_fd, CLONE_NEWNET);
		if (ret < 0)
			goto on_error;

		time_in_mcs = (SEC_TO_MICROSEC(t2.tv_sec) + t2.tv_usec) -
			      (SEC_TO_MICROSEC(t1.tv_sec) + t1.tv_usec);
		times[i] = time_in_mcs;
	}

	for (i = 0; i < ITERATIONS; i++)
		sum += times[i];

	printf("Mean time in microseconds (setns): %ju\n",
	       sum / ITERATIONS);

	fret = EXIT_SUCCESS;

on_error:
	if (netns_fd >= 0)
		close(netns_fd);

	if (host_netns_fd >= 0)
		close(host_netns_fd);

	exit(fret);
}

> 
> > 
> > I worry about all this just because of netlink interface is
> > already overloaded, while this patch makes it even less modular.
> 
> Introducing the IFA_IF_NETNSID property will not make the netlink
> interface less modular. It is a clean, RTM_*ADDR-request specific
> property using network namespace identifiers which we discussed in prior
> patches are the way to go forward.
> 
> You can already get interfaces via GETLINK from another network
> namespaces than the one you reside in (Which we enabled just a few
> months back.) but you can't do the same for GETADDR. Those two are
> almost always used together. When you want to get the links you usually
> also want to get the addresses associated with it right after.
> In a prior discussion we agreed that network namespace identifiers are
> the way to go forward but that any other propery, i.e. PIDs and fds
> should never be ported into other parts of the codebase and that is
> indeed something I agree with.
> 
> > In case of one day we finally reach rtnl unscalability trap,
> > every common interface like this may be a decisive nail in
> > a coffin lid of possibility to overwrite everything.
> > 
> > > The problem with what you're proposing is that the daemon would need to
> > > cache a socket file descriptor for each container which is something
> > > that we unfortunately cannot do since we can't excessively cache file
> > > descriptors because we can easily hit the open file limit. We also
> > > refrain from caching file descriptors for a long time for security
> > > reasons.
> > > 
> > > For the case where users just request a list of the interfaces we
> > > can already use RTM_GETLINK + IFLA_IF_NETNS which has way better
> > > performance. But we can't do the same with RTM_GETADDR requests which
> > > was an oversight on my part when I wrote the original patchset for the
> > > RTM_*LINK requests. This just rectifies this and aligns RTM_GETLINK +
> > > RTM_GETADDR.
> > > Based on this patchset I have written a userspace POC that is basically
> > > a netns namespace aware getifaddr() or - as I like to call it -
> > > netns_getifaddr().
> > > 
> > >>
> > >>> Usually, RTML_GETLINK requests are followed by RTM_GETADDR requests (cf.
> > >>> getifaddrs() style functions and friends). But currently, RTM_GETADDR
> > >>> requests do not support a similar property like IFLA_IF_NETNSID for
> > >>> RTM_*LINK requests.
> > >>> This is problematic since userspace can retrieve interfaces from another
> > >>> network namespace by sending a IFLA_IF_NETNSID property along but
> > >>> RTM_GETLINK request but is still forced to use the legacy setns() style of
> > >>> retrieving interfaces in RTM_GETADDR requests.
> > >>>
> > >>> The goal of this series is to make it possible to perform RTM_GETADDR
> > >>> requests on different network namespaces. To this end a new IFA_IF_NETNSID
> > >>> property for RTM_*ADDR requests is introduced. It can be used to send a
> > >>> network namespace identifier along in RTM_*ADDR requests.  The network
> > >>> namespace identifier will be used to retrieve the target network namespace
> > >>> in which the request is supposed to be fulfilled.  This aligns the behavior
> > >>> of RTM_*ADDR requests with the behavior of RTM_*LINK requests.
> > >>>
> > >>> Security:
> > >>> - The caller must have assigned a valid network namespace identifier for
> > >>>   the target network namespace.
> > >>> - The caller must have CAP_NET_ADMIN in the owning user namespace of the
> > >>>   target network namespace.
> > >>>
> > >>> Thanks!
> > >>> Christian
> > >>>
> > >>> [1]: commit 7973bfd8758d ("rtnetlink: remove check for IFLA_IF_NETNSID")
> > >>> [2]: commit 5bb8ed075428 ("rtnetlink: enable IFLA_IF_NETNSID for RTM_NEWLINK")
> > >>> [3]: commit b61ad68a9fe8 ("rtnetlink: enable IFLA_IF_NETNSID for RTM_DELLINK")
> > >>> [4]: commit c310bfcb6e1b ("rtnetlink: enable IFLA_IF_NETNSID for RTM_SETLINK")
> > >>> [5]: commit 7c4f63ba8243 ("rtnetlink: enable IFLA_IF_NETNSID in do_setlink()")
> > >>>
> > >>> Christian Brauner (5):
> > >>>   rtnetlink: add rtnl_get_net_ns_capable()
> > >>>   if_addr: add IFA_IF_NETNSID
> > >>>   ipv4: enable IFA_IF_NETNSID for RTM_GETADDR
> > >>>   ipv6: enable IFA_IF_NETNSID for RTM_GETADDR
> > >>>   rtnetlink: move type calculation out of loop
> > >>>
> > >>>  include/net/rtnetlink.h      |  1 +
> > >>>  include/uapi/linux/if_addr.h |  1 +
> > >>>  net/core/rtnetlink.c         | 15 +++++---
> > >>>  net/ipv4/devinet.c           | 38 +++++++++++++++-----
> > >>>  net/ipv6/addrconf.c          | 70 ++++++++++++++++++++++++++++--------
> > >>>  5 files changed, 97 insertions(+), 28 deletions(-)
> > >>>