Message-ID: <1297923915.2645.24.camel@edumazet-laptop>
Date: Thu, 17 Feb 2011 07:25:15 +0100
From: Eric Dumazet <eric.dumazet@...il.com>
To: David Miller <davem@...emloft.net>
Cc: netdev@...r.kernel.org
Subject: Re: state of rtcache removal...
On Wednesday, 16 February 2011 at 16:08 -0800, David Miller wrote:
> So I've been testing out the routing cache removal patch to see
> what the impact is on performance.
>
> I'm using a UDP flood to a single IP address over a dummy interface
> with hard coded ARP entries, so that pretty much just the main IP
> output and routing paths are being exercised.
>
> The UDP flood tool I cooked up based upon a description sent to me by
> Eric Dumazet of a similar utility he uses for testing. I've included
> the code to this tool at the end of this email, as well as the dummy
> interface setup script. Basically, you go:
>
> bash# ./udpflood_setup.sh
> bash# time ./udpflood -l 10000 10.2.2.11
>
> The IP output path is about twice as slow with the routing cache
> removed entirely. Here are the numbers I have:
>
> net-next-2.6, rt_cache on:
>
> davem@...amba:~$ time udpflood -l 10000000 10.2.2.11
> real 1m47.012s
> user 0m8.670s
> sys 1m38.370s
>
> net-next-2.6, rt_cache turned off via sysctl:
>
> davem@...amba:~$ time udpflood -l 10000000 10.2.2.11
> real 3m12.662s
> user 0m9.490s
> sys 3m3.220s
>
> net-next-2.6 + "BONUS" rt_cache deletion patch:
>
> maramba:/home/davem# time ./bin/udpflood -l 10000000 10.2.2.11
> real 3m9.921s
> user 0m9.520s
> sys 3m0.440s
>
> I then worked on some simplifications of the code in net/ipv4/route.c
> that remains after the cache removal. I'll post those patches after
> I've chewed on them some more, but they knock a couple of seconds back
> off of the benchmark.
>
> The profile output is what you'd expect, with fib_table_lookup() topping
> the charts, taking ~10% of the time.
>
> What might not be initially apparent is that each output route lookup
> results in two calls to fib_table_lookup() and thus two trie lookups.
> Why? Because we have two routing tables (three with IP_MULTIPLE_TABLES
> enabled) that get searched: first the LOCAL table, then the MAIN table
> (then, with multiple tables enabled, the DEFAULT table). And most
> external outgoing routes sit in the MAIN table.
>
> We do this so we can store all the interface address network,
> broadcast, loopback network, et al. routes in the LOCAL table, then all
> globally visible routes in the MAIN table.
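The LOCAL/MAIN split described above is visible with iproute2 on any Linux box; the commands below only inspect the tables, nothing is modified:

```shell
# Interface-address, broadcast and loopback routes live in the LOCAL table:
ip route show table local

# Globally visible routes (the default route, connected networks) live in MAIN:
ip route show table main
```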
>
> Anyways, the long and short of this is that route lookups take two
> trie lookups instead of just one. On input there are even more, for
> source address validation done by fib_validate_source(). That can be
> up to 4 more fib_table_lookup() invocations.
>
> Add in another level of complexity if you have a series of FIB rules
> installed.
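The rule set that drives the table search order can be listed directly; by default it is local, main, default:

```shell
# Each packet walks these rules in priority order; every matching rule
# can trigger another fib_table_lookup() in the table it references.
ip rule show
```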
>
> So, to me, this means that spending time micro-optimizing fib_trie is
> not going to help much. Getting rid of that multiplier somehow, on
> the other hand, might.
>
> I plan to play with some ideas, such as sticking fib_alias entries into
> the flow cache and consulting/populating the flow cache on fib_lookup()
> calls.
>
> -------------------- udpflood.c --------------------
> /* An adaptation of Eric Dumazet's udpflood tool. */
>
> #define _GNU_SOURCE
> #include <stdio.h>
> #include <stdlib.h>
> #include <string.h>
> #include <errno.h>
> #include <unistd.h>
>
> #include <sys/types.h>
> #include <sys/socket.h>
> #include <netinet/in.h>
> #include <arpa/inet.h>
>
> #include <getopt.h>
>
> static int usage(void)
> {
> 	printf("usage: udpflood [ -l count ] [ -s message_size ] [ -p port ] IP_ADDRESS\n");
> 	return -1;
> }
>
> static int send_packets(in_addr_t addr, int port, int count, int msg_sz)
> {
> 	char *msg = malloc(msg_sz);
> 	struct sockaddr_in saddr;
> 	int fd, i, err;
>
> 	if (!msg)
> 		return -ENOMEM;
>
> 	memset(msg, 0, msg_sz);
>
> 	memset(&saddr, 0, sizeof(saddr));
> 	saddr.sin_family = AF_INET;
> 	saddr.sin_port = htons(port);
> 	saddr.sin_addr.s_addr = addr;
>
> 	fd = socket(PF_INET, SOCK_DGRAM, IPPROTO_IP);
> 	if (fd < 0) {
> 		perror("socket");
> 		err = fd;
> 		goto out_nofd;
> 	}
> 	err = connect(fd, (struct sockaddr *) &saddr, sizeof(saddr));
> 	if (err < 0) {
> 		perror("connect");
> 		goto out;
> 	}
>
> 	/* The flood itself: each sendto() exercises the IP output path. */
> 	for (i = 0; i < count; i++) {
> 		err = sendto(fd, msg, msg_sz, 0,
> 			     (struct sockaddr *) &saddr, sizeof(saddr));
> 		if (err < 0) {
> 			perror("sendto");
> 			goto out;
> 		}
> 	}
>
> 	err = 0;
> out:
> 	close(fd);
> out_nofd:
> 	free(msg);
> 	return err;
> }
>
> int main(int argc, char **argv)
> {
> 	int port, msg_sz, count, ret;
> 	in_addr_t addr;
>
> 	port = 6000;
> 	msg_sz = 32;
> 	count = 10000000;
>
> 	while ((ret = getopt(argc, argv, "l:s:p:")) >= 0) {
> 		switch (ret) {
> 		case 'l':
> 			sscanf(optarg, "%d", &count);
> 			break;
> 		case 's':
> 			sscanf(optarg, "%d", &msg_sz);
> 			break;
> 		case 'p':
> 			sscanf(optarg, "%d", &port);
> 			break;
> 		case '?':
> 			return usage();
> 		}
> 	}
>
> 	if (!argv[optind])
> 		return usage();
>
> 	addr = inet_addr(argv[optind]);
> 	if (addr == INADDR_NONE)
> 		return usage();
>
> 	return send_packets(addr, port, count, msg_sz);
> }
>
> -------------------- udpflood_setup.sh --------------------
> #!/bin/sh
> modprobe dummy
> ifconfig dummy0 10.2.2.254 netmask 255.255.255.0 up
>
> for f in $(seq 11 26)
> do
> arp -H ether -i dummy0 -s 10.2.2.$f 00:00:0c:07:ac:$f
> done
> --
Thanks, David, for this work in progress.

If I remember my work from last October/November correctly, fib_hash
was also a bit faster than fib_trie (around 20%)...
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html