Date:	Thu, 28 Mar 2013 18:06:32 -0700
From:	ebiederm@...ssion.com (Eric W. Biederman)
To:	Eric Dumazet <eric.dumazet@...il.com>
Cc:	Stephen Hemminger <stephen@...workplumber.org>,
	Benoit Lourdelet <blourdel@...iper.net>,
	Serge Hallyn <serge.hallyn@...ntu.com>,
	"netdev\@vger.kernel.org" <netdev@...r.kernel.org>
Subject: Re: [RFC][PATCH] iproute: Faster ip link add, set and delete

Eric Dumazet <eric.dumazet@...il.com> writes:

> On Thu, 2013-03-28 at 17:25 -0700, Eric W. Biederman wrote:
>> Eric Dumazet <eric.dumazet@...il.com> writes:
>> 
>> > On Thu, 2013-03-28 at 16:52 -0700, Eric W. Biederman wrote:
>> >
>> >> On my microbenchmark of just creating 5000 veth pairs this takes
>> >> 16s instead of 13s with my earlier hacks, but that is well within
>> >> the usable range.
>> >
>> > I guess most of the time is taken by sysctl_check_table()
>> 
>> All of the significant sysctl slowdowns were fixed in 3.4.  If you see
>> anything sysctl-related show up in a trace I would be happy to talk
>> about it.  On the kernel side, creating N network devices seems to
>> take O(N log N) time now.  Both sysfs and sysctl store directories as
>> rbtrees, removing their previous bottlenecks.
>> 
>> The loop I timed at 16s was just:
>> 
>> time for i in $(seq 1 5000) ; do ip link add a$i type veth peer name b$i; done
>> 
>> There is plenty of room for inefficiencies in 10000 network devices and
>> 5000 forks+execs.
>
> Ah right, the sysctl part is fixed ;)
>
> In batch mode, I can create these veth pairs in 4 seconds
>
> for i in $(seq 1 5000) ; do echo link add a$i type veth peer name b$i;
> done | ip -batch -

Yes.  The interesting story here is that the bottleneck before these
patches was the ll_init_map function of iproute2, which caused more than
an order of magnitude slowdown when starting ip on a system with lots of
network devices.

It is still unclear where iproute comes into the picture in the original
problem scenario of creating 2000 containers each with 2 veth pairs,
but apparently it did.

As the fundamental use case here was taking 2000 separate, independent
actions, it turns out to be important that things not slow down
unreasonably outside of batch mode.  So I was explicitly testing the
non-batch mode performance.
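
For comparison, a minimal sketch of the matching non-batch delete loop
(hypothetical; it assumes the a$i/b$i pairs from the creation loop above
still exist, and relies on the fact that deleting one end of a veth pair
removes its peer as well):

time for i in $(seq 1 5000) ; do ip link del a$i ; done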

On the flip side it might be interesting to see if we can get batch mode
deletes to batch in the kernel, so we don't have to wait through
synchronize_rcu_expedited for each of them.  Although for the container
case I can just drop the last reference to the network namespace and all
of the network device removals will batch.
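
A rough sketch of that namespace trick (hypothetical names, not taken
from this thread): create the peers inside a throwaway namespace and
then delete the namespace.  Dropping its last reference tears down all
of the contained devices in one batched unregister pass rather than one
synchronize_rcu_expedited per link, and the veth ends left outside are
removed along with their peers:

ip netns add tmp0
for i in $(seq 1 5000) ; do
	# b$i lands directly in tmp0; a$i stays in the initial namespace
	ip link add a$i type veth peer name b$i netns tmp0
done
# destroying tmp0 batches the unregister of every device inside it
ip netns delete tmp0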

Ultimately, shrug.  Apart from the previous O(N^2) userspace behavior
there don't seem to be any practical performance problems with this many
network devices.  What is interesting is that this many network devices
is becoming relevant on inexpensive COTS servers, for cases that are
not purely network focused.

Eric
