[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <1304793174.3207.22.camel@edumazet-laptop>
Date: Sat, 07 May 2011 20:32:54 +0200
From: Eric Dumazet <eric.dumazet@...il.com>
To: Alex Bligh <alex@...x.org.uk>
Cc: netdev@...r.kernel.org
Subject: Re: Scalability of interface creation and deletion
Le samedi 07 mai 2011 à 19:24 +0100, Alex Bligh a écrit :
> Eric,
>
> --On 7 May 2011 18:26:29 +0200 Eric Dumazet <eric.dumazet@...il.com> wrote:
>
> > Here, on 2.6.38 kernel (Ubuntu 11.04 provided, on my 2 core laptop)
> ># time rmmod dummy
> > real 0m0.111s
> ...
> > On another machine with a very recent kernel :
> > $ modprobe dummy numdummies=1
> > $ ifconfig dummy0 192.168.46.46 up
> > $ time rmmod dummy
> >
> > real 0m0.032s
>
> I know it's different machines, but that's a pretty significant
> difference. So I compiled from 2.6.39-rc6 head (i.e. a kernel
> less than an hour old), with only your suggested change in,
> so that (a) I could eliminate old kernels, and (b) I could
> instrument it.
>
> > synchronize_rcu() calls are not consuming cpu, they just _wait_
> > rcu grace period.
> >
> > I suggest you read Documentation/RCU files if you really want to :)
>
> I understand the basic point: it needs to wait for all readers
> to drop their references. It's sort of hard to understand why
> on a machine with an idle network there would be reader(s) holding
> references for 250ms. And indeed the analysis below shows that
> isn't the case (it's more like 44 ms).
>
> > If you want to check how expensive it is, its quite easy:
> > add a trace in synchronize_net()
>
> At least for veth devices, I see the same on 2.6.39-rc6 - if anything
> it's worse:
>
> # ./ifseq -n 100
> Sat May 7 17:50:53 UTC 2011 creating 100 interfaces
> Sat May 7 17:50:54 UTC 2011 done
>
> real 0m1.549s
> user 0m0.060s
> sys 0m0.990s
> Sat May 7 17:50:54 UTC 2011 deleting 100 interfaces
> Sat May 7 17:51:22 UTC 2011 done
>
> real 0m27.917s
> user 0m0.420s
> sys 0m0.060s
>
> Performing that operation produced exactly 200 calls to synchronize net.
> The timestamps indicate that's 2 per veth pair deletion, and zero
> per veth pair creation.
>
> Analysing the resultant logs shows only 31% of the problem is
> time spent within synchronize_net() (perl script below).
>
> $ ./analyse.pl < syncnet | tail -2
> Total 18.98515 Usage 199 Average 0.09540 elsewhere
> Total 8.77581 Usage 200 Average 0.04388 synchronizing
>
> So *something* is spending more than twice as much time as
> synchronize_net().
>
> I've attached the log below as well.
>
> --
> Alex Bligh
>
>
> $ cat analyse.pl
> #!/usr/bin/perl
>
> use strict;
> use warnings;
>
> my $lastuptime;
> my $uptime;
> my $diff;
> my $area;
> my %time;
> my %usage;
>
> while (<>)
> {
> chomp;
> if (m/\[\s*([0-9.]+)\].*synchronize_net/)
> {
> $uptime = $1;
> if (defined($lastuptime))
> {
> $area = (m/end/)?"synchronizing":"elsewhere";
> $diff = $uptime - $lastuptime;
> printf "%5.5f $area\n", $diff;
> $time{$area}+=$diff;
> $usage{$area}++;
> }
> $lastuptime = $uptime;
> }
> }
>
> print "\n";
>
> my $k;
> foreach $k (sort keys %time)
> {
> printf "Total %5.5f Usage %d Average %5.5f %s\n", $time{$k},
> $usage{$k}, $time{$k}/$usage{$k}, $k;
> }
>
>
>
> May 7 17:50:55 nattytest kernel: [ 127.490142] begin synchronize_net()
> May 7 17:50:55 nattytest kernel: [ 127.560084] end synchronize_net()
> May 7 17:50:55 nattytest kernel: [ 127.610350] begin synchronize_net()
> May 7 17:50:55 nattytest kernel: [ 127.610932] end synchronize_net()
> May 7 17:50:55 nattytest kernel: [ 127.740078] begin synchronize_net()
> May 7 17:50:55 nattytest kernel: [ 127.820071] end synchronize_net()
Well, there is also one rcu_barrier() call that is expensive.
(It was changed from one synchronize_rcu() to one rcu_barrier() lately
in commit ef885afb , in 2.6.36 kernel)
net/core/dev.c line 5167
http://git2.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=ef885afbf8a37689afc1d9d545e2f3e7a8276c17
netdev_wait_allrefs() waits that all references to a device vanishes.
It currently uses a _very_ pessimistic 250 ms delay between each probe.
Some users reported that no more than 4 devices can be dismantled per
second, this is a pretty serious problem for some setups.
Most of the time, a refcount is about to be released by an RCU callback,
that is still in flight because rollback_registered_many() uses a
synchronize_rcu() call instead of rcu_barrier(). Problem is visible if
number of online cpus is one, because synchronize_rcu() is then a no op.
time to remove 50 ipip tunnels on a UP machine :
before patch : real 11.910s
after patch : real 1.250s
Reported-by: Nicolas Dichtel <nicolas.dichtel@...nd.com>
Reported-by: Octavian Purdila <opurdila@...acom.com>
Reported-by: Benjamin LaHaise <bcrl@...ck.org>
Signed-off-by: Eric Dumazet <eric.dumazet@...il.com>
Signed-off-by: David S. Miller <davem@...emloft.net>
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Powered by blists - more mailing lists