Message-ID: <0F4A638C2A523577CDBC295E@Ximines.local>
Date:	Sat, 07 May 2011 16:26:39 +0100
From:	Alex Bligh <alex@...x.org.uk>
To:	Eric Dumazet <eric.dumazet@...il.com>
cc:	netdev@...r.kernel.org, Alex Bligh <alex@...x.org.uk>
Subject: Re: Scalability of interface creation and deletion

Eric,

>> 1. Interface creation slows down hugely with more interfaces
>
> sysfs is the problem, a very well known one.
> (sysfs_refresh_inode(),

Thanks

>> 2. Interface deletion is normally much slower than interface creation
>>
>> strace -T -ttt on the "ip" command used to do this does not show the
>> delay where I expected it - cataloguing the existing interfaces.
>> Instead, it is the final send() on the netlink socket, the one that
>> performs the actual operation, which appears to be slow for both
>> addition and deletion. Adding the last interface takes 200ms in that
>> syscall while adding the first is quick (symptomatic of a progressive
>> slowdown); for deletion, even the last send() syscall is quick.
>
>> I am having difficulty seeing what might be the issue in interface
>> creation. Any ideas?
>>
>
> Actually a lot; just run
>
> git log net/core/dev.c
>
> and you'll see many commits to make this faster.

OK. I am up to 2.6.38.2 and see no improvement there. I will
try something bleeding edge in a bit.
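
(For reference, the sort of check I have in mind; the exact range is
illustrative:

  git log --oneline v2.6.38.. -- net/core/dev.c

i.e. the net/core/dev.c commits newer than the 2.6.38 base I am
currently running.)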

>> I am guessing that this is going to do the msleep 50% of the time,
>> explaining 125ms of the observed time. How would people react to
>> exponential backoff instead (untested):
>>
>> 	int backoff = 10;
>>         refcnt = netdev_refcnt_read(dev);
>>
>>         while (refcnt != 0) {
>>                 ...
>>                 msleep(backoff);
>>                 if ((backoff *= 2) > 250)
>>                   backoff = 250;
>> 		
>>                 refcnt = netdev_refcnt_read(dev);
>> 		....
>>         }
>>
>>
>
> Welcome to the club. This is what has been discussed on netdev for
> many years. A lot of work has been done to make it better.

Well, I patched it (patch attached for what it's worth) and it made
no difference in this case. I would suggest, however, that it might
be the right thing to do anyway.

> Interface deletion needs several RCU sync calls, and they are very
> expensive. This is the price to pay for a lockless network stack in
> the fast paths.

On the current 8-core box I am testing on, I see 280ms per interface
deletion **even with only 10 interfaces**, and 260ms with a single
interface. I know doing lots of RCU sync work can be slow, but 260ms
to remove one veth pair sounds like more than RCU synchronisation
going on. It sounds like a sleep (though I may not have found the
right one), and I see no CPU load.

Equally, with only one interface (remember I am doing this under
unshare -n, so only the loopback interface is there), this surely
cannot be sysfs.
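
For reference, the kind of loop being timed looks roughly like the
following (interface names and the count are illustrative); strace
-T -ttt on the individual "ip" invocations is what shows where the
time goes inside each call:

  # run in a fresh network namespace so only loopback exists initially
  unshare -n sh -c '
    for i in $(seq 1 10); do
      /usr/bin/time -f "%es add" ip link add v$i type veth peer name p$i
    done
    for i in $(seq 1 10); do
      /usr/bin/time -f "%es del" ip link del v$i
    done
  '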

-- 
Alex Bligh

Signed-off-by: Alex Bligh <alex@...x.org.uk>
diff --git a/net/core/dev.c b/net/core/dev.c
index 6561021..f55c95c 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -5429,6 +5429,7 @@ static void netdev_wait_allrefs(struct net_device *dev)
 {
        unsigned long rebroadcast_time, warning_time;
        int refcnt;
+       int backoff = 5;

        linkwatch_forget_dev(dev);

@@ -5460,7 +5461,9 @@ static void netdev_wait_allrefs(struct net_device *dev)
                        rebroadcast_time = jiffies;
                }

-               msleep(250);
+               msleep(backoff);
+               if ((backoff *= 2) > 250)
+                 backoff = 250;

                refcnt = netdev_refcnt_read(dev);



