Message-ID: <CACP96tRSpop=Sf58qE6TXeF+a-5nqfCj5d12bwD9WaYue_6_UA@mail.gmail.com>
Date:	Tue, 27 May 2014 17:29:00 -0400
From:	sowmini varadhan <sowmini05@...il.com>
To:	Jamal Hadi Salim <jhs@...atatu.com>
Cc:	Eric Dumazet <eric.dumazet@...il.com>,
	Niels Möller <nisse@...thpole.se>,
	netdev <netdev@...r.kernel.org>, Jonas Bonn <jonas@...thpole.se>
Subject: Scaling 'ip addr add' (was Re: What's the right way to use a *large*
 number of source addresses?)

On Sat, May 24, 2014 at 8:06 AM, Jamal Hadi Salim <jhs@...atatu.com> wrote:
> On 05/23/14 10:14, Eric Dumazet wrote:
>
>> Use the batch mode, and it will be much faster than ifconfig, as
>> ifconfig does not support this mode (you need one fork()/exec() per IP
>> address)
>>
>> ip -batch filename
>>
>
> The address dumping algorithm is a very likely contributor as well.
> It tries to remember indices and then skips on the next iteration
> all the way to where it left off.... has never been a big deal until
> someone tries a substantial number of addresses.
>
> cheers,
> jamal

Niels (nisse@...thpole.se) reported:

   I've done a simple benchmark with a script assigning n addresses
   using "ip address add", and this seems to have O(n^2) complexity.
   E.g, assigning n=25500 addresses took 26 s, and doubling n, assigning
   51000 addresses, took 122 s, 4.6 times longer. Which isn't
   necessarily a problems once all the addresses are assigned, but it
   sounds a bit like there's a linear datastructure in there, not
   intended for a large number of addresses.

This bothered me, since both the suggested workaround of
"ip -b" and the comment about the slow address dumping algorithm
suggest that there may be some fundamental scaling
issues here.

Also, my earlier comment about netlink vs ioctl was possibly
a red herring. When I compared my experiment with what Niels is
trying to do, the experiments turned out to be different: I was adding
an address to a newly created tunnel interface each time (which
grows both the number of interfaces and the number of addresses), whereas
Niels is adding all addresses to the same interface.

So I looked at Niels' test script with perf. Some observations:

perf tells me:

   80.13%       ip  [other]
                 |
                 |--30.12%-- fib_sync_up
                 |          |
                 |           --30.12%-- fib_inetaddr_event
                 |                     notifier_call_chain
                 |                     __blocking_notifier_call_chain
                 |                     blocking_notifier_call_chain
                 |                     __inet_insert_ifa
                 |                     inet_rtm_newaddr
                 |                     rtnetlink_rcv_msg
                 |                     netlink_rcv_skb
                 |                     rtnetlink_rcv
                 |                     netlink_unicast
                 |                     netlink_sendmsg
                 |                     sock_sendmsg
                 |                     ___sys_sendmsg
                 |                     __sys_sendmsg
                 |                     SyS_sendmsg
                 |                     SyS_socketcall
                 |                     syscall_call

Thus fib_sync_up() by itself accounts for about 30% of the samples
and doesn't scale very well. I'm not sure how much room for tuning
exists here.

Further, in __inet_insert_ifa, we walk the ifa_list at least once
(which is probably unavoidable):

static int __inet_insert_ifa( /* .. */
                             u32 portid)
{
        /* ... */
        for (ifap = &in_dev->ifa_list; (ifa1 = *ifap) != NULL;
             ifap = &ifa1->ifa_next) {
                /* ... */
        }
        /* ... */
        blocking_notifier_call_chain(&inetaddr_chain, NETDEV_UP, ifa);

        return (0);
}

In addition, the fib callback, fib_inetaddr_event(), has another
potential ifa_list walk for SECONDARY addresses:

        switch (event) {
        case NETDEV_UP:
                fib_add_ifaddr(ifa);
#ifdef CONFIG_IP_ROUTE_MULTIPATH
                fib_sync_up(dev);
#endif

For Niels' script, since there are many addresses in the same
subnet, we'll have a lot of IFA_F_SECONDARY addresses,
so fib_add_ifaddr will then do another walk of the ifa_list.

Has anyone looked at consolidating some of this?
All of this could easily become a factor when the system
has a large number of interfaces and addresses, and the
control plane only wants to modify a very small subset of
that state.

--Sowmini