Message-ID: <20211124061507.09fccc97@kicinski-fedora-pc1c0hjn.dhcp.thefacebook.com>
Date: Wed, 24 Nov 2021 06:15:07 -0800
From: Jakub Kicinski <kuba@...nel.org>
To: Lahav Schlesinger <lschlesinger@...venets.com>
Cc: netdev@...r.kernel.org, dsahern@...il.com
Subject: Re: [PATCH net-next] rtnetlink: Support fine-grained netdevice bulk
deletion
On Wed, 24 Nov 2021 09:52:55 +0200 Lahav Schlesinger wrote:
> On Tue, Nov 23, 2021 at 08:01:17PM -0800, Jakub Kicinski wrote:
Sure.
> > On Tue, 23 Nov 2021 14:39:00 +0200 Lahav Schlesinger wrote:
> > > Currently there are 2 means of deleting a netdevice using Netlink:
> > > 1. Deleting a single netdevice (either by ifindex using
> > > ifinfomsg::ifi_index, or by name using IFLA_IFNAME)
> > > 2. Deleting all netdevices that belong to a group (using IFLA_GROUP)
> > >
> > > After all netdevices are handled, netdev_run_todo() is called, which
> > > calls rcu_barrier() to finish any outstanding RCU callbacks that were
> > > registered during the deletion of the netdevices, then waits until the
> > > refcount of all the devices drops to 0 and performs the final cleanups.
> > >
> > > However, calling rcu_barrier() is a very costly operation, taking
> > > on the order of ~10ms.
> > >
> > > When deleting a large number of netdevices one by one, rcu_barrier()
> > > will be called for each netdevice being deleted, making the whole
> > > operation take a long time.
> > >
> > > The following results are from benchmarking the deletion of 10K
> > > loopback devices, all of which are UP and have only an IPv6 LLA configured:
> >
> > What's the use case for this?
>
> Deleting 10K loopbacks was just an example using the simplest
> interface type, to show the improvements that can be made in the
> rtnetlink framework, which in turn will benefit all interface
> types.
> That said, I can see a use for deleting 10K loopbacks when doing a
> "factory default" on a large server; such servers can request
> deleting a large bulk of devices at once.
I'm sorry I don't understand. Please provide a clear use case.
I've never heard of "factory default on a large server".
Why can't groups be used?
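For reference, group-based bulk deletion already works today from user
space with iproute2; roughly like this (sketch only, needs root, device
names are made up):

```
# Put devices into group 42 as they are created
# (or later, with "ip link set DEV group 42")
ip link add dummy0 type dummy group 42
ip link add dummy1 type dummy group 42

# One RTM_DELLINK request carrying IFLA_GROUP deletes
# every device in the group
ip link del group 42
```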
This optimization requires addition of a uAPI, something we can't
change if it turns out it doesn't fit real uses. I've spent a month
cleaning up after your colleague who decided to put netdev->dev_addr
on a tree, would be great to understand what your needs are before we
commit more time.
> > > + for_each_netdev_safe(net, dev, aux) {
> > > + for (i = 0; i < size/sizeof(int); ++i) {
> >
> > Can you not save the references while doing the previous loop?
>
> I didn't see any improvement in the timings from saving them (even
> compared to the O(n^2) loop in this patch), so I didn't want to introduce
> a new list in struct net_device (using unreg_list seems unfitting here,
> as it would collide with ops->dellink() below).
Allocate an array to save the pointers to, no need for lists.
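Something along these lines (untested sketch against the patch's context;
'indices' and 'size' stand for the attribute payload, and the error
handling is elided):

```c
/* Collect matching devices into an array while walking the ifindex
 * list once, instead of rescanning the whole netns per index.
 * RTNL is held, so the pointers stay valid until we delete them.
 */
struct net_device **devs;
int i, n = 0;

devs = kmalloc_array(size / sizeof(int), sizeof(*devs), GFP_KERNEL);
if (!devs)
	return -ENOMEM;

for (i = 0; i < size / sizeof(int); i++) {
	struct net_device *dev = __dev_get_by_index(net, indices[i]);

	if (dev)
		devs[n++] = dev;	/* keep the pointer */
}

/* ... then delete devs[0..n-1] in one pass and kfree(devs) ... */
```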
> > Maybe we can allow multiple IFLA_IFINDEX instead?
>
> One problem is that it would halve the number of ifindexes that can be
> passed in a single message, given that each ifindex would require its
> own struct nlattr.
User space can send multiple messages. That'd be my preference at
least; I wonder what others think.
> Also, I didn't see any quick way of making __nla_validate_parse()
> support saving multiple instances of the same attribute in 'tb' instead
> of overwriting the last one each time, without adding extra overhead.
You can iterate over attributes with nla_for_each_attr(), that's pretty
clean.