Date:   Wed, 24 Nov 2021 09:52:55 +0200
From:   Lahav Schlesinger <lschlesinger@...venets.com>
To:     Jakub Kicinski <kuba@...nel.org>
Cc:     netdev@...r.kernel.org, dsahern@...il.com
Subject: Re: [PATCH net-next] rtnetlink: Support fine-grained netdevice bulk
 deletion

On Tue, Nov 23, 2021 at 08:01:17PM -0800, Jakub Kicinski wrote:
>
> On Tue, 23 Nov 2021 14:39:00 +0200 Lahav Schlesinger wrote:
> > Currently there are 2 means of deleting a netdevice using Netlink:
> > 1. Deleting a single netdevice (either by ifindex using
> > ifinfomsg::ifi_index, or by name using IFLA_IFNAME)
> > 2. Deleting all netdevices that belong to a group (using IFLA_GROUP)
> >
> > After all netdevices are handled, netdev_run_todo() is called, which
> > calls rcu_barrier() to finish any outstanding RCU callbacks that were
> > registered during the deletion of the netdevices, then waits until the
> > refcount of all the devices is 0 and performs final cleanups.
> >
> > However, calling rcu_barrier() is a very costly operation, which takes
> > in the order of ~10ms.
> >
> > When deleting a large number of netdevices one-by-one, rcu_barrier()
> > will be called for each netdevice being deleted, causing the whole
> > operation to take a long time.
> >
> > The following results are from benchmarking the deletion of 10K loopback
> > devices, all of which are UP and with only an IPv6 LLA configured:
>
> What's the use case for this?

Deletion of 10K loopbacks was just an example using the simplest
interface type, to show the improvements that can be made in the
rtnetlink framework, which in turn will benefit all interface
types.
That said, I can see a real use for deleting 10K devices, e.g. when doing
a "factory default" on a large server; such servers can request deleting
a large bulk of devices at once.

>
> > 1. Deleting one-by-one using 1 thread  : 243 seconds
> > 2. Deleting one-by-one using 10 threads: 70 seconds
> > 3. Deleting one-by-one using 50 threads: 54 seconds
> > 4. Deleting all using "group deletion": 30 seconds
> >
> > Note that even though the deletion logic takes place under the rtnl
> > lock, since the call to rcu_barrier() is outside the lock we gain
> > improvements.
> >
> > Since "group deletion" calls rcu_barrier() only once, it is indeed the
> > fastest.
> > However, "group deletion" is too crude as means of deleting large number
> > of devices
> >
> > This patch adds support for passing an arbitrary list of ifindex of
> > netdevices to delete. This gives a more fine-grained control over
> > which devices to delete, while still resulting in only one rcu_barrier()
> > being called.
> > Indeed, the timings of using this new API to delete 10K netdevices is
> > the same as using the existing "group" deletion.
> >
> > The size constraint on the list means the API can delete at most 16382
> > netdevices in a single request.
> >
> > Signed-off-by: Lahav Schlesinger <lschlesinger@...venets.com>
> > ---
> >  include/uapi/linux/if_link.h |  1 +
> >  net/core/rtnetlink.c         | 46 ++++++++++++++++++++++++++++++++++++
> >  2 files changed, 47 insertions(+)
> >
> > diff --git a/include/uapi/linux/if_link.h b/include/uapi/linux/if_link.h
> > index eebd3894fe89..f950bf6ed025 100644
> > --- a/include/uapi/linux/if_link.h
> > +++ b/include/uapi/linux/if_link.h
> > @@ -348,6 +348,7 @@ enum {
> >       IFLA_PARENT_DEV_NAME,
> >       IFLA_PARENT_DEV_BUS_NAME,
> >
> > +     IFLA_IFINDEX_LIST,
> >       __IFLA_MAX
> >  };
> >
> > diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
> > index fd030e02f16d..150587b4b1a4 100644
> > --- a/net/core/rtnetlink.c
> > +++ b/net/core/rtnetlink.c
> > @@ -1880,6 +1880,7 @@ static const struct nla_policy ifla_policy[IFLA_MAX+1] = {
> >       [IFLA_PROTO_DOWN_REASON] = { .type = NLA_NESTED },
> >       [IFLA_NEW_IFINDEX]      = NLA_POLICY_MIN(NLA_S32, 1),
> >       [IFLA_PARENT_DEV_NAME]  = { .type = NLA_NUL_STRING },
> > +     [IFLA_IFINDEX_LIST]     = { .type = NLA_BINARY, .len = 65535 },
>
> Can't we leave len unset if we don't have an upper bound?

I thought it would be nicer to have an explicit upper bound instead of
counting on the implicit one from the field type.
I'll remove it in the v2.

>
> >  };
> >
> >  static const struct nla_policy ifla_info_policy[IFLA_INFO_MAX+1] = {
> > @@ -3050,6 +3051,49 @@ static int rtnl_group_dellink(const struct net *net, int group)
> >       return 0;
> >  }
> >
> > +static int rtnl_list_dellink(struct net *net, void *dev_list, int size)
> > +{
> > +     int i;
> > +     struct net_device *dev, *aux;
> > +     LIST_HEAD(list_kill);
> > +     bool found = false;
> > +
> > +     if (size < 0 || size % sizeof(int))
> > +             return -EINVAL;
> > +
> > +     for_each_netdev(net, dev) {
> > +             for (i = 0; i < size/sizeof(int); ++i) {
>
> __dev_get_by_index() should be much faster than this n^2 loop.

Right, will change in the v2.
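Roughly what I have in mind (an untested sketch of the v2 direction, not compiled against any tree; it also folds in failing outright on an unknown ifindex, per the comment further down):

```c
/* Untested sketch: one hash lookup per requested ifindex via
 * __dev_get_by_index() instead of scanning every netdevice per entry.
 * A real version would validate the whole list before deleting
 * anything, so a bad entry can't leave the batch half-deleted.
 */
static int rtnl_list_dellink(struct net *net, void *dev_list, int size)
{
	int *ifindexes = dev_list;
	int i, n = size / sizeof(int);
	LIST_HEAD(list_kill);

	if (size <= 0 || size % sizeof(int))
		return -EINVAL;

	for (i = 0; i < n; i++) {
		const struct rtnl_link_ops *ops;
		struct net_device *dev;

		dev = __dev_get_by_index(net, ifindexes[i]);
		if (!dev)
			return -ENODEV;		/* fail instead of silently skipping */
		ops = dev->rtnl_link_ops;
		if (!ops || !ops->dellink)
			return -EOPNOTSUPP;
		ops->dellink(dev, &list_kill);
	}
	unregister_netdevice_many(&list_kill);
	return 0;
}
```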

>
> > +                     if (dev->ifindex == ((int*)dev_list)[i]) {
>
> please run checkpatch --strict on the submission

Oops, my bad

>
> > +                             const struct rtnl_link_ops *ops;
> > +
> > +                             found = true;
> > +                             ops = dev->rtnl_link_ops;
> > +                             if (!ops || !ops->dellink)
> > +                                     return -EOPNOTSUPP;
> > +                             break;
> > +                     }
> > +             }
> > +     }
> > +
> > +     if (!found)
> > +             return -ENODEV;
>
> Why is it okay to miss some of the ifindexes?

Yeah you're right, will fix it.

>
> > +     for_each_netdev_safe(net, dev, aux) {
> > +             for (i = 0; i < size/sizeof(int); ++i) {
>
> Can you not save the references while doing the previous loop?

I didn't see any improvement in the timings from saving them (even
compared to the n^2 loop in this patch), so I didn't want to introduce a
new list to struct net_device (reusing unreg_list seems unfitting here,
as it would collide with ops->dellink() below).

>
> > +                     if (dev->ifindex == ((int*)dev_list)[i]) {
> > +                             const struct rtnl_link_ops *ops;
> > +
> > +                             ops = dev->rtnl_link_ops;
> > +                             ops->dellink(dev, &list_kill);
> > +                             break;
> > +                     }
> > +             }
> > +     }
> > +     unregister_netdevice_many(&list_kill);
> > +
> > +     return 0;
> > +}
> > +
> >  int rtnl_delete_link(struct net_device *dev)
> >  {
> >       const struct rtnl_link_ops *ops;
> > @@ -3102,6 +3146,8 @@ static int rtnl_dellink(struct sk_buff *skb, struct nlmsghdr *nlh,
> >                                  tb[IFLA_ALT_IFNAME], NULL);
> >       else if (tb[IFLA_GROUP])
> >               err = rtnl_group_dellink(tgt_net, nla_get_u32(tb[IFLA_GROUP]));
> > +     else if (tb[IFLA_IFINDEX_LIST])
> > +             err = rtnl_list_dellink(tgt_net, nla_data(tb[IFLA_IFINDEX_LIST]), nla_len(tb[IFLA_IFINDEX_LIST]));
>
> Maybe we can allow multiple IFLA_IFINDEX instead?

One problem is that it would cut the number of ifindexes that can be
passed in a single message in half, given that each ifindex would
require its own struct nlattr.
Also, I didn't see any quick way of making __nla_validate_parse()
support saving multiple instances of the same attribute in 'tb' instead
of overwriting the last one each time, without adding extra overhead.

>
> >       else
> >               goto out;
> >
>
