Message-ID: <cbead0479ef0b601bada5ae2ad0f8c28e5b242c9.camel@kernel.org>
Date: Mon, 11 Jan 2021 15:38:49 -0800
From: Saeed Mahameed <saeed@...nel.org>
To: Vladimir Oltean <olteanv@...il.com>,
"David S . Miller" <davem@...emloft.net>,
Jakub Kicinski <kuba@...nel.org>
Cc: netdev@...r.kernel.org, Andrew Lunn <andrew@...n.ch>,
Florian Fainelli <f.fainelli@...il.com>,
Cong Wang <xiyou.wangcong@...il.com>,
Stephen Hemminger <stephen@...workplumber.org>,
Eric Dumazet <edumazet@...gle.com>,
George McCollister <george.mccollister@...il.com>,
Oleksij Rempel <o.rempel@...gutronix.de>,
Jay Vosburgh <j.vosburgh@...il.com>,
Veaceslav Falico <vfalico@...il.com>,
Andy Gospodarek <andy@...yhouse.net>,
Arnd Bergmann <arnd@...db.de>, Taehee Yoo <ap420073@...il.com>,
Jiri Pirko <jiri@...nulli.us>, Florian Westphal <fw@...len.de>,
Nikolay Aleksandrov <nikolay@...dia.com>,
Pravin B Shelar <pshelar@....org>,
Sridhar Samudrala <sridhar.samudrala@...el.com>
Subject: Re: [PATCH v6 net-next 14/15] net: bonding: ensure .ndo_get_stats64
can sleep
On Sat, 2021-01-09 at 19:26 +0200, Vladimir Oltean wrote:
> From: Vladimir Oltean <vladimir.oltean@....com>
>
> There is an effort to convert .ndo_get_stats64 to sleepable context,
> and for that to work, we need to prevent callers of dev_get_stats
> from using atomic locking.
>
> The bonding driver retrieves its statistics recursively from its lower
> interfaces, with additional care to only count packets sent/received
> while those lowers were actually enslaved to the bond - see commit
> 5f0c5f73e5ef ("bonding: make global bonding stats more reliable").
>
> Since commit 87163ef9cda7 ("bonding: remove last users of bond->lock
> and bond->lock itself"), the bonding driver uses the following
> protection for its array of slaves: RCU for readers and rtnl_mutex
> for updaters.
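(Just for context for anyone following along: that protection scheme
looks roughly like this on the two sides - a sketch, not something from
this patch; use_slave() is only a placeholder:)

	/* reader side: cheap and lockless, but must not sleep */
	rcu_read_lock();
	bond_for_each_slave_rcu(bond, slave, iter)
		use_slave(slave);
	rcu_read_unlock();

	/* updater/control side: may sleep, serialized by the RTNL mutex */
	ASSERT_RTNL();
	bond_for_each_slave(bond, slave, iter)
		use_slave(slave);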
>
> The aforementioned commit removed an interesting comment:
>
> /* [...] we can't hold bond->lock [...] because we'll
> * deadlock. The only solution is to rely on the fact
> * that we're under rtnl_lock here, and the slaves
> * list won't change. This doesn't solve the problem
> * of setting the slave's MTU while it is
> * transmitting, but the assumption is that the base
> * driver can handle that.
> *
> * TODO: figure out a way to safely iterate the slaves
> * list, but without holding a lock around the actual
> * call to the base driver.
> */
>
> The above summarizes pretty well the challenges we have with nested
> bonding interfaces (bond over bond over bond over...) and locking for
> their slaves.
>
> To solve the nesting problem, the simple way is to not hold any locks
> when recursing into the slave netdev operation. We can "cheat" and use
> dev_hold to take a reference on the slave net_device, which is enough
> to ensure that netdev_wait_allrefs() waits until we finish, and the
> kernel won't fault.
>
> However, the slave structure might no longer be valid, just its
> associated net_device. So we need to do some more work to ensure that
> the slave exists after we took the statistics, and if it still does,
> reapply the logic from Andy's commit 5f0c5f73e5ef.
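(In rough strokes, the hold-then-recheck pattern described above would
look something like the sketch below - illustrative only, not the hunk
from this patch; bond_slave_get_rcu() is the existing rx_handler_data
accessor, the rest is hand-waved:)

	static void bond_fold_slave_stats(struct bonding *bond,
					  struct net_device *slave_dev)
	{
		struct rtnl_link_stats64 stats;
		struct slave *slave;

		dev_hold(slave_dev);	/* netdev cannot be freed under us */

		/* no locks held here, so .ndo_get_stats64 may sleep */
		dev_get_stats(slave_dev, &stats);

		rcu_read_lock();
		slave = bond_slave_get_rcu(slave_dev);	/* still a slave? */
		if (slave && slave->bond == bond) {
			/* fold the delta vs. the snapshot taken at enslave
			 * time into the bond's counters, i.e. the logic
			 * from commit 5f0c5f73e5ef
			 */
		}
		rcu_read_unlock();

		dev_put(slave_dev);
	}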
>
> Tested using the following two scripts running in parallel:
>
> #!/bin/bash
>
> while :; do
> ip link del bond0
> ip link del bond1
> ip link add bond0 type bond mode 802.3ad
> ip link add bond1 type bond mode 802.3ad
> ip link set sw0p1 down && ip link set sw0p1 master bond0 && ip link set sw0p1 up
> ip link set sw0p2 down && ip link set sw0p2 master bond0 && ip link set sw0p2 up
> ip link set sw0p3 down && ip link set sw0p3 master bond0 && ip link set sw0p3 up
> ip link set bond0 down && ip link set bond0 master bond1 && ip link set bond0 up
> ip link set sw1p1 down && ip link set sw1p1 master bond1 && ip link set sw1p1 up
> ip link set bond1 up
> ip -s -s link show
> cat /sys/class/net/bond1/statistics/*
> done
>
> #!/bin/bash
>
> while :; do
> echo spi2.0 > /sys/bus/spi/drivers/sja1105/unbind
> echo spi2.0 > /sys/bus/spi/drivers/sja1105/bind
> sleep 30
> done
>
> where the sja1105 driver was explicitly modified for the purpose of
> this test to have a msleep(500) in its .ndo_get_stats64 method, to
> catch some more potential races.
>
> Signed-off-by: Vladimir Oltean <vladimir.oltean@....com>
> ---
>
[...]
>
> +/* Helpers for reference counting the struct net_device behind the bond slaves.
> + * These can be used to propagate the net_device_ops from the bond to the
> + * slaves while not holding rcu_read_lock() or the rtnl_mutex.
> + */
> +struct bonding_slave_dev {
> +	struct net_device *ndev;
> +	struct list_head list;
> +};
> +
> +static inline void bond_put_slaves(struct list_head *slaves)
> +{
> +	struct bonding_slave_dev *s, *tmp;
> +
> +	list_for_each_entry_safe(s, tmp, slaves, list) {
> +		dev_put(s->ndev);
> +		list_del(&s->list);
> +		kfree(s);
> +	}
> +}
> +
> +static inline int bond_get_slaves(struct bonding *bond,
> +				  struct list_head *slaves,
> +				  int *num_slaves)
> +{
> +	struct list_head *iter;
> +	struct slave *slave;
> +
> +	INIT_LIST_HEAD(slaves);
> +	*num_slaves = 0;
> +
> +	rcu_read_lock();
> +
> +	bond_for_each_slave_rcu(bond, slave, iter) {
> +		struct bonding_slave_dev *s;
> +
> +		s = kzalloc(sizeof(*s), GFP_ATOMIC);
GFP_ATOMIC is a little bit aggressive here, especially when user
daemons are periodically reading stats, and it can be avoided.
You can pre-allocate an array of an "approximate" size with GFP_KERNEL,
then fill it up with whatever slaves the bond has at that moment;
num_slaves can end up less than, equal to, or greater than the array
you just allocated, but we shouldn't care.
Something like:
	rcu_read_lock();
	nslaves = bond_get_num_slaves();
	rcu_read_unlock();

	sarray = kcalloc(nslaves, sizeof(struct bonding_slave_dev), GFP_KERNEL);

	rcu_read_lock();
	bond_fill_slaves_array(bond, sarray); // also do: dev_hold()
	rcu_read_unlock();

	bond_get_slaves_array_stats(sarray);
	bond_put_slaves_array(sarray);
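A slightly fuller, completely untested sketch of what I have in mind
below; bond->slave_cnt is the driver's existing slave count, and apart
from the existing kernel APIs every name here is made up:

	/* like your bonding_slave_dev, but an array entry needs no list_head */
	struct bonding_slave_dev {
		struct net_device *ndev;
	};

	static int bond_get_slaves_array(struct bonding *bond,
					 struct bonding_slave_dev **array,
					 int *num_slaves)
	{
		struct bonding_slave_dev *sarray;
		struct list_head *iter;
		struct slave *slave;
		int n, i = 0;

		/* approximate size, read without taking any lock */
		n = READ_ONCE(bond->slave_cnt);

		sarray = kcalloc(n, sizeof(*sarray), GFP_KERNEL);
		if (!sarray)
			return -ENOMEM;

		rcu_read_lock();
		bond_for_each_slave_rcu(bond, slave, iter) {
			if (i == n)
				break;	/* more slaves than we sized for, ignore the rest */
			dev_hold(slave->dev);	/* keep the netdev alive past rcu_read_unlock() */
			sarray[i++].ndev = slave->dev;
		}
		rcu_read_unlock();

		*array = sarray;
		*num_slaves = i;	/* may be smaller than n, we don't care */
		return 0;
	}

	static void bond_put_slaves_array(struct bonding_slave_dev *sarray,
					  int num_slaves)
	{
		int i;

		for (i = 0; i < num_slaves; i++)
			dev_put(sarray[i].ndev);
		kfree(sarray);
	}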
> +		if (!s) {
> +			rcu_read_unlock();
> +			bond_put_slaves(slaves);
> +			return -ENOMEM;
> +		}
> +
> +		s->ndev = slave->dev;
> +		dev_hold(s->ndev);
> +		list_add_tail(&s->list, slaves);
> +		(*num_slaves)++;
> +	}
> +
> +	rcu_read_unlock();
> +
> +	return 0;
> +}
> +
> #define BOND_PRI_RESELECT_ALWAYS 0
> #define BOND_PRI_RESELECT_BETTER 1
> #define BOND_PRI_RESELECT_FAILURE 2