Message-ID: <2a790830-9957-2985-2f5e-e87e6693a723@nvidia.com>
Date: Fri, 8 Jan 2021 11:58:12 +0200
From: Nikolay Aleksandrov <nikolay@...dia.com>
To: Vladimir Oltean <olteanv@...il.com>,
"David S . Miller" <davem@...emloft.net>,
Jakub Kicinski <kuba@...nel.org>
CC: <netdev@...r.kernel.org>, Andrew Lunn <andrew@...n.ch>,
Florian Fainelli <f.fainelli@...il.com>,
Cong Wang <xiyou.wangcong@...il.com>,
Stephen Hemminger <stephen@...workplumber.org>,
Eric Dumazet <edumazet@...gle.com>,
George McCollister <george.mccollister@...il.com>,
Oleksij Rempel <o.rempel@...gutronix.de>,
Jay Vosburgh <j.vosburgh@...il.com>,
Veaceslav Falico <vfalico@...il.com>,
Andy Gospodarek <andy@...yhouse.net>,
Arnd Bergmann <arnd@...db.de>, Taehee Yoo <ap420073@...il.com>,
Jiri Pirko <jiri@...lanox.com>, Florian Westphal <fw@...len.de>
Subject: Re: [PATCH v4 net-next 16/18] net: bonding: ensure .ndo_get_stats64
can sleep
On 08/01/2021 02:20, Vladimir Oltean wrote:
> From: Vladimir Oltean <vladimir.oltean@....com>
>
> There is an effort to convert .ndo_get_stats64 to sleepable context, and
> for that to work, we need to prevent callers of dev_get_stats from using
> atomic locking.
>
> The bonding driver retrieves its statistics recursively from its lower
> interfaces, with additional care to only count packets sent/received
> while those lowers were actually enslaved to the bond - see commit
> 5f0c5f73e5ef ("bonding: make global bonding stats more reliable").
>
> Since commit 87163ef9cda7 ("bonding: remove last users of bond->lock and
> bond->lock itself"), the bonding driver uses the following protection
> for its array of slaves: RCU for readers and rtnl_mutex for updaters.
> This is not great because there is another movement [ somewhat
> simultaneous with the one to make .ndo_get_stats64 sleepable ] to reduce
> driver usage of rtnl_mutex. This makes sense, because the rtnl_mutex has
> become a very contended resource.
>
> The aforementioned commit removed an interesting comment:
>
> /* [...] we can't hold bond->lock [...] because we'll
> * deadlock. The only solution is to rely on the fact
> * that we're under rtnl_lock here, and the slaves
> * list won't change. This doesn't solve the problem
> * of setting the slave's MTU while it is
> * transmitting, but the assumption is that the base
> * driver can handle that.
> *
> * TODO: figure out a way to safely iterate the slaves
> * list, but without holding a lock around the actual
> * call to the base driver.
> */
>
> The above summarizes pretty well the challenges we have with nested
> bonding interfaces (bond over bond over bond over...), which need to be
> addressed by a better locking scheme that also does not rely on the
> bloated rtnl_mutex.
>
> Instead of using something as broad as the rtnl_mutex to ensure
> serialization of updates to the slave array, we can reintroduce a
> private mutex in the bonding driver, called slaves_lock.
> This mutex wraps the only updater, bond_update_slave_arr, and ensures
> that readers which want a consistent view of the slave array no longer
> need to hold the rtnl_mutex for that.
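
[ A minimal sketch of the updater side under this scheme; the
  __bond_update_slave_arr() name is illustrative for the existing
  unlocked body, and the signature is assumed from the current code: ]

	int bond_update_slave_arr(struct bonding *bond, struct slave *skipslave)
	{
		int ret;

		/* Serialize the only writer against snapshot readers, so
		 * that readers no longer depend on rtnl_lock for a stable
		 * slave array.
		 */
		mutex_lock(&bond->slaves_lock);
		ret = __bond_update_slave_arr(bond, skipslave);
		mutex_unlock(&bond->slaves_lock);

		return ret;
	}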
>
> Now _of_course_ I did not convert the entire driver to use
> bond_for_each_slave protected by the bond->slaves_lock, and
> rtnl_dereference to bond_dereference. I just started that process by
> converting the one reader I needed: ndo_get_stats64. Not only is it nice
> to not hold rtnl_mutex in .ndo_get_stats64, but it is also in fact
> forbidden to do so (since top-level callers may hold netif_lists_lock,
> which is a sub-lock of the rtnl_mutex, and therefore this would cause a
> lock inversion and a deadlock).
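
[ Spelling the inversion out: netif_lists_lock nests under rtnl_mutex,
  so a stats reader that already holds it must never take rtnl_mutex in
  the opposite order: ]

	/* CPU0 (stats reader)                CPU1 (updater)
	 *   netif_lists_lock(net)              rtnl_lock()
	 *   dev_get_stats()
	 *     .ndo_get_stats64()               netif_lists_lock(net) -> waits
	 *       rtnl_lock() -> AB-BA deadlock
	 */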
>
> To solve the nesting problem, the simple way is to not hold any locks
> when recursing into the slave netdev operation, which is exactly the
> approach that we take. We can "cheat" and use dev_hold to take a
> reference on the slave net_device, which is enough to ensure that
> netdev_wait_allrefs() waits until we finish, and the kernel won't fault.
> However, the slave structure might no longer be valid, just its
> associated net_device. That isn't a biggie. We just need to do some more
> work to check whether the slave still exists after we have collected the
> statistics, and if it does, reapply the logic from Andy's commit
> 5f0c5f73e5ef.
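
[ A rough sketch of the intended reader pattern; the actual
  .ndo_get_stats64 hunk is snipped below, and the folding step only
  stands in for the logic from commit 5f0c5f73e5ef: ]

	static void bond_get_stats64(struct net_device *bond_dev,
				     struct rtnl_link_stats64 *stats)
	{
		struct bonding *bond = netdev_priv(bond_dev);
		struct net_device **slaves;
		int i, num_slaves;

		if (bond_get_slave_arr(bond, &slaves, &num_slaves))
			return;

		for (i = 0; i < num_slaves; i++) {
			const struct rtnl_link_stats64 *sstats;
			struct rtnl_link_stats64 temp;

			/* No locks held: slaves[i] is only pinned by the
			 * reference taken in bond_get_slave_arr(), so this
			 * recursive call is free to sleep.
			 */
			sstats = dev_get_stats(slaves[i], &temp);

			/* fold sstats into *stats here, re-checking that
			 * the device is still enslaved to this bond
			 */
		}

		bond_put_slave_arr(slaves, num_slaves);
	}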
>
> Signed-off-by: Vladimir Oltean <vladimir.oltean@....com>
> ---
> Changes in v4:
> Now there is code to propagate errors.
>
> Changes in v3:
> - Added kfree(dev_stats);
> - Removed the now useless stats_lock (bond->bond_stats and
> slave->slave_stats are now protected by bond->slaves_lock)
>
> Changes in v2:
> Switched to the new scheme of holding just a refcnt to the slave
> interfaces while recursing with dev_get_stats.
>
> drivers/net/bonding/bond_main.c | 113 ++++++++++++++------------------
> include/net/bonding.h | 49 +++++++++++++-
> 2 files changed, 99 insertions(+), 63 deletions(-)
>
[snip]
> +static inline int bond_get_slave_arr(struct bonding *bond,
> +				     struct net_device ***slaves,
> +				     int *num_slaves)
> +{
> +	struct net *net = dev_net(bond->dev);
> +	struct list_head *iter;
> +	struct slave *slave;
> +	int i = 0;
> +
> +	mutex_lock(&bond->slaves_lock);
> +
> +	*slaves = kcalloc(bond->slave_cnt, sizeof(*slaves), GFP_KERNEL);
> +	if (!(*slaves)) {
> +		netif_lists_unlock(net);
> +		return -ENOMEM;
> +	}
The error path looks wrong: you unlock netif_lists and return with slaves_lock held.
Cheers,
Nik
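
[ i.e. presumably the early return should release the mutex that was
  actually taken: ]

	if (!(*slaves)) {
		mutex_unlock(&bond->slaves_lock);
		return -ENOMEM;
	}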
> +
> +	bond_for_each_slave(bond, slave, iter) {
> +		dev_hold(slave->dev);
> +		(*slaves)[i++] = slave->dev;
> +	}
> +
> +	*num_slaves = bond->slave_cnt;
> +
> +	mutex_unlock(&bond->slaves_lock);
> +
> +	return 0;
> +}
> +
> +static inline void bond_put_slave_arr(struct net_device **slaves,
> +				      int num_slaves)
> +{
> +	int i;
> +
> +	for (i = 0; i < num_slaves; i++)
> +		dev_put(slaves[i]);
> +
> +	kfree(slaves);
> +}
> +
> #define BOND_PRI_RESELECT_ALWAYS 0
> #define BOND_PRI_RESELECT_BETTER 1
> #define BOND_PRI_RESELECT_FAILURE 2
>