Message-ID: <2a790830-9957-2985-2f5e-e87e6693a723@nvidia.com>
Date: Fri, 8 Jan 2021 11:58:12 +0200
From: Nikolay Aleksandrov <nikolay@...dia.com>
To: Vladimir Oltean <olteanv@...il.com>,
"David S . Miller" <davem@...emloft.net>,
Jakub Kicinski <kuba@...nel.org>
CC: <netdev@...r.kernel.org>, Andrew Lunn <andrew@...n.ch>,
Florian Fainelli <f.fainelli@...il.com>,
Cong Wang <xiyou.wangcong@...il.com>,
Stephen Hemminger <stephen@...workplumber.org>,
Eric Dumazet <edumazet@...gle.com>,
George McCollister <george.mccollister@...il.com>,
Oleksij Rempel <o.rempel@...gutronix.de>,
Jay Vosburgh <j.vosburgh@...il.com>,
Veaceslav Falico <vfalico@...il.com>,
Andy Gospodarek <andy@...yhouse.net>,
Arnd Bergmann <arnd@...db.de>, Taehee Yoo <ap420073@...il.com>,
Jiri Pirko <jiri@...lanox.com>, Florian Westphal <fw@...len.de>
Subject: Re: [PATCH v4 net-next 16/18] net: bonding: ensure .ndo_get_stats64
can sleep
On 08/01/2021 02:20, Vladimir Oltean wrote:
> From: Vladimir Oltean <vladimir.oltean@....com>
>
> There is an effort to convert .ndo_get_stats64 to sleepable context, and
> for that to work, we need to prevent callers of dev_get_stats from using
> atomic locking.
>
> The bonding driver retrieves its statistics recursively from its lower
> interfaces, with additional care to only count packets sent/received
> while those lowers were actually enslaved to the bond - see commit
> 5f0c5f73e5ef ("bonding: make global bonding stats more reliable").
>
> Since commit 87163ef9cda7 ("bonding: remove last users of bond->lock and
> bond->lock itself"), the bonding driver uses the following protection
> for its array of slaves: RCU for readers and rtnl_mutex for updaters.
> This is not great because there is another movement [ somewhat
> simultaneous with the one to make .ndo_get_stats64 sleepable ] to reduce
> driver usage of rtnl_mutex. This makes sense, because the rtnl_mutex has
> become a very contended resource.
>
> The aforementioned commit removed an interesting comment:
>
> /* [...] we can't hold bond->lock [...] because we'll
> * deadlock. The only solution is to rely on the fact
> * that we're under rtnl_lock here, and the slaves
> * list won't change. This doesn't solve the problem
> * of setting the slave's MTU while it is
> * transmitting, but the assumption is that the base
> * driver can handle that.
> *
> * TODO: figure out a way to safely iterate the slaves
> * list, but without holding a lock around the actual
> * call to the base driver.
> */
>
> The above summarizes pretty well the challenges we have with nested
> bonding interfaces (bond over bond over bond over...), which need to be
> addressed by a better locking scheme that also does not rely on the
> bloated rtnl_mutex.
>
> Instead of using something as broad as the rtnl_mutex to ensure
> serialization of updates to the slave array, we can reintroduce a
> private mutex in the bonding driver, called slaves_lock.
> This mutex wraps the only updater, bond_update_slave_arr, and ensures
> that readers which want a consistent view of the slave array no longer
> need to hold the rtnl_mutex for that.
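
[ A minimal sketch of the updater side under this scheme; the
  __bond_update_slave_arr() name is illustrative for the existing
  unlocked body, and the signature is assumed from the current code: ]

	int bond_update_slave_arr(struct bonding *bond, struct slave *skipslave)
	{
		int ret;

		/* Serialize the only writer against snapshot readers, so
		 * that readers no longer depend on rtnl_lock for a stable
		 * slave array.
		 */
		mutex_lock(&bond->slaves_lock);
		ret = __bond_update_slave_arr(bond, skipslave);
		mutex_unlock(&bond->slaves_lock);

		return ret;
	}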
>
> Now _of_course_ I did not convert the entire driver to use
> bond_for_each_slave protected by the bond->slaves_lock, and
> rtnl_dereference to bond_dereference. I just started that process by
> converting the one reader I needed: ndo_get_stats64. Not only is it nice
> to not hold rtnl_mutex in .ndo_get_stats64, but it is also in fact
> forbidden to do so (since top-level callers may hold netif_lists_lock,
> which is a sub-lock of the rtnl_mutex, and therefore this would cause a
> lock inversion and a deadlock).
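
[ Spelling the inversion out: netif_lists_lock nests under rtnl_mutex,
  so a stats reader that already holds it must never take rtnl_mutex in
  the opposite order: ]

	/* CPU0 (stats reader)                CPU1 (updater)
	 *   netif_lists_lock(net)              rtnl_lock()
	 *   dev_get_stats()
	 *     .ndo_get_stats64()               netif_lists_lock(net) -> waits
	 *       rtnl_lock() -> AB-BA deadlock
	 */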
>
> To solve the nesting problem, the simple way is to not hold any locks
> when recursing into the slave netdev operation, which is exactly the
> approach that we take. We can "cheat" and use dev_hold to take a
> reference on the slave net_device, which is enough to ensure that
> netdev_wait_allrefs() waits until we finish, and the kernel won't fault.
> However, the slave structure might no longer be valid, just its
> associated net_device. That isn't a biggie. We just need to do some more
> work to check whether the slave still exists after we have collected the
> statistics, and if it does, reapply the logic from Andy's commit
> 5f0c5f73e5ef.
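
[ A rough sketch of the intended reader pattern; the actual
  .ndo_get_stats64 hunk is snipped below, and the folding step only
  stands in for the logic from commit 5f0c5f73e5ef: ]

	static void bond_get_stats64(struct net_device *bond_dev,
				     struct rtnl_link_stats64 *stats)
	{
		struct bonding *bond = netdev_priv(bond_dev);
		struct net_device **slaves;
		int i, num_slaves;

		if (bond_get_slave_arr(bond, &slaves, &num_slaves))
			return;

		for (i = 0; i < num_slaves; i++) {
			const struct rtnl_link_stats64 *sstats;
			struct rtnl_link_stats64 temp;

			/* No locks held: slaves[i] is only pinned by the
			 * reference taken in bond_get_slave_arr(), so this
			 * recursive call is free to sleep.
			 */
			sstats = dev_get_stats(slaves[i], &temp);

			/* fold sstats into *stats here, re-checking that
			 * the device is still enslaved to this bond
			 */
		}

		bond_put_slave_arr(slaves, num_slaves);
	}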
>
> Signed-off-by: Vladimir Oltean <vladimir.oltean@....com>
> ---
> Changes in v4:
> Now there is code to propagate errors.
>
> Changes in v3:
> - Added kfree(dev_stats);
> - Removed the now useless stats_lock (bond->bond_stats and
> slave->slave_stats are now protected by bond->slaves_lock)
>
> Changes in v2:
> Switched to the new scheme of holding just a refcnt to the slave
> interfaces while recursing with dev_get_stats.
>
> drivers/net/bonding/bond_main.c | 113 ++++++++++++++------------------
> include/net/bonding.h | 49 +++++++++++++-
> 2 files changed, 99 insertions(+), 63 deletions(-)
>
[snip]
> +static inline int bond_get_slave_arr(struct bonding *bond,
> +				     struct net_device ***slaves,
> +				     int *num_slaves)
> +{
> +	struct net *net = dev_net(bond->dev);
> +	struct list_head *iter;
> +	struct slave *slave;
> +	int i = 0;
> +
> +	mutex_lock(&bond->slaves_lock);
> +
> +	*slaves = kcalloc(bond->slave_cnt, sizeof(*slaves), GFP_KERNEL);
> +	if (!(*slaves)) {
> +		netif_lists_unlock(net);
> +		return -ENOMEM;
> +	}
The error path looks wrong: you unlock netif_lists and return with slaves_lock held.
Cheers,
Nik
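
[ i.e. presumably the early return should release the mutex that was
  actually taken: ]

	if (!(*slaves)) {
		mutex_unlock(&bond->slaves_lock);
		return -ENOMEM;
	}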
> +
> +	bond_for_each_slave(bond, slave, iter) {
> +		dev_hold(slave->dev);
> +		(*slaves)[i++] = slave->dev;
> +	}
> +
> +	*num_slaves = bond->slave_cnt;
> +
> +	mutex_unlock(&bond->slaves_lock);
> +
> +	return 0;
> +}
> +
> +static inline void bond_put_slave_arr(struct net_device **slaves,
> +				      int num_slaves)
> +{
> +	int i;
> +
> +	for (i = 0; i < num_slaves; i++)
> +		dev_put(slaves[i]);
> +
> +	kfree(slaves);
> +}
> +
> #define BOND_PRI_RESELECT_ALWAYS 0
> #define BOND_PRI_RESELECT_BETTER 1
> #define BOND_PRI_RESELECT_FAILURE 2
>