Message-ID: <CANn89iKdg=_uf-gis1knki-XSTbp-oHSXM0=kP-HFm2H39AWcg@mail.gmail.com>
Date: Fri, 7 Feb 2025 07:42:13 +0100
From: Eric Dumazet <edumazet@...gle.com>
To: Kuniyuki Iwashima <kuniyu@...zon.com>
Cc: "David S. Miller" <davem@...emloft.net>, Jakub Kicinski <kuba@...nel.org>,
Paolo Abeni <pabeni@...hat.com>, Simon Horman <horms@...nel.org>,
Kuniyuki Iwashima <kuni1840@...il.com>, netdev@...r.kernel.org,
Yael Chemla <ychemla@...dia.com>
Subject: Re: [PATCH v2 net 1/2] net: Fix dev_net(dev) race in unregister_netdevice_notifier_dev_net().
On Fri, Feb 7, 2025 at 5:43 AM Kuniyuki Iwashima <kuniyu@...zon.com> wrote:
>
> After the cited commit, dev_net(dev) is fetched before holding RTNL
> and passed to __unregister_netdevice_notifier_net().
>
> However, dev_net(dev) might be different after holding RTNL.
>
> In the reported case [0], while removing a VF device, its netns was
> being dismantled and the VF was moved to init_net.
>
> So the following sequence is basically illegal when dev was fetched
> without lookup:
>
> net = dev_net(dev);
> rtnl_net_lock(net);
>
> Let's use a new helper rtnl_net_dev_lock() to fix the race.
>
> It calls maybe_get_net() for dev_net_rcu(dev) and checks dev_net_rcu(dev)
> before/after rtnl_net_lock().
>
> The dev_net_rcu(dev) pointer itself is valid, thanks to RCU API, but the
> netns might be being dismantled. maybe_get_net() is to avoid the race.
> This can be done by holding pernet_ops_rwsem, but it will be overkill.
>
>
> Fixes: 7fb1073300a2 ("net: Hold rtnl_net_lock() in (un)?register_netdevice_notifier_dev_net().")
> Reported-by: Yael Chemla <ychemla@...dia.com>
> Closes: https://lore.kernel.org/netdev/146eabfe-123c-4970-901e-e961b4c09bc3@nvidia.com/
> Signed-off-by: Kuniyuki Iwashima <kuniyu@...zon.com>
> Tested-by: Yael Chemla <ychemla@...dia.com>
> ---
> v2:
> * Use dev_net_rcu().
> * Use msleep(1) instead of cond_resched() after maybe_get_net()
> * Remove cond_resched() after net_eq() check
>
> v1: https://lore.kernel.org/netdev/20250130232435.43622-2-kuniyu@amazon.com/
> ---
> net/core/dev.c | 63 +++++++++++++++++++++++++++++++++++++++-----------
> 1 file changed, 50 insertions(+), 13 deletions(-)
>
> diff --git a/net/core/dev.c b/net/core/dev.c
> index b91658e8aedb..f7430c9d9bc3 100644
> --- a/net/core/dev.c
> +++ b/net/core/dev.c
> @@ -2070,6 +2070,51 @@ static void __move_netdevice_notifier_net(struct net *src_net,
> __register_netdevice_notifier_net(dst_net, nb, true);
> }
>
> +static bool from_cleanup_net(void)
> +{
> +#ifdef CONFIG_NET_NS
> + return current == cleanup_net_task;
> +#else
> + return false;
> +#endif
> +}
> +
> +static void rtnl_net_dev_lock(struct net_device *dev)
> +{
> + struct net *net;
> +
> + DEBUG_NET_WARN_ON_ONCE(from_cleanup_net());
I would rather make sure rtnl_net_dev_lock() _can_ be called from cleanup_net().
> +again:
> + /* netns might be being dismantled. */
> + rcu_read_lock();
> + net = maybe_get_net(dev_net_rcu(dev));
I do not think maybe_get_net() is what we want here.
If the netns is already in its dismantle phase, the refcount will be zero
and maybe_get_net() will return NULL.
Instead:
net = dev_net_rcu(dev);
refcount_inc(&net->passive);
> + rcu_read_unlock();
> + if (!net) {
> + msleep(1);
> + goto again;
> + }
> +
> + rtnl_net_lock(net);
> +
> + /* dev might have been moved to another netns. */
> + rcu_read_lock();
As we do not dereference the net pointer, I would not acquire
rcu_read_lock() and would instead use:
if (!net_eq(net, rcu_access_pointer(dev->nd_net.net))) {
> + if (!net_eq(net, dev_net_rcu(dev))) {
> + rcu_read_unlock();
> + rtnl_net_unlock(net);
> + put_net(net);
Instead:
net_drop_ns(net);
> + goto again;
> + }
> + rcu_read_unlock();
> +}
> +
> +static void rtnl_net_dev_unlock(struct net_device *dev)
> +{
> + struct net *net = dev_net(dev);
> +
> + rtnl_net_unlock(net);
And replace the put_net() here and above with:
net_drop_ns(net);
> + put_net(net);
> +}
> +
> int register_netdevice_notifier_dev_net(struct net_device *dev,
> struct notifier_block *nb,
> struct netdev_net_notifier *nn)
> @@ -2077,6 +2122,8 @@ int register_netdevice_notifier_dev_net(struct net_device *dev,
> struct net *net = dev_net(dev);
> int err;
>
> + DEBUG_NET_WARN_ON_ONCE(!list_empty(&dev->dev_list));
/* Why is this needed? */
> +
> rtnl_net_lock(net);
> err = __register_netdevice_notifier_net(net, nb, false);
> if (!err) {
> @@ -2093,13 +2140,12 @@ int unregister_netdevice_notifier_dev_net(struct net_device *dev,
> struct notifier_block *nb,
> struct netdev_net_notifier *nn)
> {
> - struct net *net = dev_net(dev);
> int err;
>
> - rtnl_net_lock(net);
> + rtnl_net_dev_lock(dev);
> list_del(&nn->list);
> - err = __unregister_netdevice_notifier_net(net, nb);
> - rtnl_net_unlock(net);
> + err = __unregister_netdevice_notifier_net(dev_net(dev), nb);
> + rtnl_net_dev_unlock(dev);
>
> return err;
> }
> @@ -10255,15 +10301,6 @@ static void dev_index_release(struct net *net, int ifindex)
> WARN_ON(xa_erase(&net->dev_by_index, ifindex));
> }
>
> -static bool from_cleanup_net(void)
> -{
> -#ifdef CONFIG_NET_NS
> - return current == cleanup_net_task;
> -#else
> - return false;
> -#endif
> -}
> -
> /* Delayed registration/unregisteration */
> LIST_HEAD(net_todo_list);
> DECLARE_WAIT_QUEUE_HEAD(netdev_unregistering_wq);
> --
> 2.39.5 (Apple Git-154)
>
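The lock-and-recheck pattern discussed in this review can be modeled in plain userspace C. This is a minimal single-threaded sketch, not kernel code: every model_* name is hypothetical, the "active" and "passive" fields only stand in for the two netns reference counts, and the concurrent move of the device to init_net is simulated by a callback that fires once between taking the reference and taking the lock.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for struct net and struct net_device. */
struct model_net {
	int active;   /* models the count maybe_get_net() tries to take */
	int passive;  /* models net->passive, which keeps the struct allocated */
	bool locked;  /* models the per-netns RTNL lock */
};

struct model_dev {
	struct model_net *net;  /* models dev->nd_net.net */
};

static struct model_net model_init_net = { .active = 1, .passive = 1, .locked = false };

/* Models maybe_get_net(): fails once the active count has reached zero. */
static struct model_net *model_maybe_get_net(struct model_net *net)
{
	if (net->active == 0)
		return NULL;
	net->active++;
	return net;
}

/* Simulated racing event: the device is moved to init_net. */
static void model_move_to_init_net(struct model_dev *dev)
{
	dev->net = &model_init_net;
}

/* Pin the netns with a passive reference, lock it, then recheck dev->net. */
static struct model_net *model_dev_lock(struct model_dev *dev,
					void (*race)(struct model_dev *))
{
	for (;;) {
		struct model_net *net = dev->net;

		net->passive++;              /* models refcount_inc(&net->passive) */
		if (race) {                  /* simulated concurrent move, fires once */
			race(dev);
			race = NULL;
		}
		net->locked = true;          /* models rtnl_net_lock(net) */
		if (net == dev->net)         /* pointer comparison only, no dereference */
			return net;
		net->locked = false;         /* models rtnl_net_unlock(net) */
		net->passive--;              /* models net_drop_ns(net) */
	}
}

static void model_dev_unlock(struct model_net *net)
{
	assert(net->locked);
	net->locked = false;
	net->passive--;
}
```

In this model, a netns whose active count is already zero cannot be pinned through model_maybe_get_net(), while the passive reference can still be taken, so the loop makes progress and ends up holding the lock of whichever netns the device finally belongs to.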