[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CANn89iJzav0za=tJq7MvXpEXYNFY_+1D6==w2jbKd0-0mhKO4g@mail.gmail.com>
Date: Fri, 31 Jan 2025 05:33:49 +0100
From: Eric Dumazet <edumazet@...gle.com>
To: Kuniyuki Iwashima <kuniyu@...zon.com>
Cc: "David S. Miller" <davem@...emloft.net>, Jakub Kicinski <kuba@...nel.org>,
Paolo Abeni <pabeni@...hat.com>, Simon Horman <horms@...nel.org>,
Kuniyuki Iwashima <kuni1840@...il.com>, netdev@...r.kernel.org,
Yael Chemla <ychemla@...dia.com>
Subject: Re: [PATCH v1 net 1/2] net: Fix dev_net(dev) race in unregister_netdevice_notifier_dev_net().
On Fri, Jan 31, 2025 at 12:25 AM Kuniyuki Iwashima <kuniyu@...zon.com> wrote:
>
> After the cited commit, dev_net(dev) is fetched before holding RTNL
> and passed to __unregister_netdevice_notifier_net().
>
> However, dev_net(dev) might be different after holding RTNL.
>
> In the reported case [0], while removing a VF device, its netns was
> being dismantled and the VF was moved to init_net.
>
> So the following sequence is basically illegal when dev was fetched
> without lookup:
>
> net = dev_net(dev);
> rtnl_net_lock(net);
>
> Let's use a new helper rtnl_net_dev_lock() to fix the race.
>
> It calls maybe_get_net() for dev_net(dev) and checks dev_net(dev)
> before/after rtnl_net_lock().
>
> The dev_net(dev) pointer itself is valid, thanks to RCU API, but the
I am pretty sure dev_net(net) is not always called under rcu_read_lock().
And RTNL would not help in the future either.
> netns might be being dismantled. maybe_get_net() is to avoid the race.
> This can be done by holding pernet_ops_rwsem, but it will be overkill.
>
> [0]:
> Fixes: 7fb1073300a2 ("net: Hold rtnl_net_lock() in (un)?register_netdevice_notifier_dev_net().")
> Reported-by: Yael Chemla <ychemla@...dia.com>
> Closes: https://lore.kernel.org/netdev/146eabfe-123c-4970-901e-e961b4c09bc3@nvidia.com/
> Signed-off-by: Kuniyuki Iwashima <kuniyu@...zon.com>
> Tested-by: Yael Chemla <ychemla@...dia.com>
> ---
> net/core/dev.c | 59 +++++++++++++++++++++++++++++++++++++++-----------
> 1 file changed, 46 insertions(+), 13 deletions(-)
>
> diff --git a/net/core/dev.c b/net/core/dev.c
> index c0021cbd28fc..f91ddb7f8bdf 100644
> --- a/net/core/dev.c
> +++ b/net/core/dev.c
> @@ -2070,6 +2070,47 @@ static void __move_netdevice_notifier_net(struct net *src_net,
> __register_netdevice_notifier_net(dst_net, nb, true);
> }
>
> +static bool from_cleanup_net(void)
> +{
> +#ifdef CONFIG_NET_NS
> + return current == cleanup_net_task;
> +#else
> + return false;
> +#endif
> +}
> +
> +static void rtnl_net_dev_lock(struct net_device *dev)
> +{
> + struct net *net;
> +
> + DEBUG_NET_WARN_ON_ONCE(from_cleanup_net());
> +again:
> + /* netns might be being dismantled. */
rcu_read_lock();
> + net = maybe_get_net(dev_net(dev));
rcu_read_unlock();
> + if (!net) {
> + cond_resched();
cond_resched() can be a NOP on some kernel builds.
This loop might burn a lot of cpu cycles.
Perhaps msleep(1) instead.
> + goto again;
> + }
> +
> + rtnl_net_lock(net);
> +
> + /* dev might have been moved to another netns. */
> + if (!net_eq(net, dev_net(dev))) {
> + rtnl_net_unlock(net);
> + put_net(net);
> + cond_resched();
This cond_resched() seems unnecessary, the net change can not
occur more than once per rcu grace period (an eternity)
> + goto again;
> + }
> +}
Powered by blists - more mailing lists