[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <20250131181049.88262-1-kuniyu@amazon.com>
Date: Fri, 31 Jan 2025 10:10:49 -0800
From: Kuniyuki Iwashima <kuniyu@...zon.com>
To: <edumazet@...gle.com>
CC: <davem@...emloft.net>, <horms@...nel.org>, <kuba@...nel.org>,
<kuni1840@...il.com>, <kuniyu@...zon.com>, <netdev@...r.kernel.org>,
<pabeni@...hat.com>, <ychemla@...dia.com>
Subject: Re: [PATCH v1 net 1/2] net: Fix dev_net(dev) race in unregister_netdevice_notifier_dev_net().
From: Eric Dumazet <edumazet@...gle.com>
Date: Fri, 31 Jan 2025 05:33:49 +0100
> On Fri, Jan 31, 2025 at 12:25 AM Kuniyuki Iwashima <kuniyu@...zon.com> wrote:
> >
> > After the cited commit, dev_net(dev) is fetched before holding RTNL
> > and passed to __unregister_netdevice_notifier_net().
> >
> > However, dev_net(dev) might be different after holding RTNL.
> >
> > In the reported case [0], while removing a VF device, its netns was
> > being dismantled and the VF was moved to init_net.
> >
> > So the following sequence is basically illegal when dev was fetched
> > without lookup:
> >
> > net = dev_net(dev);
> > rtnl_net_lock(net);
> >
> > Let's use a new helper rtnl_net_dev_lock() to fix the race.
> >
> > It calls maybe_get_net() for dev_net(dev) and checks dev_net(dev)
> > before/after rtnl_net_lock().
> >
> > The dev_net(dev) pointer itself is valid, thanks to RCU API, but the
>
> I am pretty sure dev_net(net) is not always called under rcu_read_lock().
Ah, exactly !
Why I didn't notice that while writing changelog :S
>
> And RTNL would not help in the future either.
>
> > netns might be being dismantled. maybe_get_net() is to avoid the race.
> > This can be done by holding pernet_ops_rwsem, but it will be overkill.
> >
> > [0]:
>
> > Fixes: 7fb1073300a2 ("net: Hold rtnl_net_lock() in (un)?register_netdevice_notifier_dev_net().")
> > Reported-by: Yael Chemla <ychemla@...dia.com>
> > Closes: https://lore.kernel.org/netdev/146eabfe-123c-4970-901e-e961b4c09bc3@nvidia.com/
> > Signed-off-by: Kuniyuki Iwashima <kuniyu@...zon.com>
> > Tested-by: Yael Chemla <ychemla@...dia.com>
> > ---
> > net/core/dev.c | 59 +++++++++++++++++++++++++++++++++++++++-----------
> > 1 file changed, 46 insertions(+), 13 deletions(-)
> >
> > diff --git a/net/core/dev.c b/net/core/dev.c
> > index c0021cbd28fc..f91ddb7f8bdf 100644
> > --- a/net/core/dev.c
> > +++ b/net/core/dev.c
> > @@ -2070,6 +2070,47 @@ static void __move_netdevice_notifier_net(struct net *src_net,
> > __register_netdevice_notifier_net(dst_net, nb, true);
> > }
> >
> > +static bool from_cleanup_net(void)
> > +{
> > +#ifdef CONFIG_NET_NS
> > + return current == cleanup_net_task;
> > +#else
> > + return false;
> > +#endif
> > +}
> > +
> > +static void rtnl_net_dev_lock(struct net_device *dev)
> > +{
> > + struct net *net;
> > +
> > + DEBUG_NET_WARN_ON_ONCE(from_cleanup_net());
> > +again:
> > + /* netns might be being dismantled. */
>
> rcu_read_lock();
>
> > + net = maybe_get_net(dev_net(dev));
>
> rcu_read_unlock();
I'll wait for your first dev_net_rcu() series to be applied and
then repost v2 with rcu_read_lock() and dev_net_rcu() next week.
>
>
>
> > + if (!net) {
> > + cond_resched();
>
> cond_resched() can be a NOP on some kernel builds.
>
> This loop might burn a lot of cpu cycles.
>
> Perhaps msleep(1) instead.
I didn't know that, will use msleep(1).
>
> > + goto again;
> > + }
> > +
>
>
> > + rtnl_net_lock(net);
> > +
> > + /* dev might have been moved to another netns. */
> > + if (!net_eq(net, dev_net(dev))) {
> > + rtnl_net_unlock(net);
> > + put_net(net);
> > + cond_resched();
>
> This cond_resched() seems unnecessary, the net change can not
> occur more than once per rcu grace period (an eternity)
Will remove cond_resched() here.
Thank you!
Powered by blists - more mailing lists