lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <20250131181049.88262-1-kuniyu@amazon.com>
Date: Fri, 31 Jan 2025 10:10:49 -0800
From: Kuniyuki Iwashima <kuniyu@...zon.com>
To: <edumazet@...gle.com>
CC: <davem@...emloft.net>, <horms@...nel.org>, <kuba@...nel.org>,
	<kuni1840@...il.com>, <kuniyu@...zon.com>, <netdev@...r.kernel.org>,
	<pabeni@...hat.com>, <ychemla@...dia.com>
Subject: Re: [PATCH v1 net 1/2] net: Fix dev_net(dev) race in unregister_netdevice_notifier_dev_net().

From: Eric Dumazet <edumazet@...gle.com>
Date: Fri, 31 Jan 2025 05:33:49 +0100
> On Fri, Jan 31, 2025 at 12:25 AM Kuniyuki Iwashima <kuniyu@...zon.com> wrote:
> >
> > After the cited commit, dev_net(dev) is fetched before holding RTNL
> > and passed to __unregister_netdevice_notifier_net().
> >
> > However, dev_net(dev) might be different after holding RTNL.
> >
> > In the reported case [0], while removing a VF device, its netns was
> > being dismantled and the VF was moved to init_net.
> >
> > So the following sequence is basically illegal when dev was fetched
> > without lookup:
> >
> >   net = dev_net(dev);
> >   rtnl_net_lock(net);
> >
> > Let's use a new helper rtnl_net_dev_lock() to fix the race.
> >
> > It calls maybe_get_net() for dev_net(dev) and checks dev_net(dev)
> > before/after rtnl_net_lock().
> >
> > The dev_net(dev) pointer itself is valid, thanks to RCU API, but the
> 
> I am pretty sure dev_net(net) is not always called under rcu_read_lock().

Ah, exactly !
Why I didn't notice that while writing changelog :S


> 
> And RTNL would not help in the future either.
> 
> > netns might be being dismantled.  maybe_get_net() is to avoid the race.
> > This can be done by holding pernet_ops_rwsem, but it will be overkill.
> >
> > [0]:
> 
> > Fixes: 7fb1073300a2 ("net: Hold rtnl_net_lock() in (un)?register_netdevice_notifier_dev_net().")
> > Reported-by: Yael Chemla <ychemla@...dia.com>
> > Closes: https://lore.kernel.org/netdev/146eabfe-123c-4970-901e-e961b4c09bc3@nvidia.com/
> > Signed-off-by: Kuniyuki Iwashima <kuniyu@...zon.com>
> > Tested-by: Yael Chemla <ychemla@...dia.com>
> > ---
> >  net/core/dev.c | 59 +++++++++++++++++++++++++++++++++++++++-----------
> >  1 file changed, 46 insertions(+), 13 deletions(-)
> >
> > diff --git a/net/core/dev.c b/net/core/dev.c
> > index c0021cbd28fc..f91ddb7f8bdf 100644
> > --- a/net/core/dev.c
> > +++ b/net/core/dev.c
> > @@ -2070,6 +2070,47 @@ static void __move_netdevice_notifier_net(struct net *src_net,
> >         __register_netdevice_notifier_net(dst_net, nb, true);
> >  }
> >
> > +static bool from_cleanup_net(void)
> > +{
> > +#ifdef CONFIG_NET_NS
> > +       return current == cleanup_net_task;
> > +#else
> > +       return false;
> > +#endif
> > +}
> > +
> > +static void rtnl_net_dev_lock(struct net_device *dev)
> > +{
> > +       struct net *net;
> > +
> > +       DEBUG_NET_WARN_ON_ONCE(from_cleanup_net());
> > +again:
> > +       /* netns might be being dismantled. */
> 
> rcu_read_lock();
> 
> > +       net = maybe_get_net(dev_net(dev));
> 
> rcu_read_unlock();

I'll wait for your first dev_net_rcu() series to be applied and
then repost v2 with rcu_read_lock() and dev_net_rcu() next week.


> 
> 
> 
> > +       if (!net) {
> > +               cond_resched();
> 
> cond_resched() can be a NOP on some kernel builds.
> 
> This loop might burn a lot of cpu cycles.
> 
> Perhaps msleep(1) instead.

I didn't know that, will use msleep(1).


> 
> > +               goto again;
> > +       }
> > +
> 
> 
> > +       rtnl_net_lock(net);
> > +
> > +       /* dev might have been moved to another netns. */
> > +       if (!net_eq(net, dev_net(dev))) {
> > +               rtnl_net_unlock(net);
> > +               put_net(net);
> > +               cond_resched();
> 
>       This cond_resched() seems unnecessary, the net change can not
> occur more than once per rcu grace period (an eternity)

Will remove cond_resched() here.

Thank you!

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ