Message-ID: <Z-Rlpgp3vb-zsgSM@mini-arch>
Date: Wed, 26 Mar 2025 13:37:58 -0700
From: Stanislav Fomichev <stfomichev@...il.com>
To: Cosmin Ratiu <cratiu@...dia.com>
Cc: "pabeni@...hat.com" <pabeni@...hat.com>,
	"edumazet@...gle.com" <edumazet@...gle.com>,
	"netdev@...r.kernel.org" <netdev@...r.kernel.org>,
	"davem@...emloft.net" <davem@...emloft.net>,
	"sdf@...ichev.me" <sdf@...ichev.me>,
	"kuba@...nel.org" <kuba@...nel.org>
Subject: Re: [PATCH net-next 2/9] net: hold instance lock during
 NETDEV_REGISTER/UP/UNREGISTER

On 03/26, Stanislav Fomichev wrote:
> On 03/26, Cosmin Ratiu wrote:
> > On Wed, 2025-03-26 at 08:23 -0700, Stanislav Fomichev wrote:
> > > @@ -2028,7 +2028,7 @@ int unregister_netdevice_notifier(struct
> > > notifier_block *nb)
> > >  
> > >  	for_each_net(net) {
> > >  		__rtnl_net_lock(net);
> > > -		call_netdevice_unregister_net_notifiers(nb, net,
> > > true);
> > > +		call_netdevice_unregister_net_notifiers(nb, net,
> > > NULL);
> > >  		__rtnl_net_unlock(net);
> > >  	}
> > 
> > I tested. The deadlock is back now, because dev != NULL and if the lock
> > is held (like in the below stack), the mutex_lock will be attempted
> > again:
> 
> I think I'm missing something. In this case I'm not sure why the original
> "fix" worked.
> 
> You, presumably, use mlx5? And you just move this single device into
> a new netns? Or there is a couple of other mlx5 devices still hanging in
> the root netns?
> 
> I'll try to take a look more at register_netdevice_notifier_net under
> mlx5..

I have a feeling that this is a spurious lockdep warning, since the two
lock addresses are different:

ip/1766 is trying to acquire lock:
ffff888110e18c80 (&dev->lock){+.+.}-{4:4}, at:
call_netdevice_unregister_notifiers+0x7d/0x140

but task is already holding lock:
ffff888130ae0c80 (&dev->lock){+.+.}-{4:4}, at:
do_setlink.isra.0+0x5b/0x1220
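
For context on why different addresses can still produce this report:
lockdep validates lock classes rather than individual lock instances,
and both entries above show the same class, (&dev->lock). The following
is only a toy user-space sketch of that idea, not kernel code. All of
the toy_* names are hypothetical and exist purely for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: these toy_* types are hypothetical, not kernel APIs. */
struct toy_lock_class {
	const char *name;
};

struct toy_lock {
	const struct toy_lock_class *cls;	/* shared by all locks of one kind */
};

#define TOY_MAX_HELD 8

struct toy_task {
	const struct toy_lock *held[TOY_MAX_HELD];
	size_t depth;
};

/*
 * Return true if taking @lock would look like recursive locking to a
 * class-based checker: some held lock has the same class, even when it
 * is a different instance at a different address.
 */
static bool toy_would_warn(const struct toy_task *task,
			   const struct toy_lock *lock)
{
	for (size_t i = 0; i < task->depth; i++) {
		if (task->held[i]->cls == lock->cls)
			return true;
	}
	return false;
}

/* Record @lock as held by @task. */
static void toy_acquire(struct toy_task *task, const struct toy_lock *lock)
{
	assert(task->depth < TOY_MAX_HELD);
	task->held[task->depth++] = lock;
}
```

In this model, if two locks a and b share one class, toy_would_warn()
returns true for b once a is held, even though &a != &b. Giving b a
class of its own makes the same check return false.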

Can you try applying the following on top of the previous patch? At
least to confirm whether it matches my understanding. We might also
stick with it unless we find a better option.

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
index 3506024c2453..e3d8d6c9bf03 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
@@ -40,6 +40,7 @@
 #include <linux/if_bridge.h>
 #include <linux/filter.h>
 #include <net/netdev_queues.h>
+#include <net/netdev_lock.h>
 #include <net/page_pool/types.h>
 #include <net/pkt_sched.h>
 #include <net/xdp_sock_drv.h>
@@ -5454,6 +5455,7 @@ static void mlx5e_build_nic_netdev(struct net_device *netdev)
 	netdev->netdev_ops = &mlx5e_netdev_ops;
 	netdev->xdp_metadata_ops = &mlx5e_xdp_metadata_ops;
 	netdev->xsk_tx_metadata_ops = &mlx5e_xsk_tx_metadata_ops;
+	netdev_lockdep_set_classes(netdev);
 
 	mlx5e_dcbnl_build_netdev(netdev);
 


