Message-ID: <Z-Rlpgp3vb-zsgSM@mini-arch>
Date: Wed, 26 Mar 2025 13:37:58 -0700
From: Stanislav Fomichev <stfomichev@...il.com>
To: Cosmin Ratiu <cratiu@...dia.com>
Cc: "pabeni@...hat.com" <pabeni@...hat.com>,
"edumazet@...gle.com" <edumazet@...gle.com>,
"netdev@...r.kernel.org" <netdev@...r.kernel.org>,
"davem@...emloft.net" <davem@...emloft.net>,
"sdf@...ichev.me" <sdf@...ichev.me>,
"kuba@...nel.org" <kuba@...nel.org>
Subject: Re: [PATCH net-next 2/9] net: hold instance lock during NETDEV_REGISTER/UP/UNREGISTER
On 03/26, Stanislav Fomichev wrote:
> On 03/26, Cosmin Ratiu wrote:
> > On Wed, 2025-03-26 at 08:23 -0700, Stanislav Fomichev wrote:
> > > @@ -2028,7 +2028,7 @@ int unregister_netdevice_notifier(struct
> > > notifier_block *nb)
> > >
> > > for_each_net(net) {
> > > __rtnl_net_lock(net);
> > > - call_netdevice_unregister_net_notifiers(nb, net,
> > > true);
> > > + call_netdevice_unregister_net_notifiers(nb, net,
> > > NULL);
> > > __rtnl_net_unlock(net);
> > > }
> >
> > I tested. The deadlock is back now, because dev != NULL and if the lock
> > is held (like in the below stack), the mutex_lock will be attempted
> > again:
>
> I think I'm missing something. In this case I'm not sure why the original
> "fix" worked.
>
> You, presumably, use mlx5? And you just move this single device into
> a new netns? Or there is a couple of other mlx5 devices still hanging in
> the root netns?
>
> I'll try to take a look more at register_netdevice_notifier_net under
> mlx5..
I suspect this is a spurious lockdep warning rather than a real deadlock,
since the two lock addresses are different:
ip/1766 is trying to acquire lock:
ffff888110e18c80 (&dev->lock){+.+.}-{4:4}, at: call_netdevice_unregister_notifiers+0x7d/0x140

but task is already holding lock:
ffff888130ae0c80 (&dev->lock){+.+.}-{4:4}, at: do_setlink.isra.0+0x5b/0x1220
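
To illustrate why two different lock instances can still trip the checker:
lockdep validates by lock class rather than by lock address, so holding one
dev->lock while taking another dev->lock of the same class looks like
recursion. Below is a minimal userspace sketch of that idea. It is a toy
model, not the kernel's implementation, and every name in it (toy_lock,
toy_task, toy_would_warn, toy_acquire) is made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: NOT the kernel's lockdep. All names are hypothetical. */

/* A lock class groups lock instances that are validated together. */
struct lock_class {
	const char *name;
};

/* A lock instance; the toy validator looks at cls, not at the address. */
struct toy_lock {
	const struct lock_class *cls;
};

#define MAX_HELD 8

/* Per-task list of currently held locks. */
struct toy_task {
	const struct toy_lock *held[MAX_HELD];
	size_t nheld;
};

/*
 * Return true if taking 'l' would look like recursive locking to a
 * class-based validator: some already-held lock shares l's class,
 * even if it is a different lock instance.
 */
static bool toy_would_warn(const struct toy_task *t, const struct toy_lock *l)
{
	for (size_t i = 0; i < t->nheld; i++)
		if (t->held[i]->cls == l->cls)
			return true;
	return false;
}

/* Record 'l' as held by task 't'. */
static void toy_acquire(struct toy_task *t, const struct toy_lock *l)
{
	assert(t->nheld < MAX_HELD);
	t->held[t->nheld++] = l;
}
```

In this model, two locks that share a class trigger the check even though
they are distinct objects, while a lock with its own class does not. That is
the effect I would expect from netdev_lockdep_set_classes() in the patch
below, assuming it gives the mlx5 netdev's locks their own class keys.
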
Can you try applying the following on top of the previous patch, at least
to confirm whether it matches my understanding? We might also stick with
it unless we find a better option.
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
index 3506024c2453..e3d8d6c9bf03 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
@@ -40,6 +40,7 @@
 #include <linux/if_bridge.h>
 #include <linux/filter.h>
 #include <net/netdev_queues.h>
+#include <net/netdev_lock.h>
 #include <net/page_pool/types.h>
 #include <net/pkt_sched.h>
 #include <net/xdp_sock_drv.h>
@@ -5454,6 +5455,7 @@ static void mlx5e_build_nic_netdev(struct net_device *netdev)
 	netdev->netdev_ops = &mlx5e_netdev_ops;
 	netdev->xdp_metadata_ops = &mlx5e_xdp_metadata_ops;
 	netdev->xsk_tx_metadata_ops = &mlx5e_xsk_tx_metadata_ops;
+	netdev_lockdep_set_classes(netdev);
 	mlx5e_dcbnl_build_netdev(netdev);