lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20241107183527.676877-2-tariqt@nvidia.com>
Date: Thu, 7 Nov 2024 20:35:21 +0200
From: Tariq Toukan <tariqt@...dia.com>
To: "David S. Miller" <davem@...emloft.net>, Jakub Kicinski <kuba@...nel.org>,
	Paolo Abeni <pabeni@...hat.com>, Eric Dumazet <edumazet@...gle.com>, "Andrew
 Lunn" <andrew+netdev@...n.ch>
CC: <netdev@...r.kernel.org>, Saeed Mahameed <saeedm@...dia.com>, Gal Pressman
	<gal@...dia.com>, Leon Romanovsky <leonro@...dia.com>, Chiara Meiohas
	<cmeiohas@...dia.com>, Mark Bloch <mbloch@...dia.com>, Tariq Toukan
	<tariqt@...dia.com>
Subject: [PATCH net 1/7] net/mlx5: E-switch, unload IB representors when unloading ETH representors

From: Chiara Meiohas <cmeiohas@...dia.com>

IB representors depend on ETH representors, so the IB representors
should not exist without the ETH ones. When unloading the ETH
representors, the corresponding IB representors should be also
unloaded.

The commit 8d159eb2117b ("RDMA/mlx5: Use IB set_netdev and get_netdev functions")
introduced the use of the ib_device_set_netdev API in IB
repsresentors. ib_device_set_netdev() increments the refcount of
the representor's netdev when loading an IB representor and
decrements it when unloading.
Without the unloading of the IB representor, the refcount of the
representor's netdev remains greater than 0, preventing it from
being unregistered.
The patch uncovered an underlying bug where the eth representor is
unloaded, without unloading the IB representor.

This issue happened when using multiport E-switch and rebooting,
causing the shutdown to hang when unloading the ETH representor
because the refcount of the representor's netdevice was greater than 0.

Call trace:
unregister_netdevice: waiting for eth3 to become free. Usage count = 2
ref_tracker: eth%d@...00000661d60f7 has 1/1 users at
	ib_device_set_netdev+0x160/0x2d0 [ib_core]
	mlx5_ib_vport_rep_load+0x104/0x3f0 [mlx5_ib]
	mlx5_eswitch_reload_ib_reps+0xfc/0x110 [mlx5_core]
	mlx5_mpesw_work+0x236/0x330 [mlx5_core]
	process_one_work+0x169/0x320
	worker_thread+0x288/0x3a0
	kthread+0xb8/0xe0
	ret_from_fork+0x2d/0x50
	ret_from_fork_asm+0x11/0x20

Fixes: 8d159eb2117b ("RDMA/mlx5: Use IB set_netdev and get_netdev functions")
Signed-off-by: Chiara Meiohas <cmeiohas@...dia.com>
Reviewed-by: Mark Bloch <mbloch@...dia.com>
Signed-off-by: Tariq Toukan <tariqt@...dia.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c b/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c
index f24f91d213f2..8cf61ae8b89d 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c
@@ -2527,8 +2527,11 @@ static void __esw_offloads_unload_rep(struct mlx5_eswitch *esw,
 				      struct mlx5_eswitch_rep *rep, u8 rep_type)
 {
 	if (atomic_cmpxchg(&rep->rep_data[rep_type].state,
-			   REP_LOADED, REP_REGISTERED) == REP_LOADED)
+			   REP_LOADED, REP_REGISTERED) == REP_LOADED) {
+		if (rep_type == REP_ETH)
+			__esw_offloads_unload_rep(esw, rep, REP_IB);
 		esw->offloads.rep_ops[rep_type]->unload(rep);
+	}
 }
 
 static void __unload_reps_all_vport(struct mlx5_eswitch *esw, u8 rep_type)
-- 
2.44.0


Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ