Message-ID: <20251025160905.3857885-279-sashal@kernel.org>
Date: Sat, 25 Oct 2025 11:58:30 -0400
From: Sasha Levin <sashal@...nel.org>
To: patches@...ts.linux.dev,
	stable@...r.kernel.org
Cc: Jianbo Liu <jianbol@...dia.com>,
	Cosmin Ratiu <cratiu@...dia.com>,
	Jiri Pirko <jiri@...dia.com>,
	Dragos Tatulea <dtatulea@...dia.com>,
	Tariq Toukan <tariqt@...dia.com>,
	Jakub Kicinski <kuba@...nel.org>,
	Sasha Levin <sashal@...nel.org>,
	saeedm@...dia.com,
	mbloch@...dia.com,
	netdev@...r.kernel.org,
	linux-rdma@...r.kernel.org
Subject: [PATCH AUTOSEL 6.17] net/mlx5e: Prevent entering switchdev mode with inconsistent netns

From: Jianbo Liu <jianbol@...dia.com>

[ Upstream commit 06fdc45f16c392dc3394c67e7c17ae63935715d3 ]

When a PF enters switchdev mode, its netdevice becomes the uplink
representor but remains in its current network namespace. All other
representors (VFs, SFs) are created in the netns of the devlink
instance.

If the PF's netns has been moved and differs from the devlink's netns,
enabling switchdev mode would create a state where the OVS control
plane (ovs-vsctl) cannot manage the switch because the PF uplink
representor and the other representors are split across different
namespaces.

To prevent this inconsistent configuration, block the request to enter
switchdev mode if the PF netdevice's netns does not match the netns of
its devlink instance.

As part of this change, the PF's netns is first marked as immutable.
This prevents race conditions where the netns could be changed after
the check is performed but before the mode transition is complete, and
it aligns the PF's behavior with that of the final uplink representor.

Signed-off-by: Jianbo Liu <jianbol@...dia.com>
Reviewed-by: Cosmin Ratiu <cratiu@...dia.com>
Reviewed-by: Jiri Pirko <jiri@...dia.com>
Reviewed-by: Dragos Tatulea <dtatulea@...dia.com>
Signed-off-by: Tariq Toukan <tariqt@...dia.com>
Link: https://patch.msgid.link/1759094723-843774-3-git-send-email-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@...nel.org>
Signed-off-by: Sasha Levin <sashal@...nel.org>
---

LLM Generated explanations, may be completely bogus:

YES
**Why Backport**
- Rejects switchdev activation when the PF netdev sits in a different
  netns than the devlink instance, avoiding the broken state where OVS
  loses control of the split representors
  (`drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c:3842-3847`).
- New helper grabs the uplink netdev safely via the existing ref-counted
  accessor and sets `netns_immutable` under RTNL so the PF behaves like
  the eventual uplink representor, while immediately detecting namespace
  divergence
  (`drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c:3777-3797`;
  `drivers/net/ethernet/mellanox/mlx5/core/lib/mlx5.h:48-64`).
- If the mode change later fails, the mode-set path calls the helper
  again to roll the flag back, keeping legacy behavior untouched;
  successful transitions keep the flag set, matching switchdev guidance
  to freeze port namespaces
  (`drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c:3867-3869`;
  `Documentation/networking/switchdev.rst:130-143`).
- Locking the namespace leverages the core check that rejects moves of
  immutable interfaces (`net/core/dev.c:12352-12355`), eliminating the
  race window the commit message highlights without touching data-path
  code.
- The change is tightly scoped to the mode-set path, has no dependencies
  on new infrastructure, and fixes a long-standing, user-visible bug
  with minimal regression risk, making it a strong fit for stable
  kernels that ship mlx5 switchdev support.
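
**Control-flow sketch (illustrative only)**
The set-then-check-then-rollback flow described above can be modeled in
plain userspace C. This is a minimal sketch, not driver code: every
`model_*` name is hypothetical, namespaces are reduced to integer IDs,
and RTNL locking and netdev reference counting are deliberately left
out. The `transition_fails` parameter is an assumption used to simulate
a later failure in the mode change.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for kernel objects; not the real mlx5 types. */
struct model_netdev {
	int netns_id;		/* stands in for dev_net(netdev) */
	bool netns_immutable;	/* stands in for netdev->netns_immutable */
};

struct model_devlink {
	int netns_id;			/* stands in for devlink_net(devlink) */
	struct model_netdev *uplink;	/* NULL when no uplink netdev exists */
};

/*
 * Same contract as the helper in the patch: set the flag first, then
 * report whether the two namespaces match. Returns true when there is
 * no uplink netdev, so the mode change is allowed in that case.
 */
static bool model_netns_immutable_set(struct model_devlink *dl, bool immutable)
{
	struct model_netdev *nd = dl->uplink;

	if (!nd)
		return true;

	nd->netns_immutable = immutable;
	return nd->netns_id == dl->netns_id;
}

/* Models only the switchdev branch of the mode-set path. */
static int model_enter_switchdev(struct model_devlink *dl,
				 bool transition_fails)
{
	int err = 0;

	if (!model_netns_immutable_set(dl, true)) {
		err = -EINVAL;	/* namespaces diverged: refuse */
		goto skip;
	}

	if (transition_fails)
		err = -EIO;	/* simulated failure later in the transition */

skip:
	if (err)
		model_netns_immutable_set(dl, false);	/* roll the flag back */
	return err;
}

/* Walks the three outcomes; returns 0 if all behave as described. */
int model_demo(void)
{
	struct model_netdev nd = { .netns_id = 1, .netns_immutable = false };
	struct model_devlink dl = { .netns_id = 1, .uplink = &nd };

	/* Matching namespaces: success, flag stays set. */
	assert(model_enter_switchdev(&dl, false) == 0 && nd.netns_immutable);

	/* Diverged namespaces: rejected, flag rolled back. */
	nd.netns_id = 2;
	assert(model_enter_switchdev(&dl, false) == -EINVAL);
	assert(!nd.netns_immutable);

	/* Matching namespaces but a later failure: flag rolled back. */
	nd.netns_id = 1;
	assert(model_enter_switchdev(&dl, true) == -EIO);
	assert(!nd.netns_immutable);
	return 0;
}
```

Calling `model_demo()` from a `main()` exercises all three paths. In the
model, as in the patch, the flag is set before the comparison, so a
rejected request still needs the rollback at `skip:`.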

 .../mellanox/mlx5/core/eswitch_offloads.c     | 33 +++++++++++++++++++
 1 file changed, 33 insertions(+)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c b/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c
index f358e8fe432cf..59a1a3a5fc8b5 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c
@@ -3739,6 +3739,29 @@ void mlx5_eswitch_unblock_mode(struct mlx5_core_dev *dev)
 	up_write(&esw->mode_lock);
 }
 
+/* Returns false only when uplink netdev exists and its netns is different from
+ * devlink's netns. True for all others so entering switchdev mode is allowed.
+ */
+static bool mlx5_devlink_netdev_netns_immutable_set(struct devlink *devlink,
+						    bool immutable)
+{
+	struct mlx5_core_dev *mdev = devlink_priv(devlink);
+	struct net_device *netdev;
+	bool ret;
+
+	netdev = mlx5_uplink_netdev_get(mdev);
+	if (!netdev)
+		return true;
+
+	rtnl_lock();
+	netdev->netns_immutable = immutable;
+	ret = net_eq(dev_net(netdev), devlink_net(devlink));
+	rtnl_unlock();
+
+	mlx5_uplink_netdev_put(mdev, netdev);
+	return ret;
+}
+
 int mlx5_devlink_eswitch_mode_set(struct devlink *devlink, u16 mode,
 				  struct netlink_ext_ack *extack)
 {
@@ -3781,6 +3804,14 @@ int mlx5_devlink_eswitch_mode_set(struct devlink *devlink, u16 mode,
 	esw->eswitch_operation_in_progress = true;
 	up_write(&esw->mode_lock);
 
+	if (mode == DEVLINK_ESWITCH_MODE_SWITCHDEV &&
+	    !mlx5_devlink_netdev_netns_immutable_set(devlink, true)) {
+		NL_SET_ERR_MSG_MOD(extack,
+				   "Can't change E-Switch mode to switchdev when netdev net namespace has diverged from the devlink's.");
+		err = -EINVAL;
+		goto skip;
+	}
+
 	if (mode == DEVLINK_ESWITCH_MODE_LEGACY)
 		esw->dev->priv.flags |= MLX5_PRIV_FLAGS_SWITCH_LEGACY;
 	mlx5_eswitch_disable_locked(esw);
@@ -3799,6 +3830,8 @@ int mlx5_devlink_eswitch_mode_set(struct devlink *devlink, u16 mode,
 	}
 
 skip:
+	if (mode == DEVLINK_ESWITCH_MODE_SWITCHDEV && err)
+		mlx5_devlink_netdev_netns_immutable_set(devlink, false);
 	down_write(&esw->mode_lock);
 	esw->eswitch_operation_in_progress = false;
 unlock:
-- 
2.51.0

