lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20251121151644.1797728-3-cratiu@nvidia.com>
Date: Fri, 21 Nov 2025 17:16:44 +0200
From: Cosmin Ratiu <cratiu@...dia.com>
To: <netdev@...r.kernel.org>
CC: Jay Vosburgh <jv@...sburgh.net>, Andrew Lunn <andrew+netdev@...n.ch>,
	"David S. Miller" <davem@...emloft.net>, Eric Dumazet <edumazet@...gle.com>,
	Jakub Kicinski <kuba@...nel.org>, Paolo Abeni <pabeni@...hat.com>, "Simon
 Horman" <horms@...nel.org>, Taehee Yoo <ap420073@...il.com>, Jianbo Liu
	<jianbol@...dia.com>, Sabrina Dubroca <sd@...asysnail.net>, Steffen Klassert
	<steffen.klassert@...unet.com>, Herbert Xu <herbert@...dor.apana.org.au>,
	Leon Romanovsky <leonro@...dia.com>, Cosmin Ratiu <cratiu@...dia.com>
Subject: [RFC PATCH ipsec 2/2] bonding: Maintain offloaded xfrm on all devices

The bonding driver manages offloaded SAs using the following strategy:

An xfrm_state offloaded on the bond device with bond_ipsec_add_sa() uses
'real_dev' on the xfrm_state xs to redirect the offload to the current
active slave. The corresponding bond_ipsec_del_sa() (called with the xs
spinlock held) redirects the unoffload call to real_dev. Finally,
cleanup happens in bond_ipsec_free_sa(), which removes the offload from
the device. Since the last call happens without the xs spinlock held,
that is where the real work to unoffload actually happens.

When the active slave changes to a new device a 3-step process is used
to migrate all xfrm states to the new device:
1. bond_ipsec_del_sa_all() unoffloads all states in bond->ipsec_list
   from the previously active device.
2. The active slave is flipped to the new device.
3. bond_ipsec_add_sa_all() offloads all states in bond->ipsec_list to
   the new device.

There can be two races which result in unencrypted IPSec packets being
transmitted on the wire:

1. Unencrypted IPSec packet on old_dev:
CPU1 (xfrm_output)                   CPU2 (bond_change_active_slave)
bond_ipsec_offload_ok -> true
                                     bond_ipsec_del_sa_all
bond_xmit_activebackup
bond_dev_queue_xmit
dev_queue_xmit on old_dev
				     bond->curr_active_slave = new_dev
				     bond_ipsec_add_sa_all

2. Unencrypted IPSec packet on new_dev:
CPU1 (xfrm_output)                   CPU2 (bond_change_active_slave)
bond_ipsec_offload_ok -> true
                                     bond->curr_active_slave = new_dev
                                     bond_ipsec_migrate_sa_all
bond_xmit_activebackup
bond_dev_queue_xmit
dev_queue_xmit on new_dev
				     bond_ipsec_migrate_sa_all finishes

This patch fixes both these issues. Bonding now maintain SAs on all
devices by making use of the previous patch that allows the same xfrm
state to be offloaded on multiple devices. This consists of:

1. Maintaining two linked lists:
- bond->ipsec_list is the list of xfrm states offloaded to the bonding
  device.
- Each slave has its own bond->ipsec_offloads list holding offloads of
  bond->ipsec_list on that slave.
These lists are protected by the existing bond->ipsec_lock mutex.

2. When a slave is added (bond_enslave), bond_ipsec_add_sa_all now
   offloads all xfrm states to the new device.

3. When a slave is removed (__bond_release_one), bond_ipsec_del_sa_all
   now removes all xfrm state offloads from that device.

4. When the active slave is changed (bond_change_active_slave), a new
   bond_ipsec_migrate_sa_all function switches xs->xso.real_dev and
   xs->xso.offload handle for all offloaded xfrm states.
   xdo_dev_state_advance_esn is also called on the new device to update
   the esn state.

5. Adding an offloaded xfrm state to the bond device must now iterate
   through active slaves. To make that nice, RTNL is grabbed there. The
   alternative is repeatedly grabbing each slave under the RCU lock,
   holding it, releasing the lock to be able to offload a state, then
   re-grabbing the RCU lock and releasing the slave. RTNL seems cleaner.

6. bond_ipsec_del_sa (.xdo_dev_state_delete for bond) is unchanged, it
   now only deletes the state from the active device and leaves the rest
   for the xdo_dev_state_free callback, which can grab the required
   locks.

Fixes: 9a5605505d9c ("bonding: Add struct bond_ipesc to manage SA")
Signed-off-by: Cosmin Ratiu <cratiu@...dia.com>
---
 drivers/net/bonding/bond_main.c | 283 +++++++++++++++++---------------
 include/net/bonding.h           |  22 ++-
 2 files changed, 164 insertions(+), 141 deletions(-)

diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index 4c5b73786877..979e5aabf8d2 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -452,6 +452,61 @@ static struct net_device *bond_ipsec_dev(struct xfrm_state *xs)
 	return slave->dev;
 }
 
+static struct bond_ipsec_offload*
+bond_ipsec_dev_add_sa(struct net_device *dev, struct bond_ipsec *ipsec,
+		      struct netlink_ext_ack *extack)
+{
+	struct bond_ipsec_offload *offload;
+	int err;
+
+	if (!dev->xfrmdev_ops ||
+	    !dev->xfrmdev_ops->xdo_dev_state_add ||
+	    netif_is_bond_master(dev)) {
+		NL_SET_ERR_MSG_MOD(extack,
+				   "Slave does not support ipsec offload");
+		return ERR_PTR(-EINVAL);
+	}
+
+	offload = kzalloc(sizeof(*offload), GFP_KERNEL);
+	if (!offload)
+		return ERR_PTR(-ENOMEM);
+
+	offload->ipsec = ipsec;
+	offload->dev = dev;
+	err = dev->xfrmdev_ops->xdo_dev_state_add(dev, ipsec->xs,
+						   &offload->handle, extack);
+	if (err)
+		return ERR_PTR(err);
+	return offload;
+}
+
+static void bond_ipsec_dev_del_sa(struct bond_ipsec_offload *offload)
+{
+	struct xfrm_state *xs = offload->ipsec->xs;
+	struct net_device *dev = offload->dev;
+
+	if (dev->xfrmdev_ops->xdo_dev_state_delete) {
+		spin_lock_bh(&xs->lock);
+		/* Don't double delete states killed by the user
+		 * from xs->xso.real_dev.
+		 */
+		if (dev != xs->xso.real_dev ||
+		    xs->km.state != XFRM_STATE_DEAD)
+			dev->xfrmdev_ops->xdo_dev_state_delete(dev, xs,
+							       offload->handle);
+		if (xs->xso.real_dev == dev)
+			xs->xso.real_dev = NULL;
+		spin_unlock_bh(&xs->lock);
+	}
+
+	if (dev->xfrmdev_ops->xdo_dev_state_free)
+		dev->xfrmdev_ops->xdo_dev_state_free(dev, xs, offload->handle);
+
+	list_del(&offload->list);
+	list_del(&offload->ipsec_list);
+	kfree(offload);
+}
+
 /**
  * bond_ipsec_add_sa - program device with a security association
  * @bond_dev: pointer to the bond net device
@@ -464,111 +519,83 @@ static int bond_ipsec_add_sa(struct net_device *bond_dev,
 			     unsigned long *offload_handle,
 			     struct netlink_ext_ack *extack)
 {
-	struct net_device *real_dev;
-	netdevice_tracker tracker;
+	struct bond_ipsec_offload *offload, *tmp;
+	struct slave *slave, *curr_active;
 	struct bond_ipsec *ipsec;
+	struct list_head *iter;
 	struct bonding *bond;
-	struct slave *slave;
 	int err;
 
 	if (!bond_dev)
 		return -EINVAL;
 
-	rcu_read_lock();
 	bond = netdev_priv(bond_dev);
-	slave = rcu_dereference(bond->curr_active_slave);
-	real_dev = slave ? slave->dev : NULL;
-	netdev_hold(real_dev, &tracker, GFP_ATOMIC);
-	rcu_read_unlock();
-	if (!real_dev) {
-		err = -ENODEV;
-		goto out;
-	}
-
-	if (!real_dev->xfrmdev_ops ||
-	    !real_dev->xfrmdev_ops->xdo_dev_state_add ||
-	    netif_is_bond_master(real_dev)) {
-		NL_SET_ERR_MSG_MOD(extack, "Slave does not support ipsec offload");
-		err = -EINVAL;
-		goto out;
-	}
-
 	ipsec = kmalloc(sizeof(*ipsec), GFP_KERNEL);
 	if (!ipsec) {
 		err = -ENOMEM;
 		goto out;
 	}
+	ipsec->xs = xs;
+	INIT_LIST_HEAD(&ipsec->offloads);
 
-	err = real_dev->xfrmdev_ops->xdo_dev_state_add(real_dev, xs,
-						       offload_handle, extack);
-	if (!err) {
-		xs->xso.real_dev = real_dev;
-		ipsec->xs = xs;
-		INIT_LIST_HEAD(&ipsec->list);
-		mutex_lock(&bond->ipsec_lock);
-		list_add(&ipsec->list, &bond->ipsec_list);
-		mutex_unlock(&bond->ipsec_lock);
-	} else {
-		kfree(ipsec);
+	rtnl_lock();
+	mutex_lock(&bond->ipsec_lock);
+	curr_active = rtnl_dereference(bond->curr_active_slave);
+	bond_for_each_slave(bond, slave, iter) {
+		offload = bond_ipsec_dev_add_sa(slave->dev, ipsec, extack);
+		if (IS_ERR(offload)) {
+			err = PTR_ERR(offload);
+			goto err_slave_dev;
+		}
+		list_add_tail(&offload->list, &slave->ipsec_offloads);
+		list_add_tail(&offload->ipsec_list, &ipsec->offloads);
+		if (curr_active == slave)
+			*offload_handle = offload->handle;
 	}
+
+	/* Mark the xs as 'active' on the current active device. */
+	if (curr_active)
+		xs->xso.real_dev = curr_active->dev;
+	list_add_tail(&ipsec->list, &bond->ipsec_list);
+	mutex_unlock(&bond->ipsec_lock);
+	rtnl_unlock();
+
+	return 0;
+
+err_slave_dev:
+	list_for_each_entry_safe(offload, tmp, &ipsec->offloads, ipsec_list)
+		bond_ipsec_dev_del_sa(offload);
+	kfree(ipsec);
+	mutex_unlock(&bond->ipsec_lock);
+	rtnl_unlock();
 out:
-	netdev_put(real_dev, &tracker);
 	return err;
 }
 
-static void bond_ipsec_add_sa_all(struct bonding *bond)
+static void bond_ipsec_add_sa_all(struct bonding *bond, struct slave *new_slave,
+				  struct netlink_ext_ack *extack)
 {
+	struct net_device *real_dev = new_slave->dev;
 	struct net_device *bond_dev = bond->dev;
-	struct net_device *real_dev;
 	struct bond_ipsec *ipsec;
 	struct slave *slave;
 
-	slave = rtnl_dereference(bond->curr_active_slave);
-	real_dev = slave ? slave->dev : NULL;
-	if (!real_dev)
-		return;
+	INIT_LIST_HEAD(&new_slave->ipsec_offloads);
 
 	mutex_lock(&bond->ipsec_lock);
-	if (!real_dev->xfrmdev_ops ||
-	    !real_dev->xfrmdev_ops->xdo_dev_state_add ||
-	    netif_is_bond_master(real_dev)) {
-		if (!list_empty(&bond->ipsec_list))
-			slave_warn(bond_dev, real_dev,
-				   "%s: no slave xdo_dev_state_add\n",
-				   __func__);
-		goto out;
-	}
-
 	list_for_each_entry(ipsec, &bond->ipsec_list, list) {
+		struct bond_ipsec_offload *offload;
 		struct xfrm_state *xs = ipsec->xs;
 
-		/* If new state is added before ipsec_lock acquired */
-		if (xs->xso.real_dev == real_dev)
-			continue;
-
-		if (real_dev->xfrmdev_ops->xdo_dev_state_add(real_dev, xs,
-							     &xs->xso.offload_handle,
-							     NULL)) {
+		offload = bond_ipsec_dev_add_sa(slave->dev, ipsec, extack);
+		if (IS_ERR(offload)) {
 			slave_warn(bond_dev, real_dev, "%s: failed to add SA\n", __func__);
 			continue;
 		}
 
-		spin_lock_bh(&xs->lock);
-		/* xs might have been killed by the user during the migration
-		 * to the new dev, but bond_ipsec_del_sa() should have done
-		 * nothing, as xso.real_dev is NULL.
-		 * Delete it from the device we just added it to. The pending
-		 * bond_ipsec_free_sa() call will do the rest of the cleanup.
-		 */
-		if (xs->km.state == XFRM_STATE_DEAD &&
-		    real_dev->xfrmdev_ops->xdo_dev_state_delete)
-			real_dev->xfrmdev_ops->xdo_dev_state_delete(real_dev,
-								    xs,
-								    xs->xso.offload_handle);
-		xs->xso.real_dev = real_dev;
-		spin_unlock_bh(&xs->lock);
+		list_add_tail(&offload->list, &new_slave->ipsec_offloads);
+		list_add_tail(&offload->ipsec_list, &ipsec->offloads);
 	}
-out:
 	mutex_unlock(&bond->ipsec_lock);
 }
 
@@ -600,47 +627,13 @@ static void bond_ipsec_del_sa(struct net_device *bond_dev,
 						    offload_handle);
 }
 
-static void bond_ipsec_del_sa_all(struct bonding *bond)
+static void bond_ipsec_del_sa_all(struct bonding *bond, struct slave *old_slave)
 {
-	struct net_device *bond_dev = bond->dev;
-	struct net_device *real_dev;
-	struct bond_ipsec *ipsec;
-	struct slave *slave;
-
-	slave = rtnl_dereference(bond->curr_active_slave);
-	real_dev = slave ? slave->dev : NULL;
-	if (!real_dev)
-		return;
+	struct bond_ipsec_offload *offload, *tmp;
 
 	mutex_lock(&bond->ipsec_lock);
-	list_for_each_entry(ipsec, &bond->ipsec_list, list) {
-		struct xfrm_state *xs = ipsec->xs;
-
-		if (!xs->xso.real_dev)
-			continue;
-
-		if (!real_dev->xfrmdev_ops ||
-		    !real_dev->xfrmdev_ops->xdo_dev_state_delete ||
-		    netif_is_bond_master(real_dev)) {
-			slave_warn(bond_dev, real_dev,
-				   "%s: no slave xdo_dev_state_delete\n",
-				   __func__);
-			continue;
-		}
-
-		spin_lock_bh(&xs->lock);
-		xs->xso.real_dev = NULL;
-		/* Don't double delete states killed by the user. */
-		if (xs->km.state != XFRM_STATE_DEAD)
-			real_dev->xfrmdev_ops->xdo_dev_state_delete(real_dev,
-								    xs,
-								    xs->xso.offload_handle);
-		spin_unlock_bh(&xs->lock);
-
-		if (real_dev->xfrmdev_ops->xdo_dev_state_free)
-			real_dev->xfrmdev_ops->xdo_dev_state_free(real_dev, xs,
-								  xs->xso.offload_handle);
-	}
+	list_for_each_entry_safe(offload, tmp, &old_slave->ipsec_offloads, list)
+		bond_ipsec_dev_del_sa(offload);
 	mutex_unlock(&bond->ipsec_lock);
 }
 
@@ -648,33 +641,45 @@ static void bond_ipsec_free_sa(struct net_device *bond_dev,
 			       struct xfrm_state *xs,
 			       unsigned long offload_handle)
 {
-	struct net_device *real_dev;
+	struct bonding *bond = netdev_priv(bond_dev);
+	struct bond_ipsec_offload *offload, *tmp;
 	struct bond_ipsec *ipsec;
-	struct bonding *bond;
 
-	if (!bond_dev)
-		return;
+	mutex_lock(&bond->ipsec_lock);
+	list_for_each_entry(ipsec, &bond->ipsec_list, list) {
+		if (ipsec->xs != xs)
+			continue;
 
-	bond = netdev_priv(bond_dev);
+		list_for_each_entry_safe(offload, tmp, &ipsec->offloads,
+					 ipsec_list)
+			bond_ipsec_dev_del_sa(offload);
+
+		list_del(&ipsec->list);
+		kfree(ipsec);
+		break;
+	}
+	mutex_unlock(&bond->ipsec_lock);
+}
+
+static void bond_ipsec_migrate_sa_all(struct bonding *bond,
+				      struct slave *new_active)
+{
+	struct bond_ipsec_offload *offload;
 
 	mutex_lock(&bond->ipsec_lock);
-	if (!xs->xso.real_dev)
-		goto out;
+	list_for_each_entry(offload, &new_active->ipsec_offloads, list) {
+		struct xfrm_state *xs = offload->ipsec->xs;
 
-	real_dev = xs->xso.real_dev;
+		spin_lock_bh(&xs->lock);
+		if (xs->km.state != XFRM_STATE_DEAD) {
+			struct net_device *dev = new_active->dev;
 
-	xs->xso.real_dev = NULL;
-	if (real_dev->xfrmdev_ops &&
-	    real_dev->xfrmdev_ops->xdo_dev_state_free)
-		real_dev->xfrmdev_ops->xdo_dev_state_free(real_dev, xs,
-							  offload_handle);
-out:
-	list_for_each_entry(ipsec, &bond->ipsec_list, list) {
-		if (ipsec->xs == xs) {
-			list_del(&ipsec->list);
-			kfree(ipsec);
-			break;
+			xs->xso.real_dev = dev;
+			xs->xso.offload_handle = offload->handle;
+			if (dev->xfrmdev_ops->xdo_dev_state_advance_esn)
+				dev->xfrmdev_ops->xdo_dev_state_advance_esn(xs);
 		}
+		spin_unlock_bh(&xs->lock);
 	}
 	mutex_unlock(&bond->ipsec_lock);
 }
@@ -1236,10 +1241,6 @@ void bond_change_active_slave(struct bonding *bond, struct slave *new_active)
 	if (old_active == new_active)
 		return;
 
-#ifdef CONFIG_XFRM_OFFLOAD
-	bond_ipsec_del_sa_all(bond);
-#endif /* CONFIG_XFRM_OFFLOAD */
-
 	if (new_active) {
 		new_active->last_link_up = jiffies;
 
@@ -1267,6 +1268,11 @@ void bond_change_active_slave(struct bonding *bond, struct slave *new_active)
 	if (bond_uses_primary(bond))
 		bond_hw_addr_swap(bond, new_active, old_active);
 
+#ifdef CONFIG_XFRM_OFFLOAD
+	if (new_active)
+		bond_ipsec_migrate_sa_all(bond, new_active);
+#endif /* CONFIG_XFRM_OFFLOAD */
+
 	if (bond_is_lb(bond)) {
 		bond_alb_handle_active_change(bond, new_active);
 		if (old_active)
@@ -1311,10 +1317,6 @@ void bond_change_active_slave(struct bonding *bond, struct slave *new_active)
 		}
 	}
 
-#ifdef CONFIG_XFRM_OFFLOAD
-	bond_ipsec_add_sa_all(bond);
-#endif /* CONFIG_XFRM_OFFLOAD */
-
 	/* resend IGMP joins since active slave has changed or
 	 * all were sent on curr_active_slave.
 	 * resend only if bond is brought up with the affected
@@ -2241,6 +2243,10 @@ int bond_enslave(struct net_device *bond_dev, struct net_device *slave_dev,
 		goto err_detach;
 	}
 
+#ifdef CONFIG_XFRM_OFFLOAD
+	bond_ipsec_add_sa_all(bond, new_slave, extack);
+#endif
+
 	res = bond_master_upper_dev_link(bond, new_slave, extack);
 	if (res) {
 		slave_dbg(bond_dev, slave_dev, "Error %d calling bond_master_upper_dev_link\n", res);
@@ -2530,6 +2536,10 @@ static int __bond_release_one(struct net_device *bond_dev,
 		bond_select_active_slave(bond);
 	}
 
+#ifdef CONFIG_XFRM_OFFLOAD
+	bond_ipsec_del_sa_all(bond, slave);
+#endif
+
 	bond_set_carrier(bond);
 	if (!bond_has_slaves(bond))
 		eth_hw_addr_random(bond_dev);
@@ -3931,6 +3941,7 @@ static int bond_master_netdev_event(unsigned long event,
 #ifdef CONFIG_XFRM_OFFLOAD
 		xfrm_dev_state_flush(dev_net(bond_dev), bond_dev, true);
 #endif /* CONFIG_XFRM_OFFLOAD */
+
 		break;
 	case NETDEV_REGISTER:
 		bond_create_proc_entry(event_bond);
diff --git a/include/net/bonding.h b/include/net/bonding.h
index 49edc7da0586..deb8904adb25 100644
--- a/include/net/bonding.h
+++ b/include/net/bonding.h
@@ -156,6 +156,20 @@ struct bond_params {
 	u8 ad_actor_system[ETH_ALEN + 2];
 };
 
+struct bond_ipsec {
+	struct xfrm_state *xs;
+	struct list_head list;  /* Entry in bond_dev->ipsec_list. */
+	struct list_head offloads;  /* Offloads of xs on slave devs. */
+};
+
+struct bond_ipsec_offload {
+	struct bond_ipsec *ipsec;
+	struct list_head list;  /* Entry in slave->ipsec_offloads. */
+	struct list_head ipsec_list;  /* Entry in parent bond_ipsec. */
+	struct net_device *dev;
+	unsigned long handle;
+};
+
 struct slave {
 	struct net_device *dev; /* first - useful for panic debug */
 	struct bonding *bond; /* our master */
@@ -188,6 +202,9 @@ struct slave {
 	struct delayed_work notify_work;
 	struct kobject kobj;
 	struct rtnl_link_stats64 slave_stats;
+#ifdef CONFIG_XFRM_OFFLOAD
+	struct list_head ipsec_offloads;
+#endif
 };
 
 static inline struct slave *to_slave(struct kobject *kobj)
@@ -206,11 +223,6 @@ struct bond_up_slave {
  */
 #define BOND_LINK_NOCHANGE -1
 
-struct bond_ipsec {
-	struct list_head list;
-	struct xfrm_state *xs;
-};
-
 /*
  * Here are the locking policies for the two bonding locks:
  * Get rcu_read_lock when reading or RTNL when writing slave list.
-- 
2.45.0


Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ