[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <aTJWI3aybYO-NHg5@fedora>
Date: Fri, 5 Dec 2025 03:48:51 +0000
From: Hangbin Liu <liuhangbin@...il.com>
To: Cosmin Ratiu <cratiu@...dia.com>
Cc: netdev@...r.kernel.org, Jay Vosburgh <jv@...sburgh.net>,
Andrew Lunn <andrew+netdev@...n.ch>,
"David S. Miller" <davem@...emloft.net>,
Eric Dumazet <edumazet@...gle.com>,
Jakub Kicinski <kuba@...nel.org>, Paolo Abeni <pabeni@...hat.com>,
Simon Horman <horms@...nel.org>, Taehee Yoo <ap420073@...il.com>,
Jianbo Liu <jianbol@...dia.com>,
Sabrina Dubroca <sd@...asysnail.net>,
Steffen Klassert <steffen.klassert@...unet.com>,
Herbert Xu <herbert@...dor.apana.org.au>,
Leon Romanovsky <leonro@...dia.com>
Subject: Re: [RFC PATCH ipsec 2/2] bonding: Maintain offloaded xfrm on all
devices
On Fri, Nov 21, 2025 at 05:16:44PM +0200, Cosmin Ratiu wrote:
> The bonding driver manages offloaded SAs using the following strategy:
>
> An xfrm_state offloaded on the bond device with bond_ipsec_add_sa() uses
> 'real_dev' on the xfrm_state xs to redirect the offload to the current
> active slave. The corresponding bond_ipsec_del_sa() (called with the xs
> spinlock held) redirects the unoffload call to real_dev. Finally,
> cleanup happens in bond_ipsec_free_sa(), which removes the offload from
> the device. Since the last call happens without the xs spinlock held,
> that is where the real work to unoffload actually happens.
>
> When the active slave changes to a new device a 3-step process is used
> to migrate all xfrm states to the new device:
> 1. bond_ipsec_del_sa_all() unoffloads all states in bond->ipsec_list
> from the previously active device.
> 2. The active slave is flipped to the new device.
> 3. bond_ipsec_add_sa_all() offloads all states in bond->ipsec_list to
> the new device.
>
> There can be two races which result in unencrypted IPSec packets being
> transmitted on the wire:
>
> 1. Unencrypted IPSec packet on old_dev:
> CPU1 (xfrm_output) CPU2 (bond_change_active_slave)
> bond_ipsec_offload_ok -> true
> bond_ipsec_del_sa_all
> bond_xmit_activebackup
> bond_dev_queue_xmit
> dev_queue_xmit on old_dev
> bond->curr_active_slave = new_dev
> bond_ipsec_add_sa_all
>
> 2. Unencrypted IPSec packet on new_dev:
> CPU1 (xfrm_output) CPU2 (bond_change_active_slave)
> bond_ipsec_offload_ok -> true
> bond->curr_active_slave = new_dev
> bond_ipsec_migrate_sa_all
> bond_xmit_activebackup
> bond_dev_queue_xmit
> dev_queue_xmit on new_dev
> bond_ipsec_migrate_sa_all finishes
>
> This patch fixes both these issues. Bonding now maintain SAs on all
> devices by making use of the previous patch that allows the same xfrm
> state to be offloaded on multiple devices. This consists of:
>
> 1. Maintaining two linked lists:
> - bond->ipsec_list is the list of xfrm states offloaded to the bonding
> device.
> - Each slave has its own bond->ipsec_offloads list holding offloads of
> bond->ipsec_list on that slave.
> These lists are protected by the existing bond->ipsec_lock mutex.
>
> 2. When a slave is added (bond_enslave), bond_ipsec_add_sa_all now
> offloads all xfrm states to the new device.
>
> 3. When a slave is removed (__bond_release_one), bond_ipsec_del_sa_all
> now removes all xfrm state offloads from that device.
>
> 4. When the active slave is changed (bond_change_active_slave), a new
> bond_ipsec_migrate_sa_all function switches xs->xso.real_dev and
> xs->xso.offload handle for all offloaded xfrm states.
> xdo_dev_state_advance_esn is also called on the new device to update
> the esn state.
>
> 5. Adding an offloaded xfrm state to the bond device must now iterate
> through active slaves. To make that nice, RTNL is grabbed there. The
> alternative is repeatedly grabbing each slave under the RCU lock,
> holding it, releasing the lock to be able to offload a state, then
> re-grabbing the RCU lock and releasing the slave. RTNL seems cleaner.
>
> 6. bond_ipsec_del_sa (.xdo_dev_state_delete for bond) is unchanged, it
> now only deletes the state from the active device and leaves the rest
> for the xdo_dev_state_free callback, which can grab the required
> locks.
>
> Fixes: 9a5605505d9c ("bonding: Add struct bond_ipesc to manage SA")
> Signed-off-by: Cosmin Ratiu <cratiu@...dia.com>
> ---
> drivers/net/bonding/bond_main.c | 283 +++++++++++++++++---------------
> include/net/bonding.h | 22 ++-
> 2 files changed, 164 insertions(+), 141 deletions(-)
>
> diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
> index 4c5b73786877..979e5aabf8d2 100644
> --- a/drivers/net/bonding/bond_main.c
> +++ b/drivers/net/bonding/bond_main.c
> @@ -452,6 +452,61 @@ static struct net_device *bond_ipsec_dev(struct xfrm_state *xs)
> return slave->dev;
> }
>
> +static struct bond_ipsec_offload*
> +bond_ipsec_dev_add_sa(struct net_device *dev, struct bond_ipsec *ipsec,
> + struct netlink_ext_ack *extack)
> +{
> + struct bond_ipsec_offload *offload;
> + int err;
> +
> + if (!dev->xfrmdev_ops ||
> + !dev->xfrmdev_ops->xdo_dev_state_add ||
> + netif_is_bond_master(dev)) {
> + NL_SET_ERR_MSG_MOD(extack,
> + "Slave does not support ipsec offload");
> + return ERR_PTR(-EINVAL);
> + }
> +
> + offload = kzalloc(sizeof(*offload), GFP_KERNEL);
> + if (!offload)
> + return ERR_PTR(-ENOMEM);
> +
> + offload->ipsec = ipsec;
> + offload->dev = dev;
> + err = dev->xfrmdev_ops->xdo_dev_state_add(dev, ipsec->xs,
> + &offload->handle, extack);
> + if (err)
Here we need to free the offload.
> + return ERR_PTR(err);
> + return offload;
> +}
> +
> +static void bond_ipsec_dev_del_sa(struct bond_ipsec_offload *offload)
> +{
> + struct xfrm_state *xs = offload->ipsec->xs;
> + struct net_device *dev = offload->dev;
> +
> + if (dev->xfrmdev_ops->xdo_dev_state_delete) {
> + spin_lock_bh(&xs->lock);
> + /* Don't double delete states killed by the user
> + * from xs->xso.real_dev.
> + */
> + if (dev != xs->xso.real_dev ||
> + xs->km.state != XFRM_STATE_DEAD)
> + dev->xfrmdev_ops->xdo_dev_state_delete(dev, xs,
> + offload->handle);
> + if (xs->xso.real_dev == dev)
> + xs->xso.real_dev = NULL;
> + spin_unlock_bh(&xs->lock);
> + }
> +
> + if (dev->xfrmdev_ops->xdo_dev_state_free)
> + dev->xfrmdev_ops->xdo_dev_state_free(dev, xs, offload->handle);
> +
> + list_del(&offload->list);
> + list_del(&offload->ipsec_list);
> + kfree(offload);
> +}
> +
[...]
> -static void bond_ipsec_add_sa_all(struct bonding *bond)
> +static void bond_ipsec_add_sa_all(struct bonding *bond, struct slave *new_slave,
> + struct netlink_ext_ack *extack)
> {
> + struct net_device *real_dev = new_slave->dev;
> struct net_device *bond_dev = bond->dev;
> - struct net_device *real_dev;
> struct bond_ipsec *ipsec;
> struct slave *slave;
>
> - slave = rtnl_dereference(bond->curr_active_slave);
> - real_dev = slave ? slave->dev : NULL;
> - if (!real_dev)
> - return;
> + INIT_LIST_HEAD(&new_slave->ipsec_offloads);
>
> mutex_lock(&bond->ipsec_lock);
> - if (!real_dev->xfrmdev_ops ||
> - !real_dev->xfrmdev_ops->xdo_dev_state_add ||
> - netif_is_bond_master(real_dev)) {
> - if (!list_empty(&bond->ipsec_list))
> - slave_warn(bond_dev, real_dev,
> - "%s: no slave xdo_dev_state_add\n",
> - __func__);
> - goto out;
> - }
> -
> list_for_each_entry(ipsec, &bond->ipsec_list, list) {
> + struct bond_ipsec_offload *offload;
> struct xfrm_state *xs = ipsec->xs;
>
> - /* If new state is added before ipsec_lock acquired */
> - if (xs->xso.real_dev == real_dev)
> - continue;
> -
> - if (real_dev->xfrmdev_ops->xdo_dev_state_add(real_dev, xs,
> - &xs->xso.offload_handle,
> - NULL)) {
> + offload = bond_ipsec_dev_add_sa(slave->dev, ipsec, extack);
Here should be real_dev.
> + if (IS_ERR(offload)) {
> slave_warn(bond_dev, real_dev, "%s: failed to add SA\n", __func__);
> continue;
> }
If we add offload failed on the slave, what would happen during migrate?
Thanks
Hangbin
Powered by blists - more mailing lists