[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20250523075611.3723340-7-steffen.klassert@secunet.com>
Date: Fri, 23 May 2025 09:56:05 +0200
From: Steffen Klassert <steffen.klassert@...unet.com>
To: David Miller <davem@...emloft.net>, Jakub Kicinski <kuba@...nel.org>
CC: Herbert Xu <herbert@...dor.apana.org.au>, Steffen Klassert
<steffen.klassert@...unet.com>, <netdev@...r.kernel.org>
Subject: [PATCH 06/12] bonding: Mark active offloaded xfrm_states
From: Cosmin Ratiu <cratiu@...dia.com>
When the active link is changed for a bond device, the existing xfrm
states need to be migrated over to the new link. This is done with:
- bond_ipsec_del_sa_all() goes through the offloaded states list and
removes all of them from hw.
- bond_ipsec_add_sa_all() re-offloads all states to the new device.
But because the offload status of xfrm states isn't marked in any way,
there can be bugs.
When all bond links are down, bond_ipsec_del_sa_all() unoffloads
everything from the previous active link. If the same link then comes
back up, nothing gets reoffloaded by bond_ipsec_add_sa_all().
This results in a stack trace like this a bit later when user space
removes the offloaded rules, because mlx5e_xfrm_del_state() is asked to
remove a rule that's no longer offloaded:
[] Call Trace:
[] <TASK>
[] ? __warn+0x7d/0x110
[] ? mlx5e_xfrm_del_state+0x90/0xa0 [mlx5_core]
[] ? report_bug+0x16d/0x180
[] ? handle_bug+0x4f/0x90
[] ? exc_invalid_op+0x14/0x70
[] ? asm_exc_invalid_op+0x16/0x20
[] ? mlx5e_xfrm_del_state+0x73/0xa0 [mlx5_core]
[] ? mlx5e_xfrm_del_state+0x90/0xa0 [mlx5_core]
[] bond_ipsec_del_sa+0x1ab/0x200 [bonding]
[] xfrm_dev_state_delete+0x1f/0x60
[] __xfrm_state_delete+0x196/0x200
[] xfrm_state_delete+0x21/0x40
[] xfrm_del_sa+0x69/0x110
[] xfrm_user_rcv_msg+0x11d/0x300
[] ? release_pages+0xca/0x140
[] ? copy_to_user_tmpl.part.0+0x110/0x110
[] netlink_rcv_skb+0x54/0x100
[] xfrm_netlink_rcv+0x31/0x40
[] netlink_unicast+0x1fc/0x2d0
[] netlink_sendmsg+0x1e4/0x410
[] __sock_sendmsg+0x38/0x60
[] sock_write_iter+0x94/0xf0
[] vfs_write+0x338/0x3f0
[] ksys_write+0xba/0xd0
[] do_syscall_64+0x4c/0x100
[] entry_SYSCALL_64_after_hwframe+0x4b/0x53
There's also another theoretical bug:
Calling bond_ipsec_del_sa_all() multiple times can result in corruption
in the driver implementation if the double-free isn't tolerated. This
isn't nice.
Before the "Fixes" commit, xs->xso.real_dev was set to NULL when an xfrm
state was unoffloaded from a device, but a race with netdevsim's
.xdo_dev_offload_ok() accessing real_dev was considered a sufficient
reason to not set real_dev to NULL anymore. This unfortunately
introduced the new bugs.
Since .xdo_dev_offload_ok() was significantly refactored by [1] and
there are no more users in the stack of xso.real_dev, that
race is now gone and xs->xso.real_dev can now once again be used to
represent which device (if any) currently holds the offloaded rule.
Go one step further and set real_dev after add/before delete calls, to
catch any future driver misuses of real_dev.
[1] https://lore.kernel.org/netdev/cover.1739972570.git.leon@kernel.org/
Fixes: f8cde9805981 ("bonding: fix xfrm real_dev null pointer dereference")
Signed-off-by: Cosmin Ratiu <cratiu@...dia.com>
Reviewed-by: Leon Romanovsky <leonro@...dia.com>
Reviewed-by: Nikolay Aleksandrov <razor@...ckwall.org>
Reviewed-by: Hangbin Liu <liuhangbin@...il.com>
Tested-by: Hangbin Liu <liuhangbin@...il.com>
Signed-off-by: Steffen Klassert <steffen.klassert@...unet.com>
---
drivers/net/bonding/bond_main.c | 8 +++++---
1 file changed, 5 insertions(+), 3 deletions(-)
diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index 4ba525a564c5..14f7c9712ad4 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -496,9 +496,9 @@ static int bond_ipsec_add_sa(struct net_device *bond_dev,
goto out;
}
- xs->xso.real_dev = real_dev;
err = real_dev->xfrmdev_ops->xdo_dev_state_add(real_dev, xs, extack);
if (!err) {
+ xs->xso.real_dev = real_dev;
ipsec->xs = xs;
INIT_LIST_HEAD(&ipsec->list);
mutex_lock(&bond->ipsec_lock);
@@ -540,12 +540,12 @@ static void bond_ipsec_add_sa_all(struct bonding *bond)
if (ipsec->xs->xso.real_dev == real_dev)
continue;
- ipsec->xs->xso.real_dev = real_dev;
if (real_dev->xfrmdev_ops->xdo_dev_state_add(real_dev,
ipsec->xs, NULL)) {
slave_warn(bond_dev, real_dev, "%s: failed to add SA\n", __func__);
- ipsec->xs->xso.real_dev = NULL;
+ continue;
}
+ ipsec->xs->xso.real_dev = real_dev;
}
out:
mutex_unlock(&bond->ipsec_lock);
@@ -629,6 +629,7 @@ static void bond_ipsec_del_sa_all(struct bonding *bond)
__func__);
continue;
}
+ ipsec->xs->xso.real_dev = NULL;
real_dev->xfrmdev_ops->xdo_dev_state_delete(real_dev,
ipsec->xs);
if (real_dev->xfrmdev_ops->xdo_dev_state_free)
@@ -664,6 +665,7 @@ static void bond_ipsec_free_sa(struct net_device *bond_dev,
WARN_ON(xs->xso.real_dev != real_dev);
+ xs->xso.real_dev = NULL;
if (real_dev && real_dev->xfrmdev_ops &&
real_dev->xfrmdev_ops->xdo_dev_state_free)
real_dev->xfrmdev_ops->xdo_dev_state_free(real_dev, xs);
--
2.34.1
Powered by blists - more mailing lists