[<prev] [next>] [thread-next>] [day] [month] [year] [list]
Message-ID: <879a4592e4e4bd0c30dbe29ca189e224ec1739a5.1739201151.git.sd@queasysnail.net>
Date: Mon, 10 Feb 2025 17:01:44 +0100
From: Sabrina Dubroca <sd@...asysnail.net>
To: netdev@...r.kernel.org
Cc: Sabrina Dubroca <sd@...asysnail.net>,
Eric Dumazet <edumazet@...gle.com>,
Paolo Abeni <pabeni@...hat.com>,
Neal Cardwell <ncardwell@...gle.com>,
Kuniyuki Iwashima <kuniyu@...zon.com>,
David Ahern <dsahern@...nel.org>,
Xiumei Mu <xmu@...hat.com>
Subject: [PATCH net] tcp: drop skb extensions before skb_attempt_defer_free
Xiumei reported hitting the WARN in xfrm6_tunnel_net_exit while
running tests that boil down to:
- create a pair of netns
- run a basic TCP test over ipcomp6
- delete the pair of netns
The xfrm_state found on spi_byaddr was not deleted at the time we
delete the netns, because we still have a reference on it. This
lingering reference comes from a secpath (which holds a ref on the
xfrm_state), which is still attached to an skb. This skb is not
leaked, it ends up on sk_receive_queue and then gets defer-free'd by
skb_attempt_defer_free.
The problem happens when we defer freeing an skb (push it on one CPU's
defer_list), and don't flush that list before the netns is deleted. In
that case, we still have a reference on the xfrm_state that we don't
expect at this point.
tcp_eat_recv_skb is currently the only caller of skb_attempt_defer_free,
so I'm fixing it here. This patch also adds a DEBUG_NET_WARN_ON_ONCE
in skb_attempt_defer_free, to make sure we don't re-introduce this
problem.
Fixes: 68822bdf76f1 ("net: generalize skb freeing deferral to per-cpu lists")
Reported-by: Xiumei Mu <xmu@...hat.com>
Signed-off-by: Sabrina Dubroca <sd@...asysnail.net>
---
A few comments:
- AFAICT this could not happen before 68822bdf76f1, since we would
have emptied the (per-socket) defer_list before getting to ->exit()
for the netns
- I thought about dropping the extensions at the same time as we
already drop the dst, but Paolo said this is probably not correct due
to IP_CMSG_PASSSEC
- I'm planning to rework the "synchronous" removal of xfrm_states
(commit f75a2804da39 ("xfrm: destroy xfrm_state synchronously on
net exit path")), which may also be able to fix this problem, but
it is going to be a lot more complex than this patch
net/core/skbuff.c | 1 +
net/ipv4/tcp.c | 1 +
2 files changed, 2 insertions(+)
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 6a99c453397f..abd0371bc51a 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -7047,6 +7047,7 @@ nodefer: kfree_skb_napi_cache(skb);
DEBUG_NET_WARN_ON_ONCE(skb_dst(skb));
DEBUG_NET_WARN_ON_ONCE(skb->destructor);
+ DEBUG_NET_WARN_ON_ONCE(skb_has_extensions(skb));
sd = &per_cpu(softnet_data, cpu);
defer_max = READ_ONCE(net_hotdata.sysctl_skb_defer_max);
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index 0d704bda6c41..e60f642485ee 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -1524,6 +1524,7 @@ static void tcp_eat_recv_skb(struct sock *sk, struct sk_buff *skb)
sock_rfree(skb);
skb->destructor = NULL;
skb->sk = NULL;
+ skb_ext_reset(skb);
return skb_attempt_defer_free(skb);
}
__kfree_skb(skb);
--
2.48.1
Powered by blists - more mailing lists