[<prev] [next>] [thread-next>] [day] [month] [year] [list]
Message-ID: <20260102140030.32367-1-fw@strlen.de>
Date: Fri, 2 Jan 2026 15:00:07 +0100
From: Florian Westphal <fw@...len.de>
To: <netdev@...r.kernel.org>
Cc: Paolo Abeni <pabeni@...hat.com>,
"David S. Miller" <davem@...emloft.net>,
Eric Dumazet <edumazet@...gle.com>,
Jakub Kicinski <kuba@...nel.org>,
dsahern@...nel.org,
Florian Westphal <fw@...len.de>,
syzbot+4393c47753b7808dac7d@...kaller.appspotmail.com
Subject: [PATCH net] inet: frags: drop fraglist conntrack references
Jakub added a warning in nf_conntrack_cleanup_net_list() to make debugging
leaked skbs/conntrack references more obvious.
syzbot reports this as triggering, and I can also reproduce this via
ip_defrag.sh selftest:
conntrack cleanup blocked for 60s
WARNING: net/netfilter/nf_conntrack_core.c:2512
[..]
conntrack clenups gets stuck because there are skbs with still hold nf_conn
references via their frag_list.
net.core.skb_defer_max=0 makes the hang disappear.
Eric Dumazet points out that skb_release_head_state() doesn't follow the
fraglist.
ip_defrag.sh can only reproduce this problem since
commit 6471658dc66c ("udp: use skb_attempt_defer_free()"), but AFAICS this
problem could happen with TCP as well if pmtu discovery is off.
The relevant problem path for udp is:
1. netns emits fragmented packets
2. nf_defrag_v6_hook reassembles them (in output hook)
3. reassembled skb is tracked (skb owns nf_conn reference)
4. ip6_output refragments
5. refragmented packets also own nf_conn reference (ip6_fragment
calls ip6_copy_metadata())
6. on input path, nf_defrag_v6_hook skips defragmentation: the
fragments already have skb->nf_conn attached
7. skbs are reassembled via ipv6_frag_rcv()
8. skb_consume_udp -> skb_attempt_defer_free() -> skb ends up
in pcpu freelist, but still has nf_conn reference.
Possible solutions:
1 let defrag engine drop nf_conn entry, OR
2 export kick_defer_list_purge() and call it from the conntrack
netns exit callback, OR
3 add skb_has_frag_list() check to skb_attempt_defer_free()
2 & 3 also solve ip_defrag.sh hang but share same drawback:
Such reassembled skbs, queued to socket, can prevent conntrack module
removal until userspace has consumed the packet. While both tcp and udp
stack do call nf_reset_ct() before placing skb on socket queue, that
function doesn't iterate frag_list skbs.
Therefore drop nf_conn entries when they are placed in defrag queue.
Keep the nf_conn entry of the first (offset 0) skb so that reassembled
skb retains nf_conn entry for sake of TX path.
Note that fixes tag is incorrect; it points to the commit introducing the
'ip_defrag.sh reproducible problem': no need to backport this patch to
every stable kernel.
Reported-by: syzbot+4393c47753b7808dac7d@...kaller.appspotmail.com
Closes: https://lore.kernel.org/netdev/693b0fa7.050a0220.4004e.040d.GAE@google.com/
Fixes: 6471658dc66c ("udp: use skb_attempt_defer_free()")
Signed-off-by: Florian Westphal <fw@...len.de>
---
net/ipv4/inet_fragment.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/net/ipv4/inet_fragment.c b/net/ipv4/inet_fragment.c
index 001ee5c4d962..4e6d7467ed44 100644
--- a/net/ipv4/inet_fragment.c
+++ b/net/ipv4/inet_fragment.c
@@ -488,6 +488,8 @@ int inet_frag_queue_insert(struct inet_frag_queue *q, struct sk_buff *skb,
}
FRAG_CB(skb)->ip_defrag_offset = offset;
+ if (offset)
+ nf_reset_ct(skb);
return IPFRAG_OK;
}
--
2.52.0
Powered by blists - more mailing lists