[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <218c929c-b2a2-4965-861b-5733a265acfa@redhat.com>
Date: Wed, 31 Dec 2025 05:01:11 +0200
From: mohammad heib <mheib@...hat.com>
To: Willem de Bruijn <willemdebruijn.kernel@...il.com>, netdev@...r.kernel.org
Cc: davem@...emloft.net, edumazet@...gle.com, kuba@...nel.org,
pabeni@...hat.com, horms@...nel.org, kernelxing@...cent.com,
kuniyu@...gle.com, steffen.klassert@...unet.com, atenart@...nel.org
Subject: Re: [PATCH net] net: skbuff: fix truesize and head state corruption
in skb_segment_list
Hi Willem,
Thank you for the review and for tracing the history of the accounting
logic.
On 12/30/25 6:31 PM, Willem de Bruijn wrote:
> mheib@ wrote:
>> From: Mohammad Heib <mheib@...hat.com>
>>
>> When skb_segment_list is called during packet forwarding through
>> a bridge or VXLAN, it assumes that every fragment in a frag_list
>> carries its own socket ownership and head state. While this is true for
>> GSO packets created by the transmit path (via __ip_append_data), it is
>> not true for packets built by the GRO receive path.
>
> We have to separate packets that use frag_list, a broader category,
> from those that use fraglist gso chaining. This code path is only
> exercised by the latter.
That makes perfect sense. I will remove the redundant is_flist check
since this path is only reached for fraglist GSO, and I'll update the
Fixes tag to ed4cccef64c1 in v2.
>
>> In the GRO path, fragments are "orphans" (skb->sk == NULL) and were
>> never charged to a socket. However, the current logic in
>> skb_segment_list unconditionally adds every fragment's truesize to
>> delta_truesize and subsequently subtracts this from the parent SKB.
>
> This was not present in the original fraglist chaining patch cited in
> the Fixes tag. It was added in commit ed4cccef64c1 ("gro: fix
> ownership transfer"). Which was a follow-on to commit 5e10da5385d2
> ("skbuff: allow 'slow_gro' for skb carring sock reference") removing
> the skb->destructor reference.
>
>> This results a memory accounting leak, Since GRO fragments were never
>> charged to the socket in the first place, the "refund" results in the
>> parent SKB returning less memory than originally charged when it is
>> finally freed. This leads to a permanent leak in sk_wmem_alloc, which
>> prevents the socket from being destroyed, resulting in a persistent memory
>> leak of the socket object and its related metadata.
>>
>> The leak can be observed via KMEMLEAK when tearing down the networking
>> environment:
>>
>> unreferenced object 0xffff8881e6eb9100 (size 2048):
>> comm "ping", pid 6720, jiffies 4295492526
>> backtrace:
>> kmem_cache_alloc_noprof+0x5c6/0x800
>> sk_prot_alloc+0x5b/0x220
>> sk_alloc+0x35/0xa00
>> inet6_create.part.0+0x303/0x10d0
>> __sock_create+0x248/0x640
>> __sys_socket+0x11b/0x1d0
>>
>> This patch modifies skb_segment_list to only perform head state release
>> and truesize subtraction if the fragment explicitly owns a socket
>> reference. For GRO-forwarded packets where fragments are not owners,
>> the parent maintains the full truesize and acts as the single anchor for
>> the memory refund upon destruction.
>
> Thanks for the report and fix. It can probably be simplified a bit
> based on knowledge that only fraglist chaining skbs reach this path.
> And the Fixes tag should reflect the patch that changed this
> accounting in the GRO patch. Matching that in the GSO path makes
> sense.
>
>> Fixes: 3a1296a38d0c ("net: Support GRO/GSO fraglist chaining.")
>> Signed-off-by: Mohammad Heib <mheib@...hat.com>
>> ---
>> net/core/skbuff.c | 18 ++++++++++++++++--
>> 1 file changed, 16 insertions(+), 2 deletions(-)
>>
>> diff --git a/net/core/skbuff.c b/net/core/skbuff.c
>> index a00808f7be6a..aee9be42409b 100644
>> --- a/net/core/skbuff.c
>> +++ b/net/core/skbuff.c
>> @@ -4641,6 +4641,7 @@ struct sk_buff *skb_segment_list(struct sk_buff *skb,
>> struct sk_buff *tail = NULL;
>> struct sk_buff *nskb, *tmp;
>> int len_diff, err;
>> + bool is_flist = !!(skb_shinfo(skb)->gso_type & SKB_GSO_FRAGLIST);
>
> This is guaranteed when entering skb_segment_list.
>
>>
>> skb_push(skb, -skb_network_offset(skb) + offset);
>>
>> @@ -4656,7 +4657,15 @@ struct sk_buff *skb_segment_list(struct sk_buff *skb,
>> list_skb = list_skb->next;
>>
>> err = 0;
>> - delta_truesize += nskb->truesize;
>> +
>> + /* Only track truesize delta if the fragment is being orphaned.
>> + * In the GRO path, fragments don't have a socket owner (sk=NULL),
>> + * so the parent must maintain the total truesize to prevent
>> + * memory accounting leaks.
>> + */
>> + if (!is_flist || nskb->sk)
>> + delta_truesize += nskb->truesize;
>> +
I considered combining the two if (nskb->sk) blocks into one to simplify
the code further, but I decided against it to avoid reordering the
skb_clone and skb_push logic, which felt too risky for this path.
>> if (skb_shared(nskb)) {
>> tmp = skb_clone(nskb, GFP_ATOMIC);
>> if (tmp) {
>> @@ -4684,7 +4693,12 @@ struct sk_buff *skb_segment_list(struct sk_buff *skb,
>>
>> skb_push(nskb, -skb_network_offset(nskb) + offset);
>>
>> - skb_release_head_state(nskb);
>> + /* For GRO-forwarded packets, fragments have no head state
>> + * (no sk/destructor) to release. Skip this.
>> + */
>> + if (!is_flist || nskb->sk)
>> + skb_release_head_state(nskb);
>> +
>> len_diff = skb_network_header_len(nskb) - skb_network_header_len(skb);
>> __copy_skb_header(nskb, skb);
>>
>> --
>> 2.52.0
>>
>
>
Thanks
Powered by blists - more mailing lists