[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <willemdebruijn.kernel.30b0807bf46c0@gmail.com>
Date: Sat, 17 Jan 2026 21:28:05 -0500
From: Willem de Bruijn <willemdebruijn.kernel@...il.com>
To: Shiming Cheng (成诗明) <Shiming.Cheng@...iatek.com>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
"kuba@...nel.org" <kuba@...nel.org>,
"willemb@...gle.com" <willemb@...gle.com>,
"pabeni@...hat.com" <pabeni@...hat.com>,
"willemdebruijn.kernel@...il.com" <willemdebruijn.kernel@...il.com>,
"edumazet@...gle.com" <edumazet@...gle.com>,
"netdev@...r.kernel.org" <netdev@...r.kernel.org>,
"matthias.bgg@...il.com" <matthias.bgg@...il.com>,
"davem@...emloft.net" <davem@...emloft.net>
Cc: Lena Wang (王娜) <Lena.Wang@...iatek.com>
Subject: Re: [PATCH] net: fix udp gso skb_segment after pull from frag_list
Shiming Cheng (成诗明) wrote:
> Dear Willem
>
> BUG_ON(!list_skb->head_frag); <<<<<<<< an exception was triggered
> here.
>
> trigger exception condition:
> 1. TX device scatter-gather: on
> dev features=18433 => 100100000000001 => last bit is 1 == SG enable
> 2. skb->head_frag is 1
> 3. udp gro enable
>
>
> Because the target device has enabled SG (scatter-gather), and skb-
> >head_frag is 0, the subsequent head cannot be added to frags.
> In other words, when head_frag is 0, the target device is not allowed
> to enable SG (scatter-gather).
> I still need to reproduce the problem, but from analyzing the code, it
Do you mean that this happened somewhere (e.g., in production), but
you are not able to trigger it in a synthetic test yet?
(btw small reminder: please don't top post)
> appears that the issue is caused by the target device enabling SG.
SG is the default. It makes sense that disabling SG disables
skb frags and avoids this part of the code.
>
> struct sk_buff *skb_segment(struct sk_buff *head_skb,
>
> netdev_features_t features)
> {
> ...
> while (pos < offset + len) {
> if (i >= nfrags) {
> i = 0;
> nfrags = skb_shinfo(list_skb)-
> >nr_frags;
> frag = skb_shinfo(list_skb)->frags;
> frag_skb = list_skb;
> if (!skb_headlen(list_skb)) {
> BUG_ON(!nfrags);
> } else {
> BUG_ON(!list_skb-
> >head_frag); <<<<<<<< an exception was triggered here.
>
> /* to make room for head_frag.
> */
> i--;
> frag--;
Thanks. So commit 13acc94eff12 ("net: permit skb_segment on head_frag
frag_list skb") extended support for segmenting of list_skb with
frag_list, but only if the head if present is also frag_based.
This is obsure enough that I agree that linearizing such packets is an
okay workaround.
But we may be able to do one better. Disabling sg for this input will
build linear nskbs and avoid this code. At the top of skb_segment.
And such code already exists, refined most recently in commit
9e4b7a99a03a ("net: gso: fix panic on frag_list with mixed head alloc
types"). Did you observe this on a recent kernel? If not, which
version?
What kind of BPF program are you running, especially which functions
does it call that might change skb geometry. All such functions in
net/core/filter. set SKB_GSO_DODGY, which will make skb_segment enter
this check branch.
Re: your patch
+ } else if (skb_shinfo(gso_skb)->frag_list && gso_skb->head_frag == 0) {
The issue here is that list_skb->head_frag == 0, not gso_skb. Not sure
that gso_skb->head_frag == 0 implies list_skb->head_frag == 0.
+ if (skb_pagelen(gso_skb) - sizeof(*uh) != skb_shinfo(gso_skb)->gso_size) {
This is copied but AFAIK not relevant here.
> }
> if (skb_orphan_frags(frag_skb,
> GFP_ATOMIC) ||
> skb_zerocopy_clone(nskb, frag_skb,
> GFP_ATOMIC))
> goto err;
>
> list_skb = list_skb->next;
> }
>
> }
>
> #0 0xffffffa17e5b4184 in skb_segment (head_skb=<optimized out>,
> features=<optimized out>)
> at ../../../../../../kernel-4.19/net/core/skbuff.c:3774
> #1 0xffffffa17e68a4b0 in __udp_gso_segment
> (gso_skb=0xffffffde2973fe00, features=18433)
> at ../../../../../../kernel-4.19/net/ipv4/udp_offload.c:215
> #2 0xffffffa17e68ad90 in udp4_ufo_fragment (skb=0xffffffde2973fe00,
> features=18433)
> at ../../../../../../kernel-4.19/net/ipv4/udp_offload.c:316
> #3 0xffffffa17e694e04 in inet_gso_segment (skb=0xffffffde2973fe00,
> features=18433)
> at ../../../../../../kernel-4.19/net/ipv4/af_inet.c:1342
> #4 0xffffffa17e5c4b48 in skb_mac_gso_segment (skb=0xffffffde2973fe00,
> features=18433)
>
>
> On Wed, 2026-01-07 at 11:00 -0500, Willem de Bruijn wrote:
> > External email : Please do not click links or open attachments until
> > you have verified the sender or the content.
> >
> >
> > Shiming Cheng (成诗明) wrote:
> > > Dear Willem
> > >
> > > frag_list assignment trigger:
> > > frag_list is assigned to the next skb without to enable device
> > > feature
> > > NETIF_F_GRO_FRAGLIST if head_frag is zero.
> >
> > Thanks for the trace, so this is a regular GRO packet using
> > skb_gro_receive that ended up having to use frag_list.
> >
> > > After packet in fraglist is processed by BPF pull tial, performing
> > > GSO
> > > segmentation directly will cause problems.
> >
> > That's peculiar. For such packets, there is no expectation that all
> > frag_list skbs hold an exact segment.
> >
> > It would be helpful if the stack trace can be extended to show the
> > line numbers. And/or if you can show the BUG that is being hit in
> > skb_segment.
> >
> > I agree that we probably just need to detect such modified packets
> > and
> > linearize them. But let's get a more detailed root cause first.
> >
> > >
> > > udp_gro_receive_segment
> > > skb_gro_receive
> > > if (skb->head_frag==0)
> > > goto merge;
> > >
> > >
> > > merge:
> > > /* sk ownership - if any - completely transferred to the
> > > aggregated packet */
> > > skb->destructor = NULL;
> > > skb->sk = NULL;
> > > delta_truesize = skb->truesize;
> > > if (offset > headlen) {
> > > unsigned int eat = offset - headlen;
> > >
> > > skb_frag_off_add(&skbinfo->frags[0], eat);
> > > skb_frag_size_sub(&skbinfo->frags[0], eat);
> > > skb->data_len -= eat;
> > > skb->len -= eat;
> > > offset = headlen;
> > > }
> > >
> > > __skb_pull(skb, offset);
> > >
> > > if (NAPI_GRO_CB(p)->last == p)
> > > skb_shinfo(p)->frag_list = skb;
> > > <<< here frag_list is assigned to the next skb without to
> > > enable device feature NETIF_F_GRO_FRAGLIST if head_frag is zero
> > > else
> > > NAPI_GRO_CB(p)->last->next = skb;
> > > NAPI_GRO_CB(p)->last = skb;
> > > __skb_header_release(skb);
> > > lp = p;
> > >
> > >
> > >
> > >
> > > On Tue, 2026-01-06 at 14:15 -0500, Willem de Bruijn wrote:
> > > > External email : Please do not click links or open attachments
> > > > until
> > > > you have verified the sender or the content.
> > > >
> > > >
> > > > Shiming Cheng wrote:
> > > > > Commit 3382a1ed7f77 ("net: fix udp gso skb_segment after pull
> > > > > from
> > > > > frag_list")
> > > > > if gso_type is not SKB_GSO_FRAGLIST but skb->head_frag is zero,
> > > >
> > > > What codepath triggers this scenario?
> > > >
> > > > We should make sure that the fix covers all such instances.
> > > > Likely
> > > > instances of where some module in the datapath, like a BPF
> > > > program,
> > > > modifies a valid skb into one that is not safe to pass to
> > > > skb_segment.
> > > >
> > > > I don't fully understand yet that skb->head_frag == 0 is the only
> > > > such condition in scope.
> > > >
> > > > > then detected invalid geometry in frag_list skbs and call
> > > > > skb_segment. But some packets with modified geometry can also
> > > > > hit
> > > > > bugs in that code. Instead, linearize all these packets that
> > > > > fail
> > > > > the basic invariants on gso fraglist skbs. That is more robust.
> > > > > call stack information, see below.
> > > > >
> > > > > Valid SKB_GSO_FRAGLIST skbs
> > > > > - consist of two or more segments
> > > > > - the head_skb holds the protocol headers plus first gso_size
> > > > > - one or more frag_list skbs hold exactly one segment
> > > > > - all but the last must be gso_size
> > > > >
> > > > > Optional datapath hooks such as NAT and BPF (bpf_skb_pull_data)
> > > > > can
> > > > > modify fraglist skbs, breaking these invariants.
> > > > >
> > > > > In extreme cases they pull one part of data into skb linear.
> > > > > For
> > > > > UDP,
> > > > > this causes three payloads with lengths of (11,11,10) bytes
> > > > > were
> > > > > pulled tail to become (12,10,10) bytes.
> > > > >
> > > > > The skbs no longer meets the above SKB_GSO_FRAGLIST conditions
> > > > > because
> > > > > payload was pulled into head_skb, it needs to be linearized
> > > > > before
> > > > > pass
> > > > > to regular skb_segment.
> > > >
> > > > Most of this commit message duplicates the text in commit
> > > > 3382a1ed7f77
> > > > ("net: fix udp gso skb_segment after pull from frag_list"). And
> > > > somewhat garbles it, as in the first sentence.
> > > >
> > > > But this is a different datapath, not related to
> > > > SKB_GSO_FRAGLIST.
> > > > So the fixes tag is also incorrect. The blamed commit fixes an
> > > > issue
> > > > with fraglist GRO. This new issue is with skbs that have a
> > > > fraglist,
> > > > but not one created with that feature. (the naming is confusing,
> > > > but
> > > > fraglist-gro is only one use of the skb frag_list).
> > > >
> > > > > skb_segment+0xcd0/0xd14
> > > > > __udp_gso_segment+0x334/0x5f4
> > > > > udp4_ufo_fragment+0x118/0x15c
> > > > > inet_gso_segment+0x164/0x338
> > > > > skb_mac_gso_segment+0xc4/0x13c
> > > > > __skb_gso_segment+0xc4/0x124
> > > > > validate_xmit_skb+0x9c/0x2c0
> > > > > validate_xmit_skb_list+0x4c/0x80
> > > > > sch_direct_xmit+0x70/0x404
> > > > > __dev_queue_xmit+0x64c/0xe5c
> > > > > neigh_resolve_output+0x178/0x1c4
> > > > > ip_finish_output2+0x37c/0x47c
> > > > > __ip_finish_output+0x194/0x240
> > > > > ip_finish_output+0x20/0xf4
> > > > > ip_output+0x100/0x1a0
> > > > > NF_HOOK+0xc4/0x16c
> > > > > ip_forward+0x314/0x32c
> > > > > ip_rcv+0x90/0x118
> > > > > __netif_receive_skb+0x74/0x124
> > > > > process_backlog+0xe8/0x1a4
> > > > > __napi_poll+0x5c/0x1f8
> > > > > net_rx_action+0x154/0x314
> > > > > handle_softirqs+0x154/0x4b8
> > > > >
> > > > > [118.376811] [C201134] rxq0_pus: [name:bug&]kernel BUG at
> > > > > net/core/skbuff.c:4278!
> > > > > [118.376829] [C201134] rxq0_pus: [name:traps&]Internal error:
> > > > > Oops - BUG: 00000000f2000800 [#1]
> > > > > [118.470774] [C201134] rxq0_pus: [name:mrdump&]Kernel Offset:
> > > > > 0x178cc00000 from 0xffffffc008000000
> > > > > [118.470810] [C201134] rxq0_pus: [name:mrdump&]PHYS_OFFSET:
> > > > > 0x40000000
> > > > > [118.470827] [C201134] rxq0_pus: [name:mrdump&]pstate:
> > > > > 60400005
> > > > > (nZCv daif +PAN -UAO)
> > > > > [118.470848] [C201134] rxq0_pus: [name:mrdump&]pc :
> > > > > [0xffffffd79598aefc] skb_segment+0xcd0/0xd14
> > > > > [118.470900] [C201134] rxq0_pus: [name:mrdump&]lr :
> > > > > [0xffffffd79598a5e8] skb_segment+0x3bc/0xd14
> > > > > [118.470928] [C201134] rxq0_pus: [name:mrdump&]sp :
> > > > > ffffffc008013770
> > > > >
> > > > > Fixes: 3382a1ed7f77 ("net: fix udp gso skb_segment after pull
> > > > > from
> > > > > frag_list")
> > > > > Signed-off-by: Shiming Cheng <shiming.cheng@...iatek.com>
> > > > > ---
> > > > > net/ipv4/udp_offload.c | 6 ++++++
> > > > > 1 file changed, 6 insertions(+)
> > > > >
> > > > > diff --git a/net/ipv4/udp_offload.c b/net/ipv4/udp_offload.c
> > > > > index 19d0b5b09ffa..606d9ce8c98e 100644
> > > > > --- a/net/ipv4/udp_offload.c
> > > > > +++ b/net/ipv4/udp_offload.c
> > > > > @@ -535,6 +535,12 @@ struct sk_buff *__udp_gso_segment(struct
> > > > > sk_buff *gso_skb,
> > > > > uh->check = ~udp_v4_check(gso_skb->len,
> > > > > ip_hdr(gso_skb)
> > > > > -
> > > > > > saddr,
> > > > >
> > > > > ip_hdr(gso_skb)
> > > > > -
> > > > > > daddr, 0);
> > > > >
> > > > > + } else if (skb_shinfo(gso_skb)->frag_list && gso_skb-
> > > > > > head_frag == 0) {
> > > > >
> > > > > + if (skb_pagelen(gso_skb) - sizeof(*uh) !=
> > > > > skb_shinfo(gso_skb)->gso_size) {
> > > > > + ret = __skb_linearize(gso_skb);
> > > > > + if (ret)
> > > > > + return ERR_PTR(ret);
> > > > > + }
> > > > > }
> > > > >
> > > > > skb_pull(gso_skb, sizeof(*uh));
> > > > > --
> > > > > 2.45.2
> > > > >
> > > >
> > > >
> >
> >
Powered by blists - more mailing lists