[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <fd44c91884d0ebf3625ac85a1049679a987f8f79.camel@redhat.com>
Date: Tue, 02 Jul 2024 11:23:41 +0200
From: Paolo Abeni <pabeni@...hat.com>
To: Fred Li <dracodingfly@...il.com>, davem@...emloft.net,
edumazet@...gle.com, kuba@...nel.org, aleksander.lobakin@...el.com,
sashal@...nel.org, linux@...ssschuh.net, hawk@...nel.org, nbd@....name,
mkhalfella@...estorage.com, ast@...nel.org, daniel@...earbox.net,
andrii@...nel.org, martin.lau@...ux.dev, song@...nel.org,
yonghong.song@...ux.dev, john.fastabend@...il.com, kpsingh@...nel.org,
sdf@...gle.com, haoluo@...gle.com, jolsa@...nel.org
Cc: netdev@...r.kernel.org, linux-kernel@...r.kernel.org, bpf@...r.kernel.org
Subject: Re: [PATCH v2 1/2] net: Fix skb_segment when splitting gso_size
mangled skb having linear-headed frag_list whose head_frag=true
On Wed, 2024-06-26 at 14:55 +0800, Fred Li wrote:
> When using calico bpf based NAT, hits a kernel BUG_ON at function
> skb_segment(), line 4560. Performing NAT translation when accessing
> a Service IP across subnets, the calico will encap vxlan and calls the
> bpf_skb_adjust_room to decrease the gso_size, and then call bpf_redirect
> send packets out.
>
> 4438 struct sk_buff *skb_segment(struct sk_buff *head_skb,
> 4439 netdev_features_t features)
> 4440 {
> 4441 struct sk_buff *segs = NULL;
> 4442 struct sk_buff *tail = NULL;
> ...
> 4558 if (hsize <= 0 && i >= nfrags && skb_headlen(list_skb) &&
> 4559 (skb_headlen(list_skb) == len || sg)) {
> 4560 BUG_ON(skb_headlen(list_skb) > len);
> 4561
> 4562 nskb = skb_clone(list_skb, GFP_ATOMIC);
> 4563 if (unlikely(!nskb))
> 4564 goto err;
>
> call stack:
> ...
> [exception RIP: skb_segment+3016]
> RIP: ffffffffb97df2a8 RSP: ffffa3f2cce08728 RFLAGS: 00010293
> RAX: 000000000000007d RBX: 00000000fffff7b3 RCX: 0000000000000011
> RDX: 0000000000000000 RSI: ffff895ea32c76c0 RDI: 00000000000008c1
> RBP: ffffa3f2cce087f8 R8: 000000000000088f R9: 0000000000000011
> R10: 000000000000090c R11: ffff895e47e68000 R12: ffff895eb2022f00
> R13: 000000000000004b R14: ffff895ecdaf2000 R15: ffff895eb2023f00
> ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018
> #9 [ffffa3f2cce08720] skb_segment at ffffffffb97ded63
> ...
>
> The skb has the following properties:
> doffset = 66
> list_skb = skb_shinfo(skb)->frag_list
> list_skb->head_frag = true
> skb->len = 2441 && skb->data_len = 2250
> skb_shinfo(skb)->nr_frags = 17
> skb_shinfo(skb)->gso_size = 75
> skb_shinfo(skb)->frags[0...16].bv_len = 125
> list_skb->len = 125
> list_skb->data_len = 0
>
> When slicing the frag_list skb, there three cases:
> 1. Only *non* head_frag
> sg will be false, only when skb_headlen(list_skb)==len is satisfied,
> it will enter the branch at line 4560, and there will be no crash.
> 2. Mixed head_frag
> sg will be false, Only when skb_headlen(list_skb)==len is satisfied,
> it will enter the branch at line 4560, and there will be no crash.
> 3. Only frag_list with head_frag=true
> sg is true, three cases below:
> (1) skb_headlen(list_skb)==len is satisfied, it will enter the branch
> at line 4560, and there will be no crash.
> (2) skb_headlen(list_skb)<len is satisfied, it will enter the branch
> at line 4560, and there will be no crash.
> (3) skb_headlen(list_skb)>len is satisfied, it will be crash.
>
> Applying this patch, three cases will be:
> 1. Only *non* head_frag
> sg will be false. No difference with before.
> 2. Mixed head_frag
> sg will be false. No difference with before.
> 3. Only frag_list with head_frag=true
> sg is true, there also three cases:
> (1) skb_headlen(list_skb)==len is satisfied, no difference with before.
> (2) skb_headlen(list_skb)<len is satisfied, will be revert to copying
> in this case.
> (3) skb_headlen(list_skb)>len is satisfied, will be revert to copying
> in this case.
>
> Since commit 13acc94eff122("net: permit skb_segment on head_frag frag_list
> skb"), it is allowed to segment the head_frag frag_list skb.
>
> Signed-off-by: Fred Li <dracodingfly@...il.com>
> ---
> net/core/skbuff.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/net/core/skbuff.c b/net/core/skbuff.c
> index f0a9ef1aeaa2..b1dab1b071fc 100644
> --- a/net/core/skbuff.c
> +++ b/net/core/skbuff.c
> @@ -4556,7 +4556,7 @@ struct sk_buff *skb_segment(struct sk_buff *head_skb,
> hsize = skb_headlen(head_skb) - offset;
>
> if (hsize <= 0 && i >= nfrags && skb_headlen(list_skb) &&
> - (skb_headlen(list_skb) == len || sg)) {
> + (skb_headlen(list_skb) == len)) {
> BUG_ON(skb_headlen(list_skb) > len);
>
> nskb = skb_clone(list_skb, GFP_ATOMIC);
I must admit I more than a bit lost in the many turns of skb_segment(),
but the above does not look like the correct solution, as it will make
the later BUG_ON() unreachable/meaningless.
Do I read correctly that when the BUG_ON() triggers:
list_skb->len is 125
len is 75
list_skb->frag_head is true
It looks like skb_segment() is becoming even and ever more complex to
cope with unexpected skb layouts, only possibly when skb_segment() is
called by some BPF helpers.
I think it should be up to the helper itself to adjust the skb and/or
the device features to respect skb_segment() constraints, instead of
making the common code increasingly not maintainable.
Thanks,
Paolo
Powered by blists - more mailing lists