[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <9db75c0e-6eef-c0b4-b307-f1f7005c39ba@fb.com>
Date: Fri, 16 Mar 2018 21:44:38 -0700
From: Yonghong Song <yhs@...com>
To: Eric Dumazet <eric.dumazet@...il.com>,
Alexei Starovoitov <ast@...com>,
Steffen Klassert <steffen.klassert@...unet.com>,
Daniel Borkmann <daniel@...earbox.net>
CC: netdev <netdev@...r.kernel.org>, Martin Lau <kafai@...com>,
<saeedm@...lanox.com>, <diptanu@...com>
Subject: Re: BUG_ON triggered in skb_segment
On 3/16/18 4:03 PM, Eric Dumazet wrote:
>
>
> On 03/16/2018 03:37 PM, Yonghong Song wrote:
>>
>> Eric and Daniel,
>>
>> I have tried to fix this issue but not really successful.
>> I tried two hacks:
>> . if skb_headlen(list_skb) is not 0, we just pull
>> skb_headlen(list_skb) from the skb to make skb_headlen(list_skb) = 0, or
>> . if skb_headlen(list_skb) is not 0, we go to the beginning of
>> the outer loop which will allocate another nskb for this list_skb.
>>
>> Both approaches removed the BUG and the packet is able to reach the
>> remote host. Upon receiving the packet, however, the remote host sends a
>> reset packet back so connection eventually closed. I did not debug
>> further on this.
>>
>> Considering it is tricky to change skb_segment, I hacked test_bpf
>> kernel module to reproduce the issue. The change reflects the gso packet
>> structure I got from mlx5. Maybe you could take a look and suggest a fix or a direction of how to move forward.
>>
>> Thanks!
>>
>> ============= PATCH ===============
>>
>> -bash-4.2$ git show
>> commit 41681ab51f85b4a0ea3416a0a62d6bde74f3af4b
>> Author: Yonghong Song <yhs@...com>
>> Date: Fri Mar 16 15:10:02 2018 -0700
>>
>> [hack] hack test_bpf module to trigger BUG_ON in skb_segment.
>>
>> "modprobe test_bpf" will have the following errors:
>> ...
>> [ 98.149165] ------------[ cut here ]------------
>> [ 98.159362] kernel BUG at net/core/skbuff.c:3667!
>> [ 98.169756] invalid opcode: 0000 [#1] SMP PTI
>> [ 98.179370] Modules linked in:
>> [ 98.179371] test_bpf(+)
>> ...
>>
>> The BUG happens in function skb_segment:
>> ...
>> 3665 while (pos < offset + len) {
>> 3666 if (i >= nfrags) {
>> 3667 BUG_ON(skb_headlen(list_skb));
>
> Note that you do not need to pull data.
>
> Chances are high that skb->head is a page fragment, so can be considered as such ;)
>
> Look at users of skb_head_is_locked()
Thanks Eric! skb->head_frag... This may well explain why my original
hack won't work.
Yonghong
>
>
>> 3668
>> 3669 i = 0;
>> 3670 nfrags = skb_shinfo(list_skb)->nr_frags;
>> 3671 frag = skb_shinfo(list_skb)->frags;
>> 3672 frag_skb = list_skb;
>> 3673
>> 3674 BUG_ON(!nfrags);
>> ...
>>
>> The skbs are constructed to mimic what mlx5 may generate.
>> The packet size/header may not mimic real cases in production. But
>> the processing flow is similar.
>>
>> Signed-off-by: Yonghong Song <yhs@...com>
>>
>> diff --git a/lib/test_bpf.c b/lib/test_bpf.c
>> index 2efb213..d36a991 100644
>> --- a/lib/test_bpf.c
>> +++ b/lib/test_bpf.c
>> @@ -6574,6 +6574,67 @@ static bool exclude_test(int test_id)
>> return test_id < test_range[0] || test_id > test_range[1];
>> }
>>
>> +static struct sk_buff *build_test_skb(void) {
>> + u32 tailroom = SKB_DATA_ALIGN(sizeof(struct skb_shared_info));
>> + u32 headroom = NET_SKB_PAD + NET_IP_ALIGN + ETH_HLEN;
>> + struct sk_buff *skb[2];
>> + void *data[2], *page;
>> + int i, data_size = 8;
>> +
>> + page = (void *)__get_free_page(GFP_KERNEL | __GFP_ZERO);
>> + if (!page)
>> + return NULL;
>> +
>> + for (i = 0; i < 2; i++) {
>> + data[i] = kzalloc(headroom + tailroom + data_size, GFP_KERNEL);
>> + if (!data[i])
>> + return NULL;
>> + skb[i] = build_skb(data[i], 0);
>> + if (!skb[i]) {
>> + kfree(data[i]);
>> + return NULL;
>> + }
>> + skb_reserve(skb[i], headroom);
>> + skb_put(skb[i], data_size);
>> + skb[i]->protocol = htons(ETH_P_IP);
>> + skb_reset_network_header(skb[i]);
>> + skb_set_mac_header(skb[i], -ETH_HLEN);
>> +
>> + skb_add_rx_frag(skb[i], skb_shinfo(skb[i])->nr_frags,
>> + page, 0, 64, 64);
>> + }
>> +
>> + /* setup shinfo */
>> + skb_shinfo(skb[0])->gso_size = 1448;
>> + skb_shinfo(skb[0])->gso_type = SKB_GSO_TCPV4;
>> + skb_shinfo(skb[0])->gso_type |= SKB_GSO_DODGY;
>> + skb_shinfo(skb[0])->gso_segs = 0;
>> + skb_shinfo(skb[0])->frag_list = skb[1];
>> +
>> + /* adjust skb[0]'s len */
>> + skb[0]->len += skb[1]->len;
>> + skb[0]->data_len += skb[1]->data_len;
>> + skb[0]->truesize += skb[1]->truesize;
>> +
>> + return skb[0];
>> +}
>> +
>> +static void test_skb_segment(void) {
>> + netdev_features_t features;
>> + struct sk_buff *skb;
>> +
>> + features = NETIF_F_SG | NETIF_F_GSO_PARTIAL | NETIF_F_IP_CSUM | NETIF_F_IPV6_CSUM;
>> + features |= NETIF_F_RXCSUM;
>> + skb = build_test_skb();
>> + if (!skb)
>> + pr_info("Failed in test_skb_segment:build_test_skb!");
>> +
>> + if (skb_segment(skb, features))
>> + pr_info("Success in test_skb_segment!");
>> + else
>> + pr_info("Failed in test_skb_segment!");
>> +}
>> +
>> static __init int test_bpf(void)
>> {
>> int i, err_cnt = 0, pass_cnt = 0;
>> @@ -6631,7 +6692,8 @@ static int __init test_bpf_init(void)
>> if (ret < 0)
>> return ret;
>>
>> - ret = test_bpf();
>> + // ret = test_bpf();
>> + test_skb_segment();
>>
>> destroy_bpf_tests();
>> return ret;
>>
>>
>> ============= END ================
>>
>>
>> On 3/13/18 6:15 PM, Eric Dumazet wrote:
>>>
>>>
>>> On 03/13/2018 05:35 PM, Eric Dumazet wrote:
>>>>
>>>>
>>>> On 03/13/2018 05:26 PM, Eric Dumazet wrote:
>>>>>
>>>>>
>>>>> On 03/13/2018 05:04 PM, Alexei Starovoitov wrote:
>>>>>> On 3/13/18 4:27 PM, Eric Dumazet wrote:
>>>>>>>
>>>>>>>
>>>>>>> On 03/13/2018 04:09 PM, Alexei Starovoitov wrote:
>>>>>>>
>>>>>>>> we have bpf_skb_proto_6_to_4() that was used by cilium for long time.
>>>>>>>> It's not clear why it's not crashing there, but we cannot just
>>>>>>>> reject changing proto in bpf programs now.
>>>>>>>> We have to fix whatever needs to be fixed in skb_segment
>>>>>>>> (if bug is there) or fix whatever necessary on mlx5 side.
>>>>>>>> In bpf helper we mark it as SKB_GSO_DODGY just like packets coming
>>>>>>>> through virtio would do, so if skb_segment() needs to do something
>>>>>>>> special with skb the SKB_GSO_DODGY flag is already there.
>>>>>>>
>>>>>>> 'Fixing' skb_segment(), I did that a long time ago and Herbert Xu was
>>>>>>> not happy with the fix and provided something else.
>>>>>>
>>>>>> any link to your old patches and discussion?
>>>>>>
>>>>>> I think since mlx4 can do tso on them and the packets came out
>>>>>> correct on the wire, there is nothing fundamentally wrong with
>>>>>> changing gso_size. Just tricky to teach skb_segment.
>>>>>>
>>>>>
>>>>> The world is not mlx4 only. Some NIC will ask skb_segment() fallback segmentation for various reasons (like skb->len above a given limit like 16KB)
>>>>>
>>>>> Maybe https://urldefense.proofpoint.com/v2/url?u=https-3A__www.spinics.net_lists_netdev_msg255549.html&d=DwIDaQ&c=5VD0RTtNlTh3ycd41b3MUw&r=DA8e1B5r073vIqRrFz7MRA&m=x9lG2k53B7AFsnNx6t2DgbXt06J-sLznZIHk6YdGoGg&s=0Nxx2G8PtDAEMCFAvQ7kxYTXVr9aHdOolP1KB_lnmes&e=
>>>>
>>>>
>>>> Herbert patch :
>>>>
>>>> commit 9d8506cc2d7ea1f911c72c100193a3677f6668c3
>>>> Author: Herbert Xu <herbert@...dor.apana.org.au>
>>>> Date: Thu Nov 21 11:10:04 2013 -0800
>>>>
>>>> gso: handle new frag_list of frags GRO packets
>>>>
>>>
>>> I found my initial patch.
>>>
>>> https://urldefense.proofpoint.com/v2/url?u=https-3A__www.spinics.net_lists_netdev_msg255452.html&d=DwIDaQ&c=5VD0RTtNlTh3ycd41b3MUw&r=DA8e1B5r073vIqRrFz7MRA&m=x9lG2k53B7AFsnNx6t2DgbXt06J-sLznZIHk6YdGoGg&s=VuWRpUdJwBwTxpnMNZYgKvQANLL5UA7hZnTFZsQlK6c&e=
>>>
>>>
Powered by blists - more mailing lists