Message-ID: <c78c0c21-e161-f8c3-afc6-04b9888647ee@fb.com>
Date: Wed, 21 Mar 2018 16:02:25 -0700
From: Yonghong Song <yhs@...com>
To: Alexander Duyck <alexander.duyck@...il.com>
CC: Eric Dumazet <edumazet@...gle.com>, <ast@...com>,
Daniel Borkmann <daniel@...earbox.net>, <diptanu@...com>,
Netdev <netdev@...r.kernel.org>, Kernel Team <kernel-team@...com>
Subject: Re: [PATCH net-next v5 1/2] net: permit skb_segment on head_frag
frag_list skb
On 3/21/18 2:51 PM, Alexander Duyck wrote:
> On Wed, Mar 21, 2018 at 1:36 PM, Yonghong Song <yhs@...com> wrote:
>> One of our in-house projects, bpf-based NAT, hits a kernel BUG_ON at
>> function skb_segment(), line 3667. The bpf program attaches to
>> clsact ingress, calls bpf_skb_change_proto to change protocol
>> from ipv4 to ipv6 or from ipv6 to ipv4, and then calls bpf_redirect
>> to send the changed packet out.
>>
>> 3472 struct sk_buff *skb_segment(struct sk_buff *head_skb,
>> 3473 netdev_features_t features)
>> 3474 {
>> 3475 struct sk_buff *segs = NULL;
>> 3476 struct sk_buff *tail = NULL;
>> ...
>> 3665 while (pos < offset + len) {
>> 3666 if (i >= nfrags) {
>> 3667 BUG_ON(skb_headlen(list_skb));
>> 3668
>> 3669 i = 0;
>> 3670 nfrags = skb_shinfo(list_skb)->nr_frags;
>> 3671 frag = skb_shinfo(list_skb)->frags;
>> 3672 frag_skb = list_skb;
>> ...
>>
>> call stack:
>> ...
>> #1 [ffff883ffef03558] __crash_kexec at ffffffff8110c525
>> #2 [ffff883ffef03620] crash_kexec at ffffffff8110d5cc
>> #3 [ffff883ffef03640] oops_end at ffffffff8101d7e7
>> #4 [ffff883ffef03668] die at ffffffff8101deb2
>> #5 [ffff883ffef03698] do_trap at ffffffff8101a700
>> #6 [ffff883ffef036e8] do_error_trap at ffffffff8101abfe
>> #7 [ffff883ffef037a0] do_invalid_op at ffffffff8101acd0
>> #8 [ffff883ffef037b0] invalid_op at ffffffff81a00bab
>> [exception RIP: skb_segment+3044]
>> RIP: ffffffff817e4dd4 RSP: ffff883ffef03860 RFLAGS: 00010216
>> RAX: 0000000000002bf6 RBX: ffff883feb7aaa00 RCX: 0000000000000011
>> RDX: ffff883fb87910c0 RSI: 0000000000000011 RDI: ffff883feb7ab500
>> RBP: ffff883ffef03928 R8: 0000000000002ce2 R9: 00000000000027da
>> R10: 000001ea00000000 R11: 0000000000002d82 R12: ffff883f90a1ee80
>> R13: ffff883fb8791120 R14: ffff883feb7abc00 R15: 0000000000002ce2
>> ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018
>> #9 [ffff883ffef03930] tcp_gso_segment at ffffffff818713e7
>> --- <IRQ stack> ---
>> ...
>>
>> The triggering input skb has the following properties:
>> list_skb = skb->frag_list;
>> skb->nfrags != NULL && skb_headlen(list_skb) != 0
>> and skb_segment() is not able to handle a frag_list skb
>> if its headlen (list_skb->len - list_skb->data_len) is not 0.
>>
>> This patch addressed the issue by handling skb_headlen(list_skb) != 0
>> case properly if list_skb->head_frag is true, which is expected in
>> most cases. The head frag is processed before list_skb->frags
>> are processed.
>>
>> Reported-by: Diptanu Gon Choudhury <diptanu@...com>
>> Signed-off-by: Yonghong Song <yhs@...com>
>> ---
>> net/core/skbuff.c | 26 ++++++++++++++++++++------
>> 1 file changed, 20 insertions(+), 6 deletions(-)
>>
>> diff --git a/net/core/skbuff.c b/net/core/skbuff.c
>> index 715c134..23b317a 100644
>> --- a/net/core/skbuff.c
>> +++ b/net/core/skbuff.c
>> @@ -3460,6 +3460,19 @@ void *skb_pull_rcsum(struct sk_buff *skb, unsigned int len)
>> }
>> EXPORT_SYMBOL_GPL(skb_pull_rcsum);
>>
>> +static inline skb_frag_t skb_head_frag_to_page_desc(struct sk_buff *frag_skb)
>> +{
>> + skb_frag_t head_frag;
>> + struct page *page;
>> +
>> + page = virt_to_head_page(frag_skb->head);
>> + head_frag.page.p = page;
>> + head_frag.page_offset = frag_skb->data -
>> + (unsigned char *)page_address(page);
>> + head_frag.size = skb_headlen(frag_skb);
>> + return head_frag;
>> +}
>> +
>> /**
>> * skb_segment - Perform protocol segmentation on skb.
>> * @head_skb: buffer to segment
>> @@ -3664,15 +3677,16 @@ struct sk_buff *skb_segment(struct sk_buff *head_skb,
>>
>> while (pos < offset + len) {
>> if (i >= nfrags) {
>> - BUG_ON(skb_headlen(list_skb));
>> -
>> i = 0;
>> nfrags = skb_shinfo(list_skb)->nr_frags;
>> frag = skb_shinfo(list_skb)->frags;
>> - frag_skb = list_skb;
>
> You could probably leave this line in place. No point in moving it.
The only reason I moved it was to keep the assignment closer to its use.
But I am totally fine with leaving it as is.
>
>> -
>> - BUG_ON(!nfrags);
>> + if (skb_headlen(list_skb)) {
>> + BUG_ON(!list_skb->head_frag);
>>
>> + /* to make room for head_frag. */
>> + i--; frag--;
>
> Normally these should be two separate lines one for "i--;" and one for
> "frag--;".
Will change. Surprised that checkpatch.pl did not complain about this.
>
>> + }
>
> You could probably place the BUG_ON(!nfrags) in an else statement here
> to handle the case where we have a potentially empty skb which would
> be a bug.
Yes, this makes sense. Will add this BUG_ON.
>
>> + frag_skb = list_skb;
>> if (skb_orphan_frags(frag_skb, GFP_ATOMIC) ||
>> skb_zerocopy_clone(nskb, frag_skb,
>> GFP_ATOMIC))
>> @@ -3689,7 +3703,7 @@ struct sk_buff *skb_segment(struct sk_buff *head_skb,
>> goto err;
>> }
>>
>> - *nskb_frag = *frag;
>> + *nskb_frag = (i < 0) ? skb_head_frag_to_page_desc(frag_skb) : *frag;
>> __skb_frag_ref(nskb_frag);
>> size = skb_frag_size(nskb_frag);
>>
>> --
>> 2.9.5
>>
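
For anyone following the index trick in the hunk above, here is a minimal
userspace sketch of the idea, not the kernel code itself. The toy_* types and
functions are hypothetical stand-ins for sk_buff and skb_frag_t. When the
linear (head) area is non-empty, the walk starts at slot -1, describes the
head as if it were a page fragment, and then continues through the regular
frags. The empty-skb check sits in the else branch, as suggested in the
review. Unlike the patch, the sketch only moves the integer index and never
steps a pointer before the start of the array.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for skb_frag_t. */
struct toy_frag {
	unsigned int offset;
	unsigned int size;
};

/* Hypothetical, simplified stand-in for a frag_list sk_buff. */
struct toy_skb {
	unsigned int headlen;     /* bytes in the linear (head) area */
	unsigned int head_offset; /* offset of the data within the head page */
	int nr_frags;
	struct toy_frag frags[4];
};

/* Describe the linear area as if it were a page fragment. */
static struct toy_frag toy_head_frag(const struct toy_skb *skb)
{
	struct toy_frag head = {
		.offset = skb->head_offset,
		.size = skb->headlen,
	};

	return head;
}

/*
 * Visit the head area (if any), then every frag, storing each size in out[].
 * Returns the number of entries written, or -1 if the skb carries no data
 * or out[] is too small.
 */
static int toy_walk(const struct toy_skb *skb, unsigned int *out, int out_len)
{
	int i = 0;
	int n = 0;

	if (skb->headlen) {
		/* Make room for the head frag: it occupies virtual slot -1. */
		i--;
	} else if (!skb->nr_frags) {
		/* Empty skb: the kernel code would BUG_ON() here. */
		return -1;
	}

	for (; i < skb->nr_frags; i++) {
		struct toy_frag cur = (i < 0) ? toy_head_frag(skb)
					      : skb->frags[i];

		if (n >= out_len)
			return -1;
		out[n++] = cur.size;
	}

	return n;
}
```

With a 100-byte head and two frags, toy_walk() reports three entries, head
first. With no head and no frags it returns -1.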