lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <4b0101e6-f533-e14b-62f7-ae5531068a05@fb.com>
Date:   Wed, 21 Mar 2018 13:10:29 -0700
From:   Yonghong Song <yhs@...com>
To:     Alexander Duyck <alexander.duyck@...il.com>
CC:     Eric Dumazet <edumazet@...gle.com>, <ast@...com>,
        Daniel Borkmann <daniel@...earbox.net>, <diptanu@...com>,
        Netdev <netdev@...r.kernel.org>, Kernel Team <kernel-team@...com>
Subject: Re: [PATCH net-next v3 1/2] net: permit skb_segment on head_frag
 frag_list skb



On 3/21/18 7:59 AM, Alexander Duyck wrote:
> On Tue, Mar 20, 2018 at 10:02 PM, Yonghong Song <yhs@...com> wrote:
>>
>>
>> On 3/20/18 4:50 PM, Alexander Duyck wrote:
>>>
>>> On Tue, Mar 20, 2018 at 4:21 PM, Yonghong Song <yhs@...com> wrote:
>>>>
>>>> One of our in-house projects, bpf-based NAT, hits a kernel BUG_ON at
>>>> function skb_segment(), line 3667. The bpf program attaches to
>>>> clsact ingress, calls bpf_skb_change_proto to change protocol
>>>> from ipv4 to ipv6 or from ipv6 to ipv4, and then calls bpf_redirect
>>>> to send the changed packet out.
>>>>
>>>> 3472 struct sk_buff *skb_segment(struct sk_buff *head_skb,
>>>> 3473                             netdev_features_t features)
>>>> 3474 {
>>>> 3475         struct sk_buff *segs = NULL;
>>>> 3476         struct sk_buff *tail = NULL;
>>>> ...
>>>> 3665                 while (pos < offset + len) {
>>>> 3666                         if (i >= nfrags) {
>>>> 3667                                 BUG_ON(skb_headlen(list_skb));
>>>> 3668
>>>> 3669                                 i = 0;
>>>> 3670                                 nfrags =
>>>> skb_shinfo(list_skb)->nr_frags;
>>>> 3671                                 frag = skb_shinfo(list_skb)->frags;
>>>> 3672                                 frag_skb = list_skb;
>>>> ...
>>>>
>>>> call stack:
>>>> ...
>>>>    #1 [ffff883ffef03558] __crash_kexec at ffffffff8110c525
>>>>    #2 [ffff883ffef03620] crash_kexec at ffffffff8110d5cc
>>>>    #3 [ffff883ffef03640] oops_end at ffffffff8101d7e7
>>>>    #4 [ffff883ffef03668] die at ffffffff8101deb2
>>>>    #5 [ffff883ffef03698] do_trap at ffffffff8101a700
>>>>    #6 [ffff883ffef036e8] do_error_trap at ffffffff8101abfe
>>>>    #7 [ffff883ffef037a0] do_invalid_op at ffffffff8101acd0
>>>>    #8 [ffff883ffef037b0] invalid_op at ffffffff81a00bab
>>>>       [exception RIP: skb_segment+3044]
>>>>       RIP: ffffffff817e4dd4  RSP: ffff883ffef03860  RFLAGS: 00010216
>>>>       RAX: 0000000000002bf6  RBX: ffff883feb7aaa00  RCX: 0000000000000011
>>>>       RDX: ffff883fb87910c0  RSI: 0000000000000011  RDI: ffff883feb7ab500
>>>>       RBP: ffff883ffef03928   R8: 0000000000002ce2   R9: 00000000000027da
>>>>       R10: 000001ea00000000  R11: 0000000000002d82  R12: ffff883f90a1ee80
>>>>       R13: ffff883fb8791120  R14: ffff883feb7abc00  R15: 0000000000002ce2
>>>>       ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
>>>>    #9 [ffff883ffef03930] tcp_gso_segment at ffffffff818713e7
>>>> --- <IRQ stack> ---
>>>> ...
>>>>
>>>> The triggering input skb has the following properties:
>>>>       list_skb = skb->frag_list;
>>>>       skb->nfrags != NULL && skb_headlen(list_skb) != 0
>>>> and skb_segment() is not able to handle a frag_list skb
>>>> if its headlen (list_skb->len - list_skb->data_len) is not 0.
>>>>
>>>> This patch addressed the issue by handling skb_headlen(list_skb) != 0
>>>> case properly if list_skb->head_frag is true, which is expected in
>>>> most cases. The head frag is processed before list_skb->frags
>>>> are processed.
>>>>
>>>> Reported-by: Diptanu Gon Choudhury <diptanu@...com>
>>>> Signed-off-by: Yonghong Song <yhs@...com>
>>>> ---
>>>>    net/core/skbuff.c | 51
>>>> +++++++++++++++++++++++++++++++++++++--------------
>>>>    1 file changed, 37 insertions(+), 14 deletions(-)
>>>>
>>>> diff --git a/net/core/skbuff.c b/net/core/skbuff.c
>>>> index 715c134..59bbc06 100644
>>>> --- a/net/core/skbuff.c
>>>> +++ b/net/core/skbuff.c
>>>> @@ -3475,7 +3475,7 @@ struct sk_buff *skb_segment(struct sk_buff
>>>> *head_skb,
>>>>           struct sk_buff *segs = NULL;
>>>>           struct sk_buff *tail = NULL;
>>>>           struct sk_buff *list_skb = skb_shinfo(head_skb)->frag_list;
>>>> -       skb_frag_t *frag = skb_shinfo(head_skb)->frags;
>>>> +       skb_frag_t *frag = skb_shinfo(head_skb)->frags, *head_frag =
>>>> NULL;
>>>
>>>
>>> I think you misunderstood me. I wasn't saying you allocate head_frag.
>>> I was saying you could move the declaration down.
>>
>>
>> Sorry for my misunderstanding. I did understand your intention of moving
>> the declaration down in order to save stack space. I thought that we cannot
>> really move declaration down (although it works in C, but semantically it is
>> not quite right, more later), so I moved on to
>> use runtime allocation. But indeed skb_frag_t is not big (16 bytes), it
>> could live on the stack.
>>
>>>
>>>>           unsigned int mss = skb_shinfo(head_skb)->gso_size;
>>>>           unsigned int doffset = head_skb->data -
>>>> skb_mac_header(head_skb);
>>>>           struct sk_buff *frag_skb = head_skb;
>>>> @@ -3664,19 +3664,39 @@ struct sk_buff *skb_segment(struct sk_buff
>>>> *head_skb,
>>>>
>>>>                   while (pos < offset + len) {
>>>
>>>
>>> So right here in the loop you could add a "skb_frag_t head_frag;" just
>>> so we declare it here and save ourselves the stack space.
>>
>>
>> I actually tried to move "skb_frag_t head_frag". The stack size remains the
>> same, 0xc0. This is related to how C compiler allocates stack space.
>> The declaration place won't decide the stack size as long as the declaration
>> dictates the usage. The stack size is really determined by liveness
>> analysis.
>>
>> Further, we have code like:
>>          do {
>>             ....
>>             while (pos < offset + len) {
>>                 if (i >= nfrags) {
>>                     ...
>>                     head_frag = ...
>>                 }
>>                 ... = head_frag; // head_frag access guaranteed after
>>                                  // above definition, but it may not
>>                                  // be in the same outer do-while loop.
>>             }
>>             ...
>>           } while (((offset += len) < head_skb->len);
>>
>> So the use of head_frag maybe in different outer loop iterations.
>> So I feel the definition of head_frag should be outside the
>> outer do-while loop, which is the main function scope. I will add some
>> comments here.
> 
> So the point I had is that head_frag doesn't need to live that long.
> All you are doing is arranging the data so that you can essentially
> just dump it into *nskb_frag.
> 
> One alternative you could look at would be to rearrange the code so
> that the MAX_SKB_FRAGS check occurs first, then perform the check for
> (i >= nfrags). Doing it that way we end up bypassing the need for
> head_frag entirely as you could just populate *nskb_frag directly. You
> could probably just add an inline structure that would convert the
> head frag into a skb_frag_t structure and return that. Something along
> the lines of:
>          *nskb_frag = (i < 0) ? skb_head_frag_to_page_desc(frag_skb) : *frag;

This mechanism (inline function skb_head_frag_to_page_desc) works great!
I removed the local variable and the total stack size remains unchanged
with the patch.

> 
> Doing it that way you could pull out the bits below that were
> populating head frag and just have it all handled as an inline
> function. An added advantage is that it should make the line wrapping
> easier to deal with. :-)
> 
>>>
>>>>                           if (i >= nfrags) {
>>>> -                               BUG_ON(skb_headlen(list_skb));
>>>> -
>>>>                                   i = 0;
>>>> +                               if (skb_headlen(list_skb)) {
>>>> +                                       struct page *page;
>>>> +
>>>> +                                       BUG_ON(!list_skb->head_frag);
>>>> +
>>>> +                                       page =
>>>> virt_to_head_page(list_skb->head);
>>>> +                                       if (!head_frag) {
>>>> +                                               head_frag =
>>>> kmalloc(sizeof(skb_frag_t),
>>>> +
>>>> GFP_KERNEL);
>>>> +                                               if (!head_frag)
>>>> +                                                       goto err;
>>>> +                                       }
>>>
>>>
>>> Please no memory allocation. I just meant you could allocate it on the
>>> stack later.
>>>
>>>> +                                       head_frag->page.p = page;
>>>> +                                       head_frag->page_offset =
>>>> list_skb->data -
>>>> +                                               (unsigned char
>>>> *)page_address(page);
>>>> +                                       head_frag->size =
>>>> skb_headlen(list_skb);
>>>> +                                       /* set i = -1 so we will pick
>>>> head_frag
>>>> +                                        * instead of
>>>> skb_shinfo(list_skb)->frags
>>>> +                                        * when i == -1.
>>>> +                                        */
>>>> +                                       i = -1;
>>>> +                               }
>>>
>>>
>>> So it took me a bit to pick up on the fact that line below wasn't
>>> removed. So we are basically trying to do this all in one pass now. Do
>>> I have that right?
>>>
>>> One thing you could look at doing to save yourself the extra "if"
>>> later would be to pull frag pointer before you go through skb_headlen
>>> check above. Then if you are going to use a head_frag you could just
>>> do a "i--; frag--;" combination just to rewind and make the room for
>>> the increment to come later. That way you don't have an invalid frag
>>> pointer floating around. That way you only have to do this once
>>> instead of having to do a conditional check per fragment.
>>
>>
>> Right. This indeed make code more cleaner.
>>
>>>
>>>>                                   nfrags = skb_shinfo(list_skb)->nr_frags;
>>>> -                               frag = skb_shinfo(list_skb)->frags;
>>>
>>>
>>> This patch might be more readable if you were to just insert the
>>> skb_headlen() bits down here and left the i=0 through frag = .. in one
>>> piece.
>>
>>
>> Right. Will implement as suggested.
>>
>>>
>>>> -                               frag_skb = list_skb;
>>>> -
>>>> -                               BUG_ON(!nfrags);
>>>> -
>>>> -                               if (skb_orphan_frags(frag_skb,
>>>> GFP_ATOMIC) ||
>>>> -                                   skb_zerocopy_clone(nskb, frag_skb,
>>>> -                                                      GFP_ATOMIC))
>>>> -                                       goto err;
>>>> +                               if (nfrags) {
>>>> +                                       frag =
>>>> skb_shinfo(list_skb)->frags;
>>>> +                                       frag_skb = list_skb;
>>>> +
>>>> +                                       if (skb_orphan_frags(frag_skb,
>>>> GFP_ATOMIC) ||
>>>> +                                           skb_zerocopy_clone(nskb,
>>>> frag_skb,
>>>> +
>>>> GFP_ATOMIC))
>>>> +                                               goto err;
>>>> +                               }
>>>>
>>>>                                   list_skb = list_skb->next;
>>>>                           }
>>>> @@ -3689,7 +3709,7 @@ struct sk_buff *skb_segment(struct sk_buff
>>>> *head_skb,
>>>>                                   goto err;
>>>>                           }
>>>>
>>>> -                       *nskb_frag = *frag;
>>>> +                       *nskb_frag = (i == -1) ? *head_frag : *frag;
>>>
>>>
>>> So this would be better as "*nskb_frag = (i < 0) ? head_frag : *frag;".
>>
>>
>> Good suggestion. Will implement as suggested.
>>
>>
>>>
>>>>                           __skb_frag_ref(nskb_frag);
>>>>                           size = skb_frag_size(nskb_frag);
>>>>
>>>> @@ -3702,7 +3722,8 @@ struct sk_buff *skb_segment(struct sk_buff
>>>> *head_skb,
>>>>
>>>>                           if (pos + size <= offset + len) {
>>>>                                   i++;
>>>> -                               frag++;
>>>> +                               if (i != 0)
>>>> +                                       frag++;
>>>>                                   pos += size;
>>>>                           } else {
>>>>                                   skb_frag_size_sub(nskb_frag, pos + size
>>>> - (offset + len));
>>>> @@ -3774,10 +3795,12 @@ struct sk_buff *skb_segment(struct sk_buff
>>>> *head_skb,
>>>>                   swap(tail->destructor, head_skb->destructor);
>>>>                   swap(tail->sk, head_skb->sk);
>>>>           }
>>>> +       kfree(head_frag);
>>>>           return segs;
>>>>
>>>>    err:
>>>>           kfree_skb_list(segs);
>>>> +       kfree(head_frag);
>>>>           return ERR_PTR(err);
>>>>    }
>>>>    EXPORT_SYMBOL_GPL(skb_segment);
>>>> --
>>>> 2.9.5
>>>>
>>

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ