lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <cd4bcd39-5f11-757e-8e43-5e494a0cb551@fb.com>
Date:   Tue, 20 Mar 2018 15:53:39 -0700
From:   Yonghong Song <yhs@...com>
To:     Alexander Duyck <alexander.duyck@...il.com>
CC:     Eric Dumazet <edumazet@...gle.com>, <ast@...com>,
        Daniel Borkmann <daniel@...earbox.net>, <diptanu@...com>,
        Netdev <netdev@...r.kernel.org>, Kernel Team <kernel-team@...com>
Subject: Re: [PATCH net-next v2 1/2] net: permit skb_segment on head_frag
 frag_list skb



On 3/20/18 11:08 AM, Alexander Duyck wrote:
> On Tue, Mar 20, 2018 at 8:55 AM, Yonghong Song <yhs@...com> wrote:
>> One of our in-house projects, bpf-based NAT, hits a kernel BUG_ON at
>> function skb_segment(), line 3667. The bpf program attaches to
>> clsact ingress, calls bpf_skb_change_proto to change protocol
>> from ipv4 to ipv6 or from ipv6 to ipv4, and then calls bpf_redirect
>> to send the changed packet out.
>>
>> 3472 struct sk_buff *skb_segment(struct sk_buff *head_skb,
>> 3473                             netdev_features_t features)
>> 3474 {
>> 3475         struct sk_buff *segs = NULL;
>> 3476         struct sk_buff *tail = NULL;
>> ...
>> 3665                 while (pos < offset + len) {
>> 3666                         if (i >= nfrags) {
>> 3667                                 BUG_ON(skb_headlen(list_skb));
>> 3668
>> 3669                                 i = 0;
>> 3670                                 nfrags = skb_shinfo(list_skb)->nr_frags;
>> 3671                                 frag = skb_shinfo(list_skb)->frags;
>> 3672                                 frag_skb = list_skb;
>> ...
>>
>> call stack:
>> ...
>>   #1 [ffff883ffef03558] __crash_kexec at ffffffff8110c525
>>   #2 [ffff883ffef03620] crash_kexec at ffffffff8110d5cc
>>   #3 [ffff883ffef03640] oops_end at ffffffff8101d7e7
>>   #4 [ffff883ffef03668] die at ffffffff8101deb2
>>   #5 [ffff883ffef03698] do_trap at ffffffff8101a700
>>   #6 [ffff883ffef036e8] do_error_trap at ffffffff8101abfe
>>   #7 [ffff883ffef037a0] do_invalid_op at ffffffff8101acd0
>>   #8 [ffff883ffef037b0] invalid_op at ffffffff81a00bab
>>      [exception RIP: skb_segment+3044]
>>      RIP: ffffffff817e4dd4  RSP: ffff883ffef03860  RFLAGS: 00010216
>>      RAX: 0000000000002bf6  RBX: ffff883feb7aaa00  RCX: 0000000000000011
>>      RDX: ffff883fb87910c0  RSI: 0000000000000011  RDI: ffff883feb7ab500
>>      RBP: ffff883ffef03928   R8: 0000000000002ce2   R9: 00000000000027da
>>      R10: 000001ea00000000  R11: 0000000000002d82  R12: ffff883f90a1ee80
>>      R13: ffff883fb8791120  R14: ffff883feb7abc00  R15: 0000000000002ce2
>>      ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
>>   #9 [ffff883ffef03930] tcp_gso_segment at ffffffff818713e7
>> --- <IRQ stack> ---
>> ...
>>
>> The triggering input skb has the following properties:
>>      list_skb = skb->frag_list;
>>      skb->nfrags != NULL && skb_headlen(list_skb) != 0
>> and skb_segment() is not able to handle a frag_list skb
>> if its headlen (list_skb->len - list_skb->data_len) is not 0.
>>
>> This patch addressed the issue by handling skb_headlen(list_skb) != 0
>> case properly if list_skb->head_frag is true, which is expected in
>> most cases. A one-element frag array is created for the list_skb head
>> and processed before list_skb->frags are processed.
>>
>> Reported-by: Diptanu Gon Choudhury <diptanu@...com>
>> Signed-off-by: Yonghong Song <yhs@...com>
>> ---
>>   net/core/skbuff.c | 42 +++++++++++++++++++++++++++++-------------
>>   1 file changed, 29 insertions(+), 13 deletions(-)
>>
>> diff --git a/net/core/skbuff.c b/net/core/skbuff.c
>> index 715c134..0ad4cda 100644
>> --- a/net/core/skbuff.c
>> +++ b/net/core/skbuff.c
>> @@ -3475,9 +3475,10 @@ struct sk_buff *skb_segment(struct sk_buff *head_skb,
>>          struct sk_buff *segs = NULL;
>>          struct sk_buff *tail = NULL;
>>          struct sk_buff *list_skb = skb_shinfo(head_skb)->frag_list;
>> -       skb_frag_t *frag = skb_shinfo(head_skb)->frags;
>> +       skb_frag_t *frag = skb_shinfo(head_skb)->frags, head_frag;
> 
> I would move head_frag down into the while loop. No point in making it
> global to this function and eating up the extra stack space if you
> don't need it.

We need head_frag declared outside the main segmentation loop.
Its value may survive across different main loop iterations.
However, I see your point to save the stack space.
Will use a pointer here and do allocation on demand instead.

> 
>>          unsigned int mss = skb_shinfo(head_skb)->gso_size;
>>          unsigned int doffset = head_skb->data - skb_mac_header(head_skb);
>> +       struct sk_buff *check_list_skb = list_skb;
> 
> This seems like a waste of a pointer. You can probably just repurpose
> nfrags to achieve the same purpose you are achieving below. Note that
> nfrags is a signed int and only needing to store up to only 18 or so
> total frags so we can probably just reserve a nfrags value of -1 to
> indicate that we are using the header frag.

Just did some prototyping. Indeed, we could reserve the i = -1 to
indicate the special head_frag.

> 
>>          struct sk_buff *frag_skb = head_skb;
>>          unsigned int offset = doffset;
>>          unsigned int tnl_hlen = skb_tnl_header_len(head_skb);
>> @@ -3590,6 +3591,7 @@ struct sk_buff *skb_segment(struct sk_buff *head_skb,
>>
>>                          nskb = skb_clone(list_skb, GFP_ATOMIC);
>>                          list_skb = list_skb->next;
>> +                       check_list_skb = list_skb;
>>
>>                          if (unlikely(!nskb))
>>                                  goto err;
> 
> If my understanding is correct then this is unneeded if you just
> change how you use i and nfrags.

Right.

> 
>> @@ -3664,21 +3666,35 @@ struct sk_buff *skb_segment(struct sk_buff *head_skb,
>>
>>                  while (pos < offset + len) {
> 
> You could probably just declare here since this would be more local to
> where it is actually used and the fact is it can be overwritten with
> each iteration of the loop anyway so there is no need to reserve the
> space before you get here.

The actual space can be allocated once and reused later. But the space
needs to be preserved across main segmentation loop.

> 
>>                          if (i >= nfrags) {
>> -                               BUG_ON(skb_headlen(list_skb));
>> -
>> -                               i = 0;
>> -                               nfrags = skb_shinfo(list_skb)->nr_frags;
>> -                               frag = skb_shinfo(list_skb)->frags;
>> -                               frag_skb = list_skb;
>> +                               if (skb_headlen(list_skb) && check_list_skb == list_skb) {
>> +                                       struct page *page;
>> +
>> +                                       BUG_ON(!list_skb->head_frag);
>> +
>> +                                       i = 0;
>> +                                       nfrags = 1;
>> +                                       page = virt_to_head_page(list_skb->head);
>> +                                       head_frag.page.p = page;
>> +                                       head_frag.page_offset = list_skb->data -
>> +                                               (unsigned char *)page_address(page);
>> +                                       head_frag.size = skb_headlen(list_skb);
>> +                                       frag = &head_frag;
>> +                                       check_list_skb = list_skb->next;
> 
> The whole need for check_list_skb can be worked around if we take
> advantage of the fact that i and nfrags are both integers so we could
> use -1 for both to indicate we are processing the head frag. The only
> bit that gets messy is the fact that we have to add special handling
> for the case where skb_shinfo(list_skb)->nr_frags is 0. If that is the
> case we would have to set nfrags to 0 and bump the list_skb =
> list_skb->next so that we avoid trying to pull frags from an otherwise
> empty buffer.

Right. if nr_frags = 0, we do need to avoid pulling frags.

Thanks for the review, will send out v3 shortly.

> 
>> +                               } else {
>> +                                       i = 0;
>> +                                       nfrags = skb_shinfo(list_skb)->nr_frags;
>> +                                       frag = skb_shinfo(list_skb)->frags;
>> +                                       frag_skb = list_skb;
>>
>> -                               BUG_ON(!nfrags);
>> +                                       BUG_ON(!nfrags);
>>
>> -                               if (skb_orphan_frags(frag_skb, GFP_ATOMIC) ||
>> -                                   skb_zerocopy_clone(nskb, frag_skb,
>> -                                                      GFP_ATOMIC))
>> -                                       goto err;
>> +                                       if (skb_orphan_frags(frag_skb, GFP_ATOMIC) ||
>> +                                           skb_zerocopy_clone(nskb, frag_skb, GFP_ATOMIC))
>> +                                               goto err;
>>
>> -                               list_skb = list_skb->next;
>> +                                       list_skb = list_skb->next;
>> +                                       check_list_skb = list_skb;
>> +                               }
>>                          }
>>
>>                          if (unlikely(skb_shinfo(nskb)->nr_frags >=
>> --
>> 2.9.5
>>

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ