lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAKgT0Ue3am2MtFcqEpOsLWwEtg_Whh3eV4OoMM_a3ES82s1Scg@mail.gmail.com>
Date:   Tue, 20 Mar 2018 11:08:07 -0700
From:   Alexander Duyck <alexander.duyck@...il.com>
To:     Yonghong Song <yhs@...com>
Cc:     Eric Dumazet <edumazet@...gle.com>, ast@...com,
        Daniel Borkmann <daniel@...earbox.net>, diptanu@...com,
        Netdev <netdev@...r.kernel.org>, Kernel Team <kernel-team@...com>
Subject: Re: [PATCH net-next v2 1/2] net: permit skb_segment on head_frag
 frag_list skb

On Tue, Mar 20, 2018 at 8:55 AM, Yonghong Song <yhs@...com> wrote:
> One of our in-house projects, bpf-based NAT, hits a kernel BUG_ON at
> function skb_segment(), line 3667. The bpf program attaches to
> clsact ingress, calls bpf_skb_change_proto to change protocol
> from ipv4 to ipv6 or from ipv6 to ipv4, and then calls bpf_redirect
> to send the changed packet out.
>
> 3472 struct sk_buff *skb_segment(struct sk_buff *head_skb,
> 3473                             netdev_features_t features)
> 3474 {
> 3475         struct sk_buff *segs = NULL;
> 3476         struct sk_buff *tail = NULL;
> ...
> 3665                 while (pos < offset + len) {
> 3666                         if (i >= nfrags) {
> 3667                                 BUG_ON(skb_headlen(list_skb));
> 3668
> 3669                                 i = 0;
> 3670                                 nfrags = skb_shinfo(list_skb)->nr_frags;
> 3671                                 frag = skb_shinfo(list_skb)->frags;
> 3672                                 frag_skb = list_skb;
> ...
>
> call stack:
> ...
>  #1 [ffff883ffef03558] __crash_kexec at ffffffff8110c525
>  #2 [ffff883ffef03620] crash_kexec at ffffffff8110d5cc
>  #3 [ffff883ffef03640] oops_end at ffffffff8101d7e7
>  #4 [ffff883ffef03668] die at ffffffff8101deb2
>  #5 [ffff883ffef03698] do_trap at ffffffff8101a700
>  #6 [ffff883ffef036e8] do_error_trap at ffffffff8101abfe
>  #7 [ffff883ffef037a0] do_invalid_op at ffffffff8101acd0
>  #8 [ffff883ffef037b0] invalid_op at ffffffff81a00bab
>     [exception RIP: skb_segment+3044]
>     RIP: ffffffff817e4dd4  RSP: ffff883ffef03860  RFLAGS: 00010216
>     RAX: 0000000000002bf6  RBX: ffff883feb7aaa00  RCX: 0000000000000011
>     RDX: ffff883fb87910c0  RSI: 0000000000000011  RDI: ffff883feb7ab500
>     RBP: ffff883ffef03928   R8: 0000000000002ce2   R9: 00000000000027da
>     R10: 000001ea00000000  R11: 0000000000002d82  R12: ffff883f90a1ee80
>     R13: ffff883fb8791120  R14: ffff883feb7abc00  R15: 0000000000002ce2
>     ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
>  #9 [ffff883ffef03930] tcp_gso_segment at ffffffff818713e7
> --- <IRQ stack> ---
> ...
>
> The triggering input skb has the following properties:
>     list_skb = skb->frag_list;
>     skb->nfrags != NULL && skb_headlen(list_skb) != 0
> and skb_segment() is not able to handle a frag_list skb
> if its headlen (list_skb->len - list_skb->data_len) is not 0.
>
> This patch addressed the issue by handling skb_headlen(list_skb) != 0
> case properly if list_skb->head_frag is true, which is expected in
> most cases. A one-element frag array is created for the list_skb head
> and processed before list_skb->frags are processed.
>
> Reported-by: Diptanu Gon Choudhury <diptanu@...com>
> Signed-off-by: Yonghong Song <yhs@...com>
> ---
>  net/core/skbuff.c | 42 +++++++++++++++++++++++++++++-------------
>  1 file changed, 29 insertions(+), 13 deletions(-)
>
> diff --git a/net/core/skbuff.c b/net/core/skbuff.c
> index 715c134..0ad4cda 100644
> --- a/net/core/skbuff.c
> +++ b/net/core/skbuff.c
> @@ -3475,9 +3475,10 @@ struct sk_buff *skb_segment(struct sk_buff *head_skb,
>         struct sk_buff *segs = NULL;
>         struct sk_buff *tail = NULL;
>         struct sk_buff *list_skb = skb_shinfo(head_skb)->frag_list;
> -       skb_frag_t *frag = skb_shinfo(head_skb)->frags;
> +       skb_frag_t *frag = skb_shinfo(head_skb)->frags, head_frag;

I would move head_frag down into the while loop. No point in making it
global to this function and eating up the extra stack space if you
don't need it.

>         unsigned int mss = skb_shinfo(head_skb)->gso_size;
>         unsigned int doffset = head_skb->data - skb_mac_header(head_skb);
> +       struct sk_buff *check_list_skb = list_skb;

This seems like a waste of a pointer. You can probably just repurpose
nfrags to achieve the same purpose you are achieving below. Note that
nfrags is a signed int and only needing to store up to only 18 or so
total frags so we can probably just reserve a nfrags value of -1 to
indicate that we are using the header frag.

>         struct sk_buff *frag_skb = head_skb;
>         unsigned int offset = doffset;
>         unsigned int tnl_hlen = skb_tnl_header_len(head_skb);
> @@ -3590,6 +3591,7 @@ struct sk_buff *skb_segment(struct sk_buff *head_skb,
>
>                         nskb = skb_clone(list_skb, GFP_ATOMIC);
>                         list_skb = list_skb->next;
> +                       check_list_skb = list_skb;
>
>                         if (unlikely(!nskb))
>                                 goto err;

If my understanding is correct then this is unneeded if you just
change how you use i and nfrags.

> @@ -3664,21 +3666,35 @@ struct sk_buff *skb_segment(struct sk_buff *head_skb,
>
>                 while (pos < offset + len) {

You could probably just declare here since this would be more local to
where it is actually used and the fact is it can be overwritten with
each iteration of the loop anyway so there is no need to reserve the
space before you get here.

>                         if (i >= nfrags) {
> -                               BUG_ON(skb_headlen(list_skb));
> -
> -                               i = 0;
> -                               nfrags = skb_shinfo(list_skb)->nr_frags;
> -                               frag = skb_shinfo(list_skb)->frags;
> -                               frag_skb = list_skb;
> +                               if (skb_headlen(list_skb) && check_list_skb == list_skb) {
> +                                       struct page *page;
> +
> +                                       BUG_ON(!list_skb->head_frag);
> +
> +                                       i = 0;
> +                                       nfrags = 1;
> +                                       page = virt_to_head_page(list_skb->head);
> +                                       head_frag.page.p = page;
> +                                       head_frag.page_offset = list_skb->data -
> +                                               (unsigned char *)page_address(page);
> +                                       head_frag.size = skb_headlen(list_skb);
> +                                       frag = &head_frag;
> +                                       check_list_skb = list_skb->next;

The whole need for check_list_skb can be worked around if we take
advantage of the fact that i and nfrags are both integers so we could
use -1 for both to indicate we are processing the head frag. The only
bit that gets messy is the fact that we have to add special handling
for the case where skb_shinfo(list_skb)->nr_frags is 0. If that is the
case we would have to set nfrags to 0 and bump the list_skb =
list_skb->next so that we avoid trying to pull frags from an otherwise
empty buffer.

> +                               } else {
> +                                       i = 0;
> +                                       nfrags = skb_shinfo(list_skb)->nr_frags;
> +                                       frag = skb_shinfo(list_skb)->frags;
> +                                       frag_skb = list_skb;
>
> -                               BUG_ON(!nfrags);
> +                                       BUG_ON(!nfrags);
>
> -                               if (skb_orphan_frags(frag_skb, GFP_ATOMIC) ||
> -                                   skb_zerocopy_clone(nskb, frag_skb,
> -                                                      GFP_ATOMIC))
> -                                       goto err;
> +                                       if (skb_orphan_frags(frag_skb, GFP_ATOMIC) ||
> +                                           skb_zerocopy_clone(nskb, frag_skb, GFP_ATOMIC))
> +                                               goto err;
>
> -                               list_skb = list_skb->next;
> +                                       list_skb = list_skb->next;
> +                                       check_list_skb = list_skb;
> +                               }
>                         }
>
>                         if (unlikely(skb_shinfo(nskb)->nr_frags >=
> --
> 2.9.5
>

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ