Date:   Thu, 5 Sep 2019 14:49:44 -0700
From:   Alexander Duyck <alexander.duyck@...il.com>
To:     Shmulik Ladkani <shmulik@...anetworks.com>
Cc:     Daniel Borkmann <daniel@...earbox.net>,
        Eric Dumazet <eric.dumazet@...il.com>,
        Willem de Bruijn <willemdebruijn.kernel@...il.com>,
        eyal@...anetworks.com, netdev <netdev@...r.kernel.org>,
        Shmulik Ladkani <shmulik.ladkani@...il.com>
Subject: Re: [PATCH net] net: gso: Fix skb_segment splat when splitting
 gso_size mangled skb having linear-headed frag_list

On Thu, Sep 5, 2019 at 11:36 AM Shmulik Ladkani
<shmulik@...anetworks.com> wrote:
>
> Historically, support for frag_list packets entering skb_segment() was
> limited to frag_list members terminating on exact same gso_size
> boundaries. This is verified with a BUG_ON since commit 89319d3801d1
> ("net: Add frag_list support to skb_segment"), quote:
>
>     As such we require all frag_list members terminate on exact MSS
>     boundaries.  This is checked using BUG_ON.
>     As there should only be one producer in the kernel of such packets,
>     namely GRO, this requirement should not be difficult to maintain.
>
> However, since commit 6578171a7ff0 ("bpf: add bpf_skb_change_proto helper"),
> the "exact MSS boundaries" assumption no longer holds:
> An eBPF program using bpf_skb_change_proto() DOES modify 'gso_size', but
> leaves the frag_list members as originally merged by GRO with the
> original 'gso_size'. Examples of such programs are bpf-based NAT46 or
> NAT64.
>
> This leads to a kernel BUG_ON for flows involving:
>  - GRO generating a frag_list skb
>  - bpf program performing bpf_skb_change_proto() or bpf_skb_adjust_room()
>  - skb_segment() of the skb
>
> See example BUG_ON reports in [0].
>
> In commit 13acc94eff12 ("net: permit skb_segment on head_frag frag_list skb"),
> skb_segment() was modified to support the "gso_size mangling" case of
> a frag_list GRO'ed skb, but *only* for frag_list members having
> head_frag==true (having a page-fragment head).
>
> Alas, GRO packets having frag_list members with a linear kmalloced head
> (head_frag==false) still hit the BUG_ON.
>
> This commit adds support to skb_segment() for a 'head_skb' packet having
> a frag_list whose members are *non* head_frag, with gso_size mangled, by
> disabling SG and thus falling-back to copying the data from the given
> 'head_skb' into the generated segmented skbs - as suggested by Willem de
> Bruijn [1].
>
> Since this approach involves the penalty of skb_copy_and_csum_bits()
> when building the segments, care was taken in order to enable this
> solution only when required:
>  - untrusted gso_size, by testing SKB_GSO_DODGY is set
>    (SKB_GSO_DODGY is set by any gso_size mangling functions in
>     net/core/filter.c)
>  - the frag_list is non-empty, its item is a non head_frag, *and* the
>    headlen of the given 'head_skb' does not match the gso_size.
>
> [0]
> https://lore.kernel.org/netdev/20190826170724.25ff616f@pixies/
> https://lore.kernel.org/netdev/9265b93f-253d-6b8c-f2b8-4b54eff1835c@fb.com/
>
> [1]
> https://lore.kernel.org/netdev/CA+FuTSfVsgNDi7c=GUU8nMg2hWxF2SjCNLXetHeVPdnxAW5K-w@mail.gmail.com/
>
> Fixes: 6578171a7ff0 ("bpf: add bpf_skb_change_proto helper")
> Suggested-by: Willem de Bruijn <willemdebruijn.kernel@...il.com>
> Cc: Daniel Borkmann <daniel@...earbox.net>
> Cc: Eric Dumazet <eric.dumazet@...il.com>
> Cc: Alexander Duyck <alexander.duyck@...il.com>
> Signed-off-by: Shmulik Ladkani <shmulik.ladkani@...il.com>
> ---
>  net/core/skbuff.c | 18 ++++++++++++++++++
>  1 file changed, 18 insertions(+)
>
> diff --git a/net/core/skbuff.c b/net/core/skbuff.c
> index ea8e8d332d85..c4bd1881acff 100644
> --- a/net/core/skbuff.c
> +++ b/net/core/skbuff.c
> @@ -3678,6 +3678,24 @@ struct sk_buff *skb_segment(struct sk_buff *head_skb,
>         sg = !!(features & NETIF_F_SG);
>         csum = !!can_checksum_protocol(features, proto);
>
> +       if (mss != GSO_BY_FRAGS &&
> +           (skb_shinfo(head_skb)->gso_type & SKB_GSO_DODGY)) {
> +               /* gso_size is untrusted.
> +                *
> +                * If head_skb has a frag_list with a linear non head_frag
> +                * item, and head_skb's headlen does not fit requested
> +                * gso_size, fall back to copying the skbs - by disabling sg.
> +                *
> +                * We assume checking the first frag suffices, i.e. if any of
> +                * the frags has non head_frag data, then the first frag does
> +                * too.
> +                */
> +               if (list_skb && skb_headlen(list_skb) && !list_skb->head_frag &&
> +                   (mss != skb_headlen(head_skb) - doffset)) {
> +                       sg = false;
> +               }
> +       }
> +

I would change the order of the tests you use here so that the more
common cases can be ruled out without having to run through many of
the checks. You could probably swap "list_skb" and "mss !=
GSO_BY_FRAGS", since list_skb is more likely to be false for many of
the common cases such as a standard TSO send from a socket. You might
even consider moving the GSO_BY_FRAGS check toward the end of your
checks, since SCTP is the only protocol that I believe uses it and the
likelihood of encountering it is much lower than for other protocols.

You could probably test for !list_skb->head_frag before seeing if
there is a headlen, since many NICs generate frames using head_frag;
in the GRO case you mentioned above that could save you some effort on
a number of NICs.

You might also consider moving this code up, before we push the mac
header back on, and instead of setting sg to false you could just
clear the NETIF_F_SG flag from features. That would save you from
having to subtract doffset in your last check.
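
For example (again an untested sketch of the idea, not a tested
change), somewhere before the __skb_push(head_skb, doffset) near the
top of skb_segment():

        /* untested sketch: headlen has not been adjusted by the mac
         * header push yet, so there is no doffset to subtract, and
         * clearing NETIF_F_SG lets the existing
         * sg = !!(features & NETIF_F_SG) assignment pick up the change.
         */
        if (list_skb && !list_skb->head_frag && skb_headlen(list_skb) &&
            (skb_shinfo(head_skb)->gso_type & SKB_GSO_DODGY) &&
            mss != GSO_BY_FRAGS &&
            mss != skb_headlen(head_skb))
                features &= ~NETIF_F_SG;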

>         if (sg && csum && (mss != GSO_BY_FRAGS))  {
>                 if (!(features & NETIF_F_GSO_PARTIAL)) {
>                         struct sk_buff *iter;
> --
> 2.19.1
>
