netdev - Re: [PATCH bpf-next v5 2/5] bpf: implement BPF_LWT_ENCAP_IP mode in bpf_lwt_push

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [day] [month] [year] [list]

Message-ID: <CAGx-UoABveq2eczTHFyBT8XDT=qLv65iSGEfOwB=R_LbBi5TtA@mail.gmail.com>
Date:   Fri, 1 Feb 2019 08:48:10 -0800
From:   Peter Oskolkov <posk.devel@...il.com>
To:     Willem de Bruijn <willemdebruijn.kernel@...il.com>
Cc:     Daniel Borkmann <daniel@...earbox.net>,
        Peter Oskolkov <posk@...gle.com>,
        Alexei Starovoitov <ast@...nel.org>,
        Network Development <netdev@...r.kernel.org>,
        David Ahern <dsahern@...il.com>
Subject: Re: [PATCH bpf-next v5 2/5] bpf: implement BPF_LWT_ENCAP_IP mode in bpf_lwt_push_encap

On Thu, Jan 31, 2019 at 5:47 PM Willem de Bruijn
<willemdebruijn.kernel@...il.com> wrote:
>
> On Thu, Jan 31, 2019 at 5:04 PM Daniel Borkmann <daniel@...earbox.net> wrote:
> >
> > On 01/31/2019 12:51 AM, Peter Oskolkov wrote:
> > > This patch implements BPF_LWT_ENCAP_IP mode in bpf_lwt_push_encap
> > > BPF helper. It enables BPF programs (specifically, BPF_PROG_TYPE_LWT_IN
> > > and BPF_PROG_TYPE_LWT_XMIT prog types) to add IP encapsulation headers
> > > to packets (e.g. IP/GRE, GUE, IPIP).
> > >
> > > This is useful when thousands of different short-lived flows should be
> > > encapped, each with different and dynamically determined destination.
> > > Although lwtunnels can be used in some of these scenarios, the ability
> > > to dynamically generate encap headers adds more flexibility, e.g.
> > > when routing depends on the state of the host (reflected in global bpf
> > > maps).
> > >
> > > Signed-off-by: Peter Oskolkov <posk@...gle.com>
>
> > > +int bpf_lwt_push_ip_encap(struct sk_buff *skb, void *hdr, u32 len, bool ingress)
> > > +{
> > > +     struct iphdr *iph;
> > > +     bool ipv4;
> > > +     int err;
> > > +
> > > +     if (unlikely(len < sizeof(struct iphdr) || len > LWT_BPF_MAX_HEADROOM))
> > > +             return -EINVAL;
> > > +
> > > +     /* validate protocol and length */
> > > +     iph = (struct iphdr *)hdr;
> > > +     if (iph->version == 4) {
> > > +             ipv4 = true;
> > > +             if (unlikely(len < iph->ihl * 4))
> > > +                     return -EINVAL;
> > > +     } else if (iph->version == 6) {
> > > +             ipv4 = false;
> > > +             if (unlikely(len < sizeof(struct ipv6hdr)))
> > > +                     return -EINVAL;
> > > +     } else {
> > > +             return -EINVAL;
> > > +     }
> > > +
> > > +     if (ingress)
> > > +             err = skb_cow_head(skb, len + skb->mac_len);
> > > +     else
> > > +             err = skb_cow_head(skb,
> > > +                                len + LL_RESERVED_SPACE(skb_dst(skb)->dev));
> > > +     if (unlikely(err))
> > > +             return err;
> > > +
> > > +     /* push the encap headers and fix pointers */
> > > +     skb_reset_inner_headers(skb);
> > > +     skb->encapsulation = 1;
> > > +     skb_push(skb, len);
> > > +     if (ingress)
> > > +             skb_postpush_rcsum(skb, iph, len);
> > > +     skb_reset_network_header(skb);
> > > +     memcpy(skb_network_header(skb), hdr, len);
> > > +     bpf_compute_data_pointers(skb);
> >
> > Does this work transparently with GSO as well or would we need to
> > update shared info for this (like in nat64 case, for example)?
>
> Good point. It does need to update the gso_type to include the tunnel
> type, similar to iptunnel_handle_offloads.
>
> Only, the nice feature of this interface is that it is encap protocol
> independent, which implies that it does not know the correct type.
>
> I don't think that we want to allow programs to write the gso_type themselves.
>
> With GSO_PARTIAL, perhaps specifying the exact tunnel type can be
> avoided as long as it is a fixed prefix to replicate?
>
> The transport layer size does not change, so no need to recompute gso_segs?
>
> Either way, this seems non-trivial enough to me to do in a separate
> follow-on patch. For now just fail if skb_is_gso.

Thanks, Willem, for the details! I'll re-send the patchset with the
additional check that skb_is_gso() is false.