netdev - Re: [PATCH bpf-next 3/7] bpf: Add socket assign support

Message-ID: <20200317062623.y5v2hejgtdbvexnz@kafai-mbp>
Date:   Mon, 16 Mar 2020 23:26:23 -0700
From:   Martin KaFai Lau <kafai@...com>
To:     Joe Stringer <joe@...d.net.nz>
CC:     <bpf@...r.kernel.org>, netdev <netdev@...r.kernel.org>,
        Daniel Borkmann <daniel@...earbox.net>,
        Alexei Starovoitov <ast@...nel.org>,
        Eric Dumazet <eric.dumazet@...il.com>,
        Lorenz Bauer <lmb@...udflare.com>
Subject: Re: [PATCH bpf-next 3/7] bpf: Add socket assign support

On Mon, Mar 16, 2020 at 08:06:38PM -0700, Joe Stringer wrote:
> On Mon, Mar 16, 2020 at 3:58 PM Martin KaFai Lau <kafai@...com> wrote:
> >
> > On Thu, Mar 12, 2020 at 04:36:44PM -0700, Joe Stringer wrote:
> > > Add support for TPROXY via a new bpf helper, bpf_sk_assign().
> > >
> > > This helper requires the BPF program to discover the socket via a call
> > > to bpf_sk*_lookup_*(), then pass this socket to the new helper. The
> > > helper takes its own reference to the socket in addition to any existing
> > > reference that may or may not currently be obtained for the duration of
> > > BPF processing. For the destination socket to receive the traffic, the
> > > traffic must be routed towards that socket via local route, the socket
> > > I also missed where the local route check is in the patch.
> > > Is it implied by the fact that a sk can be found by bpf_sk*_lookup_*()?
> 
> This is a requirement for traffic redirection, it's not enforced by
> the patch. If the operator does not configure routing for the relevant
> traffic to ensure that the traffic is delivered locally, then after
> the eBPF program terminates, it will pass up through ip_rcv() and
> friends and be subject to the whims of the routing table. (or
> alternatively if the BPF program redirects somewhere else then this
> reference will be dropped).
> 
> Maybe there's a path to simplifying this configuration path in future
> to loosen this requirement, but for now I've kept the series as
> minimal as possible on that front.
> 
> > [ ... ]
> >
> > > diff --git a/net/core/filter.c b/net/core/filter.c
> > > index cd0a532db4e7..bae0874289d8 100644
> > > --- a/net/core/filter.c
> > > +++ b/net/core/filter.c
> > > @@ -5846,6 +5846,32 @@ static const struct bpf_func_proto bpf_tcp_gen_syncookie_proto = {
> > >       .arg5_type      = ARG_CONST_SIZE,
> > >  };
> > >
> > > +BPF_CALL_3(bpf_sk_assign, struct sk_buff *, skb, struct sock *, sk, u64, flags)
> > > +{
> > > +     if (flags != 0)
> > > +             return -EINVAL;
> > > +     if (!skb_at_tc_ingress(skb))
> > > +             return -EOPNOTSUPP;
> > > +     if (unlikely(!refcount_inc_not_zero(&sk->sk_refcnt)))
> > > +             return -ENOENT;
> > > +
> > > +     skb_orphan(skb);
> > > +     skb->sk = sk;
> > sk is from the bpf_sk*_lookup_*() which does not consider
> > the bpf_prog installed in SO_ATTACH_REUSEPORT_EBPF.
> > However, the use-case is currently limited to sk inspection.
> >
> > It now supports selecting a particular sk to receive traffic.
> > Any plan in supporting that?
> 
> I think this is a general bpf_sk*_lookup_*() question, previous
> discussion[0] settled on avoiding that complexity before a use case
> arises, for both TC and XDP versions of these helpers; I still don't
> have a specific use case in mind for such functionality. If we were to
> do it, I would presume that the socket lookup caller would need to
> pass a dedicated flag (supported at TC and likely not at XDP) to
> communicate that SO_ATTACH_REUSEPORT_EBPF progs should be respected
> and used to select the reuseport socket.
It is more about the expectation on the existing SO_ATTACH_REUSEPORT_EBPF
use case.  It has been fine because SO_ATTACH_REUSEPORT_EBPF's bpf prog
will still be run later (e.g. from tcp_v4_rcv) to decide which sk
receives the skb.

If the bpf@tc assigns a TCP_LISTEN sk in bpf_sk_assign(),
will the SO_ATTACH_REUSEPORT_EBPF's bpf still be run later
to make the final sk decision?
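
For reference, the check order in the quoted bpf_sk_assign() hunk can be
modeled in plain userspace C.  This is only a sketch: the model_* names and
structs are hypothetical stand-ins, not kernel types, and since the quoted
hunk is cut off after "skb->sk = sk;" the final "return 0" is an assumption.
It says nothing about whether a SO_ATTACH_REUSEPORT_EBPF prog runs later.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins; NOT the kernel's struct sock / struct sk_buff. */
struct model_sock { unsigned int refcnt; };
struct model_skb  { struct model_sock *sk; bool at_tc_ingress; };

/* Models refcount_inc_not_zero(): only take a reference if one is still held. */
static bool model_ref_inc_not_zero(struct model_sock *sk)
{
	if (sk->refcnt == 0)
		return false;
	sk->refcnt++;
	return true;
}

/*
 * Models the visible effect of skb_orphan(): the skb no longer points at a
 * socket.  The real function also runs skb->destructor, which is left out.
 */
static void model_skb_orphan(struct model_skb *skb)
{
	skb->sk = NULL;
}

/* Same order of checks as the quoted hunk. */
static int model_sk_assign(struct model_skb *skb, struct model_sock *sk,
			   unsigned long long flags)
{
	if (flags != 0)
		return -EINVAL;		/* no flags are defined */
	if (!skb->at_tc_ingress)
		return -EOPNOTSUPP;	/* only supported at tc ingress */
	if (!model_ref_inc_not_zero(sk))
		return -ENOENT;		/* socket is already being freed */

	model_skb_orphan(skb);
	skb->sk = sk;
	return 0;			/* assumption: hunk is truncated here */
}
```

In this model a failed call leaves skb->sk untouched, and a successful call
leaves the socket holding one extra reference on behalf of the skb.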

> 
> > > diff --git a/net/ipv6/ip6_input.c b/net/ipv6/ip6_input.c
> > > index 7b089d0ac8cd..f7b42adca9d0 100644
> > > --- a/net/ipv6/ip6_input.c
> > > +++ b/net/ipv6/ip6_input.c
> > > @@ -285,7 +285,10 @@ static struct sk_buff *ip6_rcv_core(struct sk_buff *skb, struct net_device *dev,
> > >       rcu_read_unlock();
> > >
> > >       /* Must drop socket now because of tproxy. */
> > > -     skb_orphan(skb);
> > > +     if (skb_dst_is_sk_prefetch(skb))
> > > +             dst_sk_prefetch_fetch(skb);
> > > +     else
> > > +             skb_orphan(skb);
> > > If I understand it correctly, this new test is to skip
> > > the skb_orphan() call for locally routed skb.
> > > Other cases (forward?) still depend on skb_orphan() being called here?
> 
> Roughly yes. 'locally routed skb' is a bit loose wording though, at
> this point the BPF program only prefetched the socket to let the stack
> know that it should deliver the skb to that socket, assuming that it
> passes the upcoming routing check.
Which upcoming routing check?  I think it is the part I am missing.

In patch 4, let's say dst_check() returns NULL (maybe due to a route
change).  Later in the upper stack, a route lookup is done
(ip_route_input_noref() or ip6_route_input()).  Could it return
a forward route?  And I assume skipping the skb_orphan() call
here would still be fine in that case?

> 
> For more discussion on the other cases, there is the previous
> thread[1] and in particular the child thread discussion with Florian,
> Eric and Daniel.
> 
> [0] https://www.mail-archive.com/netdev@vger.kernel.org/msg253250.html
> [1] https://www.spinics.net/lists/netdev/msg580058.html