[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20180515164801.lp5oyabhnukagczc@kafai-mbp.dhcp.thefacebook.com>
Date: Tue, 15 May 2018 09:48:03 -0700
From: Martin KaFai Lau <kafai@...com>
To: Alexei Starovoitov <alexei.starovoitov@...il.com>,
Joe Stringer <joe@...d.net.nz>
CC: <daniel@...earbox.net>, netdev <netdev@...r.kernel.org>,
<ast@...nel.org>, john fastabend <john.fastabend@...il.com>
Subject: Re: [RFC bpf-next 07/11] bpf: Add helper to retrieve socket in BPF
On Mon, May 14, 2018 at 08:16:59PM -0700, Alexei Starovoitov wrote:
> On Fri, May 11, 2018 at 05:54:33PM -0700, Joe Stringer wrote:
> > On 11 May 2018 at 14:41, Martin KaFai Lau <kafai@...com> wrote:
> > > On Fri, May 11, 2018 at 02:08:01PM -0700, Joe Stringer wrote:
> > >> On 10 May 2018 at 22:00, Martin KaFai Lau <kafai@...com> wrote:
> > >> > On Wed, May 09, 2018 at 02:07:05PM -0700, Joe Stringer wrote:
> > >> >> This patch adds a new BPF helper function, sk_lookup() which allows BPF
> > >> >> programs to find out if there is a socket listening on this host, and
> > >> >> returns a socket pointer which the BPF program can then access to
> > >> >> determine, for instance, whether to forward or drop traffic. sk_lookup()
> > >> >> takes a reference on the socket, so when a BPF program makes use of this
> > >> >> function, it must subsequently pass the returned pointer into the newly
> > >> >> added sk_release() to return the reference.
> > >> >>
> > >> >> By way of example, the following pseudocode would filter inbound
> > >> >> connections at XDP if there is no corresponding service listening for
> > >> >> the traffic:
> > >> >>
> > >> >> struct bpf_sock_tuple tuple;
> > >> >> struct bpf_sock_ops *sk;
> > >> >>
> > >> >> populate_tuple(ctx, &tuple); // Extract the 5tuple from the packet
> > >> >> sk = bpf_sk_lookup(ctx, &tuple, sizeof tuple, netns, 0);
> > >> >> if (!sk) {
> > >> >> // Couldn't find a socket listening for this traffic. Drop.
> > >> >> return TC_ACT_SHOT;
> > >> >> }
> > >> >> bpf_sk_release(sk, 0);
> > >> >> return TC_ACT_OK;
> > >> >>
> > >> >> Signed-off-by: Joe Stringer <joe@...d.net.nz>
> > >> >> ---
> > >>
> > >> ...
> > >>
> > >> >> @@ -4032,6 +4036,96 @@ static const struct bpf_func_proto bpf_skb_get_xfrm_state_proto = {
> > >> >> };
> > >> >> #endif
> > >> >>
> > >> >> +struct sock *
> > >> >> +sk_lookup(struct net *net, struct bpf_sock_tuple *tuple) {
> > >> > Would it be possible to have another version that
> > >> > returns a sk without taking its refcnt?
> > >> > It may have performance benefit.
> > >>
> > >> Not really. The sockets are not RCU-protected, and established sockets
> > >> may be torn down without notice. If we don't take a reference, there's
> > >> no guarantee that the socket will continue to exist for the duration
> > >> of running the BPF program.
> > >>
> > >> From what I follow, the comment below has a hidden implication which
> > >> is that sockets without SOCK_RCU_FREE, eg established sockets, may be
> > >> directly freed regardless of RCU.
> > > Right, SOCK_RCU_FREE sk is the one I am concern about.
> > > For example, TCP_LISTEN socket does not require taking a refcnt
> > > now. Doing a bpf_sk_lookup() may have a rather big
> > > impact on handling TCP syn flood. or the usual intention
> > > is to redirect instead of passing it up to the stack?
> >
> > I see, if you're only interested in listen sockets then probably this
> > series could be extended with a new flag, eg something like
> > BPF_F_SK_FIND_LISTENERS which restricts the set of possible sockets
> > found to only listen sockets, then the implementation would call into
> > __inet_lookup_listener() instead of inet_lookup(). The presence of
> > that flag in the relevant register during CALL instruction would show
> > that the verifier should not reference-track the result, then there'd
> > need to be a check on the release to ensure that this unreferenced
> > socket is never released. Just a thought, completely untested and I
> > could still be missing some detail..
> >
> > That said, I don't completely follow how you would expect to handle
> > the traffic for sockets that are already established - the helper
> > would no longer find those sockets, so you wouldn't know whether to
> > pass the traffic up the stack for established traffic or not.
>
> I think Martin has a valid concern here that if somebody starts using
> this helper on the rx traffic the bpf program (via these two new
> helpers) will be doing refcnt++ and refcnt-- even for listener
> sockets which will cause synflood to suffer.
> One can argue that this is bpf author mistake, but without fixes
> (and api changes) to the helper the programmer doesn't really have a way
> of avoiding this situation.
> Also udp sockets don't need refcnt at all.
> How about we split this single helper into three:
> - bpf_sk_lookup_tcp_established() that will return refcnt-ed socket
> and has to be bpf_sk_release()d by the program.
> - bpf_sk_lookup_tcp_listener() that doesn't refcnt, since progs
> run in rcu.
> - bpf_sk_lookup_udp() that also doesn't refcnt.
> The logic you want to put into this helper can be easily
> replicated with these three helpers and the whole thing will
> be much faster.
> Thoughts?
Just came to my mind.
or can we explore something like:
On the bpf_sk_lookup() side, use __inet[6]_lookup()
and __udp[46]_lib_lookup() instead. That should
only take refcnt if it has to.
On the bpf_sk_release() side, it skips refcnt--
if sk is SOCK_RCU_FREE.
Powered by blists - more mailing lists