[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CADa=RyxfhK+XhAwf_C_an=+RnsQCPCXV23Qrwk-3OC1oLdHM=A@mail.gmail.com>
Date: Sat, 18 May 2019 18:52:48 -0700
From: Joe Stringer <joe@...valent.com>
To: Martin Lau <kafai@...com>
Cc: Eric Dumazet <eric.dumazet@...il.com>, bpf@...r.kernel.org,
"netdev@...r.kernel.org" <netdev@...r.kernel.org>,
Alexei Starovoitov <ast@...com>,
Daniel Borkmann <daniel@...earbox.net>,
Kernel Team <Kernel-team@...com>
Subject: Re: [PATCH bpf] bpf: Check sk_fullsock() before returning from bpf_sk_lookup()
On Sat, May 18, 2019, 09:05 Martin Lau <kafai@...com> wrote:
>
> On Sat, May 18, 2019 at 08:38:46AM -1000, Joe Stringer wrote:
> > On Fri, May 17, 2019, 12:02 Martin Lau <kafai@...com> wrote:
> >
> > > On Fri, May 17, 2019 at 02:51:48PM -0700, Eric Dumazet wrote:
> > > >
> > > >
> > > > On 5/17/19 2:21 PM, Martin KaFai Lau wrote:
> > > > > The BPF_FUNC_sk_lookup_xxx helpers return RET_PTR_TO_SOCKET_OR_NULL.
> > > > > Meaning a fullsock ptr and its fullsock's fields in bpf_sock can be
> > > > > accessed, e.g. type, protocol, mark and priority.
> > > > > Some new helper, like bpf_sk_storage_get(), also expects
> > > > > ARG_PTR_TO_SOCKET is a fullsock.
> > > > >
> > > > > bpf_sk_lookup() currently calls sk_to_full_sk() before returning.
> > > > > However, the ptr returned from sk_to_full_sk() is not guaranteed
> > > > > to be a fullsock. For example, it cannot get a fullsock if sk
> > > > > is in TCP_TIME_WAIT.
> > > > >
> > > > > This patch checks for sk_fullsock() before returning. If it is not
> > > > > a fullsock, sock_gen_put() is called if needed and then returns NULL.
> > > > >
> > > > > Fixes: 6acc9b432e67 ("bpf: Add helper to retrieve socket in BPF")
> > > > > Cc: Joe Stringer <joe@...valent.com>
> > > > > Signed-off-by: Martin KaFai Lau <kafai@...com>
> > > > > ---
> > > > > net/core/filter.c | 16 ++++++++++++++--
> > > > > 1 file changed, 14 insertions(+), 2 deletions(-)
> > > > >
> > > > > diff --git a/net/core/filter.c b/net/core/filter.c
> > > > > index 55bfc941d17a..85def5a20aaf 100644
> > > > > --- a/net/core/filter.c
> > > > > +++ b/net/core/filter.c
> > > > > @@ -5337,8 +5337,14 @@ __bpf_sk_lookup(struct sk_buff *skb, struct
> > > bpf_sock_tuple *tuple, u32 len,
> > > > > struct sock *sk = __bpf_skc_lookup(skb, tuple, len, caller_net,
> > > > > ifindex, proto, netns_id,
> > > flags);
> > > > >
> > > > > - if (sk)
> > > > > + if (sk) {
> > > > > sk = sk_to_full_sk(sk);
> > > > > + if (!sk_fullsock(sk)) {
> > > > > + if (!sock_flag(sk, SOCK_RCU_FREE))
> > > > > + sock_gen_put(sk);
> > > >
> > > > This looks a bit convoluted/weird.
> > > >
> > > > What about telling/asking __bpf_skc_lookup() to not return a non
> > > fullsock instead ?
> > > It is becausee some other helpers, like BPF_FUNC_skc_lookup_tcp,
> > > can return non fullsock
> > >
> >
> > FYI this is necessary for finding a transparently proxied socket for a
> > non-local connection (tproxy use case).
> You meant it is necessary to return a non fullsock from the
> BPF_FUNC_sk_lookup_xxx helpers?
Yes, that's what I want to associate with the skb so that the delivery
to the SO_TRANSPARENT is received properly.
For the first packet of a connection, we look up the socket using the
tproxy socket port as the destination, and deliver the packet there.
The SO_TRANSPARENT logic then kicks in and sends back the ack and
creates the non-full sock for the connection tuple, which can be
entirely unrelated to local addresses or ports.
For the second forward-direction packet, (ie ACK in 3-way handshake)
then we must deliver the packet to this non-full sock as that's what
is negotiating the proxied connection. If you look up using the packet
tuple then get the full sock from it, it will go back to the
SO_TRANSPARENT parent socket. Delivering the ACK there will result in
a RST being sent back, because the SO_TRANSPARENT socket is just there
to accept new connections for connections to be proxied. So this is
the case where I need the non-full sock.
(In practice, the lookup logic attempts the packet tuple first then if
that fails, uses the tproxy port for lookup to achieve the above).
Powered by blists - more mailing lists