[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAOftzPipEjfy1p_98V+JmV3p_WJPzhE-_KfqC3UE3d-TYYxyww@mail.gmail.com>
Date: Wed, 25 Mar 2020 13:46:43 -0700
From: Joe Stringer <joe@...d.net.nz>
To: Lorenz Bauer <lmb@...udflare.com>
Cc: Joe Stringer <joe@...d.net.nz>, bpf <bpf@...r.kernel.org>,
Networking <netdev@...r.kernel.org>,
Daniel Borkmann <daniel@...earbox.net>,
Alexei Starovoitov <ast@...nel.org>,
Eric Dumazet <eric.dumazet@...il.com>,
Martin Lau <kafai@...com>
Subject: Re: [PATCHv2 bpf-next 4/5] bpf: Don't refcount LISTEN sockets in sk_assign()
On Wed, Mar 25, 2020 at 3:29 AM Lorenz Bauer <lmb@...udflare.com> wrote:
>
> On Wed, 25 Mar 2020 at 05:58, Joe Stringer <joe@...d.net.nz> wrote:
> >
> > Avoid taking a reference on listen sockets by checking the socket type
> > in the sk_assign and in the corresponding skb_steal_sock() code in the
> > the transport layer, and by ensuring that the prefetch free (sock_pfree)
> > function uses the same logic to check whether the socket is refcounted.
> >
> > Suggested-by: Martin KaFai Lau <kafai@...com>
> > Signed-off-by: Joe Stringer <joe@...d.net.nz>
> > ---
> > v2: Initial version
> > ---
> > include/net/sock.h | 25 +++++++++++++++++--------
> > net/core/filter.c | 6 +++---
> > net/core/sock.c | 3 ++-
> > 3 files changed, 22 insertions(+), 12 deletions(-)
> >
> > diff --git a/include/net/sock.h b/include/net/sock.h
> > index 1ca2e808cb8e..3ec1865f173e 100644
> > --- a/include/net/sock.h
> > +++ b/include/net/sock.h
> > @@ -2533,6 +2533,21 @@ skb_sk_is_prefetched(struct sk_buff *skb)
> > return skb->destructor == sock_pfree;
> > }
> >
> > +/* This helper checks if a socket is a full socket,
> > + * ie _not_ a timewait or request socket.
> > + */
> > +static inline bool sk_fullsock(const struct sock *sk)
> > +{
> > + return (1 << sk->sk_state) & ~(TCPF_TIME_WAIT | TCPF_NEW_SYN_RECV);
> > +}
> > +
> > +static inline bool
> > +sk_is_refcounted(struct sock *sk)
> > +{
> > + /* Only full sockets have sk->sk_flags. */
> > + return !sk_fullsock(sk) || !sock_flag(sk, SOCK_RCU_FREE);
> > +}
> > +
> > /**
> > * skb_steal_sock
> > * @skb to steal the socket from
> > @@ -2545,6 +2560,8 @@ skb_steal_sock(struct sk_buff *skb, bool *refcounted)
> > struct sock *sk = skb->sk;
> >
> > *refcounted = true;
> > + if (skb_sk_is_prefetched(skb))
> > + *refcounted = sk_is_refcounted(sk);
> > skb->destructor = NULL;
> > skb->sk = NULL;
> > return sk;
> > @@ -2553,14 +2570,6 @@ skb_steal_sock(struct sk_buff *skb, bool *refcounted)
> > return NULL;
> > }
> >
> > -/* This helper checks if a socket is a full socket,
> > - * ie _not_ a timewait or request socket.
> > - */
> > -static inline bool sk_fullsock(const struct sock *sk)
> > -{
> > - return (1 << sk->sk_state) & ~(TCPF_TIME_WAIT | TCPF_NEW_SYN_RECV);
> > -}
> > -
> > /* Checks if this SKB belongs to an HW offloaded socket
> > * and whether any SW fallbacks are required based on dev.
> > * Check decrypted mark in case skb_orphan() cleared socket.
> > diff --git a/net/core/filter.c b/net/core/filter.c
> > index 0fada7fe9b75..997b8606167e 100644
> > --- a/net/core/filter.c
> > +++ b/net/core/filter.c
> > @@ -5343,8 +5343,7 @@ static const struct bpf_func_proto bpf_sk_lookup_udp_proto = {
> >
> > BPF_CALL_1(bpf_sk_release, struct sock *, sk)
> > {
> > - /* Only full sockets have sk->sk_flags. */
> > - if (!sk_fullsock(sk) || !sock_flag(sk, SOCK_RCU_FREE))
> > + if (sk_is_refcounted(sk))
> > sock_gen_put(sk);
> > return 0;
> > }
> > @@ -5870,7 +5869,8 @@ BPF_CALL_3(bpf_sk_assign, struct sk_buff *, skb, struct sock *, sk, u64, flags)
> > return -ESOCKTNOSUPPORT;
> > if (unlikely(dev_net(skb->dev) != sock_net(sk)))
> > return -ENETUNREACH;
> > - if (unlikely(!refcount_inc_not_zero(&sk->sk_refcnt)))
> > + if (sk_is_refcounted(sk) &&
> > + unlikely(!refcount_inc_not_zero(&sk->sk_refcnt)))
> > return -ENOENT;
> >
> > skb_orphan(skb);
> > diff --git a/net/core/sock.c b/net/core/sock.c
> > index cfaf60267360..a2ab79446f59 100644
> > --- a/net/core/sock.c
> > +++ b/net/core/sock.c
> > @@ -2076,7 +2076,8 @@ EXPORT_SYMBOL(sock_efree);
> > */
> > void sock_pfree(struct sk_buff *skb)
> > {
> > - sock_edemux(skb);
> > + if (sk_is_refcounted(skb->sk))
> > + sock_edemux(skb);
>
> sock_edemux calls sock_gen_put, which is also called by
> bpf_sk_release. Is it worth teaching sock_gen_put about
> sk_fullsock, and dropping the other helpers? I was considering this
> when fixing up sk_release, but then forgot
> about it.
I like the idea, but I'm concerned about breaking things outside the
focus of this new helper if the skb_sk_is_prefetched() function from
patch 1 is allowed to return true for sockets other than the ones
assigned from the bpf_sk_assign() helper. At a glance there's users of
sock_efree (which sock_edemux can be defined to) like netem_enqueue()
which may inadvertently trigger unexpected paths here. I think it's
more explicit so more obviously correct if the destructor pointer used
in this series is unique compared to other paths, even if the
underlying code is the same.
Powered by blists - more mailing lists