Message-ID: <20220610001743.z5nxapagwknlfjqi@kafai-mbp>
Date: Thu, 9 Jun 2022 17:17:43 -0700
From: Martin KaFai Lau <kafai@...com>
To: Daniel Borkmann <daniel@...earbox.net>
Cc: Jon Maxwell <jmaxwell37@...il.com>, netdev@...r.kernel.org,
davem@...emloft.net, edumazet@...gle.com, kuba@...nel.org,
pabeni@...hat.com, atenart@...nel.org, cutaylor-pub@...oo.com,
alexei.starovoitov@...il.com, joe@...ium.io, i@....io,
bpf@...r.kernel.org
Subject: Re: [PATCH net] net: bpf: fix request_sock leak in filter.c
On Thu, Jun 09, 2022 at 10:29:15PM +0200, Daniel Borkmann wrote:
> On 6/9/22 3:18 AM, Jon Maxwell wrote:
> > A customer reported a request_socket leak in a Calico cloud environment. We
> > found that a BPF program was doing a socket lookup which takes a refcnt on
> > the socket and that it was finding the request_socket but returning the parent
> > LISTEN socket via sk_to_full_sk() without decrementing the child request socket
> > first, resulting in a request_sock slab object leak. This patch retains the
Great catch and debug indeed!
> > existing behaviour of returning full socks to the caller but it also decrements
> > the child request_socket if one is present before doing so to prevent the leak.
> >
> > Thanks to Curtis Taylor for all the help in diagnosing and testing this. And
> > thanks to Antoine Tenart for the reproducer and patch input.
> >
> > Fixes: f7355a6c0497 ("bpf: Check sk_fullsock() before returning from bpf_sk_lookup()")
> > Fixes: edbf8c01de5a ("bpf: add skc_lookup_tcp helper")
Instead of the above commits, I think this dates back to
6acc9b432e67 ("bpf: Add helper to retrieve socket in BPF").
> > Tested-by: Curtis Taylor <cutaylor-pub@...oo.com>
> > Co-developed-by: Antoine Tenart <atenart@...nel.org>
> > Signed-off-by: Antoine Tenart <atenart@...nel.org>
> > Signed-off-by: Jon Maxwell <jmaxwell37@...il.com>
> > ---
> > net/core/filter.c | 20 ++++++++++++++------
> > 1 file changed, 14 insertions(+), 6 deletions(-)
> >
> > diff --git a/net/core/filter.c b/net/core/filter.c
> > index 2e32cee2c469..e3c04ae7381f 100644
> > --- a/net/core/filter.c
> > +++ b/net/core/filter.c
> > @@ -6202,13 +6202,17 @@ __bpf_sk_lookup(struct sk_buff *skb, struct bpf_sock_tuple *tuple, u32 len,
> >  {
> >  	struct sock *sk = __bpf_skc_lookup(skb, tuple, len, caller_net,
> >  					   ifindex, proto, netns_id, flags);
> > +	struct sock *sk1 = sk;
> >  	if (sk) {
> >  		sk = sk_to_full_sk(sk);
> > -		if (!sk_fullsock(sk)) {
> > -			sock_gen_put(sk);
> > +		/* sk_to_full_sk() may return (sk)->rsk_listener, so make sure the original sk1
> > +		 * sock refcnt is decremented to prevent a request_sock leak.
> > +		 */
> > +		if (!sk_fullsock(sk1))
> > +			sock_gen_put(sk1);
> > +		if (!sk_fullsock(sk))
In the timewait case, sk1 == sk. It is a bit worrying to pass
sk to sk_fullsock(sk) after the sock_gen_put() above, since that put
may already have dropped the last reference.
I think Daniel's 'if (sk2 != sk) { sock_gen_put(sk); }' check is better.
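
To make the 'sk2 != sk' shape concrete, here is a small user-space model,
not kernel code. Everything prefixed mock_ is a hypothetical stand-in I made
up for illustration: mock_to_full_sk() imitates sk_to_full_sk(), mock_put()
imitates sock_gen_put(), and the model assumes the lookup always takes a
reference (it ignores SOCK_RCU_FREE). The point is only that the looked-up
sk is never inspected after its reference has been dropped.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space stand-ins for the kernel socket kinds. */
enum mock_state { MOCK_FULL, MOCK_REQUEST, MOCK_TIMEWAIT };

struct mock_sock {
	enum mock_state state;
	int refcnt;                 /* models sk_refcnt */
	struct mock_sock *listener; /* models rsk_listener of a request sock */
};

/* Models sk_fullsock(): false for request and timewait minisocks. */
static bool mock_fullsock(const struct mock_sock *sk)
{
	return sk->state == MOCK_FULL;
}

/* Models sk_to_full_sk(): a request sock maps to its listener. */
static struct mock_sock *mock_to_full_sk(struct mock_sock *sk)
{
	return sk->state == MOCK_REQUEST ? sk->listener : sk;
}

/* Models sock_gen_put(): drop one reference. */
static void mock_put(struct mock_sock *sk)
{
	sk->refcnt--;
}

/* Assumption: the lookup takes a reference on whatever it finds. */
static struct mock_sock *mock_skc_lookup(struct mock_sock *found)
{
	if (found)
		found->refcnt++;
	return found;
}

/* The 'sk2 != sk' shape: decide on sk2 first, put sk last. */
static struct mock_sock *mock_sk_lookup(struct mock_sock *found)
{
	struct mock_sock *sk = mock_skc_lookup(found);

	if (sk) {
		struct mock_sock *sk2 = mock_to_full_sk(sk);

		if (!mock_fullsock(sk2))
			sk2 = NULL;
		if (sk2 != sk)
			mock_put(sk); /* covers both request and timewait */
		sk = sk2;
	}
	return sk;
}

/* Walks the request-sock and timewait cases through the model. */
static void mock_demo(void)
{
	struct mock_sock listener = { MOCK_FULL, 1, NULL };
	struct mock_sock req = { MOCK_REQUEST, 1, &listener };
	struct mock_sock tw = { MOCK_TIMEWAIT, 1, NULL };

	assert(mock_sk_lookup(&req) == &listener);
	assert(req.refcnt == 1); /* reference from the lookup was dropped */
	assert(mock_sk_lookup(&tw) == NULL);
	assert(tw.refcnt == 1);  /* put happened once, sk not touched after */
}
```

In this model the request sock and the timewait sock both end up with the
refcount they started with, and the only pointer compared after the put is
sk2, never the sk that was just released.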
>
> [ +Martin/Joe/Lorenz ]
>
> I wonder, should we also add some asserts in here to ensure we don't get an unbalance for the
> bpf_sk_release() case later on? Rough pseudocode could be something like below:
>
> static struct sock *
> __bpf_sk_lookup(struct sk_buff *skb, struct bpf_sock_tuple *tuple, u32 len,
>                 struct net *caller_net, u32 ifindex, u8 proto, u64 netns_id,
>                 u64 flags)
> {
>         struct sock *sk = __bpf_skc_lookup(skb, tuple, len, caller_net,
>                                            ifindex, proto, netns_id, flags);
>         if (sk) {
>                 struct sock *sk2 = sk_to_full_sk(sk);
>
>                 if (!sk_fullsock(sk2))
>                         sk2 = NULL;
>                 if (sk2 != sk) {
>                         sock_gen_put(sk);
>                         if (unlikely(sk2 && !sock_flag(sk2, SOCK_RCU_FREE))) {
I don't think it matters whether the sk2 returned by the helper is
refcounted or not (SOCK_RCU_FREE).
The verifier ensures that bpf_sk_lookup() and bpf_sk_release() are
always balanced, regardless of the type of sk2.
bpf_sk_release() will do the right thing and check whether sk2 is refcounted
before calling sock_gen_put().
The bug here is that the helper forgot to call sock_gen_put(sk) while
the verifier only tracks sk2, so I think the 'if (unlikely...) { WARN_ONCE(...); }'
can be dropped.
>                                 WARN_ONCE(1, "Found non-RCU, unreferenced socket!");
>                                 sk2 = NULL;
>                         }
>                 }
>                 sk = sk2;
>         }
>         return sk;
> }
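
To illustrate the balance argument above, here is a rough user-space sketch
of the release side. It is not the kernel implementation: fake_sock,
fake_is_refcounted(), fake_acquire() and fake_release() are hypothetical
names, and the rule "a sock is refcounted unless it is a full sock marked
RCU-free" is my simplified reading of how the refcounted check behaves.
What it tries to show is that acquire and release stay paired for either
kind of sk2, as long as both sides apply the same check.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the sk2 handed back to the BPF program. */
struct fake_sock {
	bool full;     /* models sk_fullsock(sk) */
	bool rcu_free; /* models sock_flag(sk, SOCK_RCU_FREE) */
	int refcnt;    /* models sk_refcnt */
};

/* Assumed rule: minisocks are always refcounted, full socks unless RCU-free. */
static bool fake_is_refcounted(const struct fake_sock *sk)
{
	return !sk->full || !sk->rcu_free;
}

/* Models the lookup side: a reference is held only for refcounted socks. */
static struct fake_sock *fake_acquire(struct fake_sock *sk)
{
	if (sk && fake_is_refcounted(sk))
		sk->refcnt++;
	return sk;
}

/* Models the release side: put only what was actually referenced. */
static void fake_release(struct fake_sock *sk)
{
	if (sk && fake_is_refcounted(sk))
		sk->refcnt--;
}

/* Acquire/release pairs leave the refcount unchanged for both kinds. */
static void fake_demo(void)
{
	struct fake_sock listener = { true, true, 1 };     /* RCU-free */
	struct fake_sock established = { true, false, 1 }; /* refcounted */

	fake_release(fake_acquire(&listener));
	fake_release(fake_acquire(&established));
	assert(listener.refcnt == 1);
	assert(established.refcnt == 1);
}
```

Under these assumptions no extra check on SOCK_RCU_FREE is needed at the
point where sk2 is returned, because the release side already makes the
same decision.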