[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAOftzPjh51QvdYYdsUFBRniHpAu=pTg5wX8T8LyhkiCze4eRVQ@mail.gmail.com>
Date: Thu, 9 Jun 2022 15:15:53 -0700
From: Joe Stringer <joe@...d.net.nz>
To: Jon Maxwell <jmaxwell37@...il.com>
Cc: netdev@...r.kernel.org, davem@...emloft.net, edumazet@...gle.com,
kuba@...nel.org, pabeni@...hat.com, atenart@...nel.org,
cutaylor-pub@...oo.com
Subject: Re: [PATCH net] net: bpf: fix request_sock leak in filter.c
On Wed, Jun 8, 2022 at 6:21 PM Jon Maxwell <jmaxwell37@...il.com> wrote:
>
> A customer reported a request_socket leak in a Calico cloud environment. We
> found that a BPF program was doing a socket lookup with takes a refcnt on
> the socket and that it was finding the request_socket but returning the parent
> LISTEN socket via sk_to_full_sk() without decrementing the child request socket
> 1st, resulting in request_sock slab object leak. This patch retains the
> existing behaviour of returning full socks to the caller but it also decrements
> the child request_socket if one is present before doing so to prevent the leak.
>
> Thanks to Curtis Taylor for all the help in diagnosing and testing this. And
> thanks to Antoine Tenart for the reproducer and patch input.
>
> Fixes: f7355a6c0497 bpf: ("Check sk_fullsock() before returning from bpf_sk_lookup()")
> Fixes: edbf8c01de5a bpf: ("add skc_lookup_tcp helper")
> Tested-by: Curtis Taylor <cutaylor-pub@...oo.com>
> Co-developed-by: Antoine Tenart <atenart@...nel.org>
> Signed-off-by:: Antoine Tenart <atenart@...nel.org>
> Signed-off-by: Jon Maxwell <jmaxwell37@...il.com>
> ---
> net/core/filter.c | 20 ++++++++++++++------
> 1 file changed, 14 insertions(+), 6 deletions(-)
>
> diff --git a/net/core/filter.c b/net/core/filter.c
> index 2e32cee2c469..e3c04ae7381f 100644
> --- a/net/core/filter.c
> +++ b/net/core/filter.c
> @@ -6202,13 +6202,17 @@ __bpf_sk_lookup(struct sk_buff *skb, struct bpf_sock_tuple *tuple, u32 len,
> {
> struct sock *sk = __bpf_skc_lookup(skb, tuple, len, caller_net,
> ifindex, proto, netns_id, flags);
> + struct sock *sk1 = sk;
>
> if (sk) {
> sk = sk_to_full_sk(sk);
> - if (!sk_fullsock(sk)) {
> - sock_gen_put(sk);
> + /* sk_to_full_sk() may return (sk)->rsk_listener, so make sure the original sk1
> + * sock refcnt is decremented to prevent a request_sock leak.
> + */
> + if (!sk_fullsock(sk1))
> + sock_gen_put(sk1);
> + if (!sk_fullsock(sk))
> return NULL;
> - }
> }
>
> return sk;
Thinking through the constraints of this function:
1. If the return value is NULL, then all references taken during the
processing must be released.
2. If the return value is non-NULL, then the socket must either have
gained one reference OR it must have the SOCK_RCU_FREE flag set.
3. It also shouldn't return TIME_WAIT / request sockets (!sk_fullsock(sk)).
__bpf_skc_lookup() will give us the properties of (1)/(2) in a socket
that may or may not be `sk_is_refcounted()` at the start of the
function, so then we just need to consider the logic being changed
here.
Digging further, are these statements accurate?
* sk_to_full_sk() can either return the argument or a different listen socket.
* Iff sk1 and sk are the same, then we only need to consider (3),
hence the fullsock check, then depending on what type of socket it is,
we satisfy either (1) (current sock_gen_put() call + NULL) or (2)
(just return).
* Iff sk1 and sk are different, then we should release the reference
on sk1 and then do something with sk following the constraints above.
* Iff sk1 and sk are different, then sk must be a LISTEN socket.
* LISTEN sockets always have SOCK_RCU_FREE.
* Therefore, if sk1 and sk are different, we must release the
reference on sk1 and we do not need to take a reference on sk, and we
can just return sk.
Following the above, the implementation looks concise and follows the
logic for each case. I can't help but think that it would be easier to
read with an sk_is_refcounted() call in there though since the concern
is how the references for sk vs sk1 are tracked in this function.
Thanks,
Joe
Powered by blists - more mailing lists