netdev - Re: [PATCH net] net: bpf: fix request

Message-ID: <76972cdc-6a3c-2052-f353-06ebd2d61eca@iogearbox.net>
Date:   Fri, 10 Jun 2022 09:08:41 +0200
From:   Daniel Borkmann <daniel@...earbox.net>
To:     Martin KaFai Lau <kafai@...com>
Cc:     Jon Maxwell <jmaxwell37@...il.com>, netdev@...r.kernel.org,
        davem@...emloft.net, edumazet@...gle.com, kuba@...nel.org,
        pabeni@...hat.com, atenart@...nel.org, cutaylor-pub@...oo.com,
        alexei.starovoitov@...il.com, joe@...ium.io, i@....io,
        bpf@...r.kernel.org
Subject: Re: [PATCH net] net: bpf: fix request_sock leak in filter.c

On 6/10/22 2:17 AM, Martin KaFai Lau wrote:
> On Thu, Jun 09, 2022 at 10:29:15PM +0200, Daniel Borkmann wrote:
>> On 6/9/22 3:18 AM, Jon Maxwell wrote:
>>> A customer reported a request_socket leak in a Calico cloud environment. We
>>> found that a BPF program was doing a socket lookup which takes a refcnt on
>>> the socket and that it was finding the request_socket but returning the parent
>>> LISTEN socket via sk_to_full_sk() without first decrementing the refcnt of the
>>> child request socket, resulting in a request_sock slab object leak. This patch retains the
> Great catch and debug indeed!
> 
>>> existing behaviour of returning full socks to the caller but it also decrements
>>> the child request_socket if one is present before doing so to prevent the leak.
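The leak described above can be illustrated with a toy refcount model. This is a userspace sketch, not kernel code: `struct model_sock`, `model_to_full()`, `model_put()`, `buggy_lookup()` and `fixed_lookup()` are hypothetical stand-ins for the socket, sk_to_full_sk(), sock_gen_put() and the helper before and after the fix, and the model assumes the lookup takes exactly one reference on the request sock.

```c
#include <assert.h>
#include <stddef.h>

/* Toy userspace model, NOT kernel code; all names are hypothetical. */
struct model_sock {
	int refcnt;                  /* references currently held */
	struct model_sock *listener; /* non-NULL only for a request sock */
};

/* Stand-in for sk_to_full_sk(): a request sock maps to its listener. */
static struct model_sock *model_to_full(struct model_sock *sk)
{
	return sk->listener ? sk->listener : sk;
}

/* Stand-in for sock_gen_put(): drop one reference. */
static void model_put(struct model_sock *sk)
{
	sk->refcnt--;
}

/* Before the fix: the reference taken on the request sock is never dropped. */
static struct model_sock *buggy_lookup(struct model_sock *req)
{
	req->refcnt++;             /* the lookup takes a reference */
	return model_to_full(req); /* the caller only gets the listener back */
}

/* After the fix: drop the request sock's reference when it is not returned. */
static struct model_sock *fixed_lookup(struct model_sock *req)
{
	struct model_sock *full;

	req->refcnt++;             /* the lookup takes a reference */
	full = model_to_full(req);
	if (full != req)
		model_put(req);        /* balance the reference on the request sock */
	return full;
}
```

With a listener `lsk` and a request sock `req` whose refcnt starts at 1, `buggy_lookup(&req)` returns `&lsk` and leaves `req.refcnt` at 2, while `fixed_lookup(&req)` returns `&lsk` and leaves it at 1.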
>>>
>>> Thanks to Curtis Taylor for all the help in diagnosing and testing this. And
>>> thanks to Antoine Tenart for the reproducer and patch input.
>>>
>>> Fixes: f7355a6c0497 ("bpf: Check sk_fullsock() before returning from bpf_sk_lookup()")
>>> Fixes: edbf8c01de5a ("bpf: add skc_lookup_tcp helper")
> Instead of the above commits, I think this dated back to
> 6acc9b432e67 ("bpf: Add helper to retrieve socket in BPF")
> 
>>> Tested-by: Curtis Taylor <cutaylor-pub@...oo.com>
>>> Co-developed-by: Antoine Tenart <atenart@...nel.org>
>>> Signed-off-by: Antoine Tenart <atenart@...nel.org>
>>> Signed-off-by: Jon Maxwell <jmaxwell37@...il.com>
>>> ---
>>>    net/core/filter.c | 20 ++++++++++++++------
>>>    1 file changed, 14 insertions(+), 6 deletions(-)
>>>
>>> diff --git a/net/core/filter.c b/net/core/filter.c
>>> index 2e32cee2c469..e3c04ae7381f 100644
>>> --- a/net/core/filter.c
>>> +++ b/net/core/filter.c
>>> @@ -6202,13 +6202,17 @@ __bpf_sk_lookup(struct sk_buff *skb, struct bpf_sock_tuple *tuple, u32 len,
>>>    {
>>>    	struct sock *sk = __bpf_skc_lookup(skb, tuple, len, caller_net,
>>>    					   ifindex, proto, netns_id, flags);
>>> +	struct sock *sk1 = sk;
>>>    	if (sk) {
>>>    		sk = sk_to_full_sk(sk);
>>> -		if (!sk_fullsock(sk)) {
>>> -			sock_gen_put(sk);
>>> +		/* sk_to_full_sk() may return (sk)->rsk_listener, so make sure the original sk1
>>> +		 * sock refcnt is decremented to prevent a request_sock leak.
>>> +		 */
>>> +		if (!sk_fullsock(sk1))
>>> +			sock_gen_put(sk1);
>>> +		if (!sk_fullsock(sk))
> In the timewait case, sk1 == sk.  It is a bit worrying to pass
> sk to sk_fullsock(sk) after the above sock_gen_put(sk1) may have dropped the last reference.
> I think Daniel's 'if (sk2 != sk) { sock_gen_put(sk); }' check is better.
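Martin's ordering concern can be shown with a similar toy model in which dropping the last reference marks the object freed and any later inspection is counted. Every name here is hypothetical. `v1_lookup()` mirrors the quoted v1 hunk under the assumption that its truncated `if (!sk_fullsock(sk))` branch simply sets sk to NULL, and `v2_lookup()` mirrors the `sk2 != sk` ordering from Daniel's pseudocode. The timewait sock is modelled as a non-full sock with no listener, and, as a worst-case assumption, the lookup's reference is the only one held on it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy userspace model, NOT kernel code; all names are hypothetical. */
struct model_sock {
	int refcnt;
	bool full;                   /* false for request and timewait socks */
	bool freed;                  /* set once the last reference is dropped */
	struct model_sock *listener; /* non-NULL only for a request sock */
};

static int reads_after_free;     /* counts inspections of freed objects */

/* Stand-in for sk_fullsock(): records a use-after-put if sk is freed. */
static bool model_fullsock(const struct model_sock *sk)
{
	if (sk->freed)
		reads_after_free++;
	return sk->full;
}

/* Stand-in for sock_gen_put(). */
static void model_put(struct model_sock *sk)
{
	if (--sk->refcnt == 0)
		sk->freed = true;
}

/* Stand-in for sk_to_full_sk(). */
static struct model_sock *model_to_full(struct model_sock *sk)
{
	return sk->listener ? sk->listener : sk;
}

/* v1 ordering: put sk1 first, then inspect sk (== sk1 for timewait). */
static struct model_sock *v1_lookup(struct model_sock *sk)
{
	struct model_sock *sk1 = sk;

	sk = model_to_full(sk);
	if (!model_fullsock(sk1))
		model_put(sk1);
	if (!model_fullsock(sk))
		sk = NULL;
	return sk;
}

/* sk2 ordering: decide what to return first, then put sk if it differs. */
static struct model_sock *v2_lookup(struct model_sock *sk)
{
	struct model_sock *sk2 = model_to_full(sk);

	if (!model_fullsock(sk2))
		sk2 = NULL;
	if (sk2 != sk)
		model_put(sk);
	return sk2;
}
```

For a timewait sock holding only the lookup's reference, `v1_lookup()` returns NULL but bumps `reads_after_free` once, because `model_fullsock(sk)` runs after the put; `v2_lookup()` returns NULL without touching the freed object.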
> 
>> [ +Martin/Joe/Lorenz ]
>>
>> I wonder, should we also add some asserts in here to ensure we don't get an imbalance in the
>> bpf_sk_release() case later on? Rough pseudocode could be something like the following:
>>
>> static struct sock *
>> __bpf_sk_lookup(struct sk_buff *skb, struct bpf_sock_tuple *tuple, u32 len,
>>                  struct net *caller_net, u32 ifindex, u8 proto, u64 netns_id,
>>                  u64 flags)
>> {
>>          struct sock *sk = __bpf_skc_lookup(skb, tuple, len, caller_net,
>>                                             ifindex, proto, netns_id, flags);
>>          if (sk) {
>>                  struct sock *sk2 = sk_to_full_sk(sk);
>>
>>                  if (!sk_fullsock(sk2))
>>                          sk2 = NULL;
>>                  if (sk2 != sk) {
>>                          sock_gen_put(sk);
>>                          if (unlikely(sk2 && !sock_flag(sk2, SOCK_RCU_FREE))) {
> I don't think it matters whether the helper-returned sk2 is refcounted or not (SOCK_RCU_FREE).
> The verifier ensures that bpf_sk_lookup() and bpf_sk_release() are
> always balanced regardless of the type of sk2.
>
> bpf_sk_release() will do the right thing: it checks whether sk2 is refcounted
> before calling sock_gen_put().
>
> The bug here is that the helper forgot to call sock_gen_put(sk) while
> the verifier only tracks sk2, so I think the 'if (unlikely...) { WARN_ONCE(...); }'
> can be dropped.

I was mainly thinking of the check in sk_lookup() around `sk && !refcounted &&
!sock_flag(sk, SOCK_RCU_FREE)`, which catches an unreferenced non-SOCK_RCU_FREE socket.
sk_to_full_sk() can return inet_reqsk(sk)->rsk_listener, and we don't have a similar
assertion there. Since we don't take any reference on the listener, it must be
SOCK_RCU_FREE; otherwise a later call to bpf_sk_release() would unbalance sk2. @Jon: maybe
let's just manually verify that such an sk2 has SOCK_RCU_FREE and state it in the commit
message for future reference; either approach is fine with me. Thanks!

>>                                  WARN_ONCE(1, "Found non-RCU, unreferenced socket!");
>>                                  sk2 = NULL;
>>                          }
>>                  }
>>                  sk = sk2;
>>          }
>>          return sk;
>> }
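The balance argument in this exchange can be checked with one more toy model. It assumes, following Martin's description, that bpf_sk_release() only calls sock_gen_put() on a refcounted socket, modelled here as anything that is not a full socket with SOCK_RCU_FREE set; that is an assumption about the helper's internals, not a quote of them. `model_lookup()` follows Daniel's pseudocode, assumes the underlying lookup takes a reference only on refcounted socks, and uses a `warnings` counter in place of WARN_ONCE(). All identifiers are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy userspace model, NOT kernel code; all names are hypothetical. */
struct model_sock {
	int refcnt;
	bool full;                   /* false for request and timewait socks */
	bool rcu_free;               /* stands in for SOCK_RCU_FREE */
	struct model_sock *listener; /* non-NULL only for a request sock */
};

static int warnings;             /* stands in for WARN_ONCE() firing */

static struct model_sock *model_to_full(struct model_sock *sk)
{
	return sk->listener ? sk->listener : sk;
}

static void model_put(struct model_sock *sk)
{
	sk->refcnt--;
}

/* Assumed refcounted test: only full socks carry the RCU-free flag. */
static bool model_is_refcounted(const struct model_sock *sk)
{
	return !sk->full || !sk->rcu_free;
}

/* Stand-in for bpf_sk_release(). */
static void model_release(struct model_sock *sk)
{
	if (sk && model_is_refcounted(sk))
		model_put(sk);
}

/* Daniel's pseudocode, preceded by the reference the lookup itself takes. */
static struct model_sock *model_lookup(struct model_sock *sk)
{
	struct model_sock *sk2;

	if (model_is_refcounted(sk))
		sk->refcnt++;            /* ref taken by the underlying lookup */
	sk2 = model_to_full(sk);
	if (!sk2->full)
		sk2 = NULL;
	if (sk2 != sk) {
		model_put(sk);
		/* sk2 is returned without a ref, so it must be RCU-freed. */
		if (sk2 && !sk2->rcu_free) {
			warnings++;
			sk2 = NULL;
		}
	}
	return sk2;
}
```

Dropping the `rcu_free` check would let a request sock whose listener lacks the flag return that listener, after which model_release() would decrement a reference model_lookup() never took; that is the imbalance Daniel's assertion guards against, and which Martin expects never to occur in practice.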