Date:   Fri, 10 Jun 2022 09:32:29 +1000
From:   Jonathan Maxwell <jmaxwell37@...il.com>
To:     Daniel Borkmann <daniel@...earbox.net>,
        Joe Stringer <joe@...ium.io>
Cc:     Netdev <netdev@...r.kernel.org>,
        David Miller <davem@...emloft.net>,
        Eric Dumazet <edumazet@...gle.com>,
        Jakub Kicinski <kuba@...nel.org>, pabeni@...hat.com,
        Antoine Tenart <atenart@...nel.org>, cutaylor-pub@...oo.com,
        alexei.starovoitov@...il.com, kafai@...com, i@....io,
        bpf@...r.kernel.org
Subject: Re: [PATCH net] net: bpf: fix request_sock leak in filter.c

On Fri, Jun 10, 2022 at 8:22 AM Joe Stringer <joe@...ium.io> wrote:
>
> On Thu, Jun 9, 2022 at 1:30 PM Daniel Borkmann <daniel@...earbox.net> wrote:
> >
> > On 6/9/22 3:18 AM, Jon Maxwell wrote:
> > > > A customer reported a request_sock leak in a Calico cloud environment. We
> > > > found that a BPF program was doing a socket lookup, which takes a refcnt on
> > > > the socket, and that the lookup was finding the request_sock but returning
> > > > the parent LISTEN socket via sk_to_full_sk() without first dropping the
> > > > refcnt on the child request_sock, resulting in a request_sock slab object
> > > > leak. This patch retains the existing behaviour of returning full socks to
> > > > the caller, but it also drops the refcnt on the child request_sock, if one
> > > > is present, before doing so to prevent the leak.
> > >
> > > Thanks to Curtis Taylor for all the help in diagnosing and testing this. And
> > > thanks to Antoine Tenart for the reproducer and patch input.
> > >
> > > > Fixes: f7355a6c0497 ("bpf: Check sk_fullsock() before returning from bpf_sk_lookup()")
> > > > Fixes: edbf8c01de5a ("bpf: add skc_lookup_tcp helper")
> > > Tested-by: Curtis Taylor <cutaylor-pub@...oo.com>
> > > Co-developed-by: Antoine Tenart <atenart@...nel.org>
> > > > Signed-off-by: Antoine Tenart <atenart@...nel.org>
> > > Signed-off-by: Jon Maxwell <jmaxwell37@...il.com>
> > > ---
> > >   net/core/filter.c | 20 ++++++++++++++------
> > >   1 file changed, 14 insertions(+), 6 deletions(-)
> > >
> > > diff --git a/net/core/filter.c b/net/core/filter.c
> > > index 2e32cee2c469..e3c04ae7381f 100644
> > > --- a/net/core/filter.c
> > > +++ b/net/core/filter.c
> > > @@ -6202,13 +6202,17 @@ __bpf_sk_lookup(struct sk_buff *skb, struct bpf_sock_tuple *tuple, u32 len,
> > >   {
> > >       struct sock *sk = __bpf_skc_lookup(skb, tuple, len, caller_net,
> > >                                          ifindex, proto, netns_id, flags);
> > > +     struct sock *sk1 = sk;
> > >
> > >       if (sk) {
> > >               sk = sk_to_full_sk(sk);
> > > -             if (!sk_fullsock(sk)) {
> > > -                     sock_gen_put(sk);
> > > +             /* sk_to_full_sk() may return (sk)->rsk_listener, so make sure the original sk1
> > > +              * sock refcnt is decremented to prevent a request_sock leak.
> > > +              */
> > > +             if (!sk_fullsock(sk1))
> > > +                     sock_gen_put(sk1);
> > > +             if (!sk_fullsock(sk))
> > >                       return NULL;
> >
> > [ +Martin/Joe/Lorenz ]
> >
> > > I wonder, should we also add some asserts in here to ensure we don't get an imbalance for the
> > > bpf_sk_release() case later on? Rough pseudocode could be something like below:
> >
> > static struct sock *
> > __bpf_sk_lookup(struct sk_buff *skb, struct bpf_sock_tuple *tuple, u32 len,
> >                  struct net *caller_net, u32 ifindex, u8 proto, u64 netns_id,
> >                  u64 flags)
> > {
> >          struct sock *sk = __bpf_skc_lookup(skb, tuple, len, caller_net,
> >                                             ifindex, proto, netns_id, flags);
> >          if (sk) {
> >                  struct sock *sk2 = sk_to_full_sk(sk);
> >
> >                  if (!sk_fullsock(sk2))
> >                          sk2 = NULL;
> >                  if (sk2 != sk) {
> >                          sock_gen_put(sk);
> >                          if (unlikely(sk2 && !sock_flag(sk2, SOCK_RCU_FREE))) {
> >                                  WARN_ONCE(1, "Found non-RCU, unreferenced socket!");
> >                                  sk2 = NULL;
> >                          }
> >                  }
> >                  sk = sk2;
> >          }
> >          return sk;
> > }
>
> This seems a bit more readable to me from the perspective of
> understanding the way that the socket references are tracked & freed.

Thanks for the suggestion, Daniel and Joe. It looks good to me; we will run
some tests with that implemented in our reproducer.
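
For anyone following the refcnt flow, here is a minimal userspace sketch of it
in C. It is only a model under stated assumptions, not kernel code: struct
model_sock and every model_* function are hypothetical stand-ins for struct
sock, __bpf_skc_lookup(), sk_to_full_sk(), sock_gen_put() and bpf_sk_release(),
and the release rule (no put on a full socket flagged SOCK_RCU_FREE) is my
reading of the helper rather than something stated in this thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-in for struct sock; not the kernel type. */
struct model_sock {
	int refcnt;
	bool fullsock;               /* false for a request socket */
	bool rcu_free;               /* models SOCK_RCU_FREE on a listener */
	struct model_sock *listener; /* models rsk_listener */
};

/* Models sock_gen_put(): drop one reference. */
static void model_put(struct model_sock *sk)
{
	assert(sk->refcnt > 0);
	sk->refcnt--;
}

/* Models __bpf_skc_lookup(): the lookup takes a reference on what it finds. */
static struct model_sock *model_skc_lookup(struct model_sock *found)
{
	if (found)
		found->refcnt++;
	return found;
}

/* Models sk_to_full_sk(): a request socket maps to its listener. */
static struct model_sock *model_to_full_sk(struct model_sock *sk)
{
	if (sk && !sk->fullsock && sk->listener)
		return sk->listener;
	return sk;
}

/* Pre-patch flow: the request socket reference is never dropped. */
static struct model_sock *model_lookup_old(struct model_sock *found)
{
	struct model_sock *sk = model_skc_lookup(found);

	if (sk) {
		sk = model_to_full_sk(sk);
		if (!sk->fullsock) {
			model_put(sk);
			return NULL;
		}
	}
	return sk;
}

/* Flow following the pseudocode quoted above. */
static struct model_sock *model_lookup_new(struct model_sock *found)
{
	struct model_sock *sk = model_skc_lookup(found);

	if (sk) {
		struct model_sock *sk2 = model_to_full_sk(sk);

		if (!sk2->fullsock)
			sk2 = NULL;
		if (sk2 != sk) {
			model_put(sk);      /* drop the looked-up socket */
			if (sk2 && !sk2->rcu_free)
				sk2 = NULL; /* unreferenced, non-RCU socket */
		}
		sk = sk2;
	}
	return sk;
}

/* Models bpf_sk_release(): assumed to put only refcounted sockets. */
static void model_release(struct model_sock *sk)
{
	if (sk && (!sk->fullsock || !sk->rcu_free))
		model_put(sk);
}

/* Extra references left on the request socket after lookup + release. */
static int model_leaked_req_refs(bool use_new)
{
	struct model_sock listener = { .refcnt = 1, .fullsock = true,
				       .rcu_free = true, .listener = NULL };
	struct model_sock req = { .refcnt = 1, .fullsock = false,
				  .rcu_free = false, .listener = &listener };
	struct model_sock *sk = use_new ? model_lookup_new(&req)
					: model_lookup_old(&req);

	model_release(sk);
	return req.refcnt - 1;
}
```

In this model, model_leaked_req_refs(false) leaves one extra reference on the
request socket, which corresponds to the leak we saw, while
model_leaked_req_refs(true) leaves none. It says nothing about real kernel
locking or RCU behaviour; the reproducer is still the real test.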

Regards

Jon
