linux-kernel - Re: [PATCH 4/5] bpf: sockmap, sockhash: return file descriptors from privileged lookup

Message-ID: <5e7113f16e7c6_278b2b1b264c65b445@john-XPS-13-9370.notmuch>
Date:   Tue, 17 Mar 2020 11:16:17 -0700
From:   John Fastabend <john.fastabend@...il.com>
To:     Jakub Sitnicki <jakub@...udflare.com>,
        Lorenz Bauer <lmb@...udflare.com>
Cc:     John Fastabend <john.fastabend@...il.com>,
        Daniel Borkmann <daniel@...earbox.net>,
        "David S. Miller" <davem@...emloft.net>,
        Jakub Kicinski <kuba@...nel.org>,
        Alexei Starovoitov <ast@...nel.org>,
        kernel-team@...udflare.com, netdev@...r.kernel.org,
        bpf@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH 4/5] bpf: sockmap, sockhash: return file descriptors from
 privileged lookup

Jakub Sitnicki wrote:
> On Tue, Mar 10, 2020 at 06:47 PM CET, Lorenz Bauer wrote:
> > Allow callers with CAP_NET_ADMIN to retrieve file descriptors from a
> > sockmap and sockhash. O_CLOEXEC is enforced on all fds.
> >
> > Without this, it's difficult to resize or otherwise rebuild existing
> > sockmap or sockhashes.
> >
> > Suggested-by: Jakub Sitnicki <jakub@...udflare.com>
> > Signed-off-by: Lorenz Bauer <lmb@...udflare.com>
> > ---
> >  net/core/sock_map.c | 19 +++++++++++++++++++
> >  1 file changed, 19 insertions(+)
> >
> > diff --git a/net/core/sock_map.c b/net/core/sock_map.c
> > index 03e04426cd21..3228936aa31e 100644
> > --- a/net/core/sock_map.c
> > +++ b/net/core/sock_map.c
> > @@ -347,12 +347,31 @@ static void *sock_map_lookup(struct bpf_map *map, void *key)
> >  static int __sock_map_copy_value(struct bpf_map *map, struct sock *sk,
> >  				 void *value)
> >  {
> > +	struct file *file;
> > +	int fd;
> > +
> >  	switch (map->value_size) {
> >  	case sizeof(u64):
> >  		sock_gen_cookie(sk);
> >  		*(u64 *)value = atomic64_read(&sk->sk_cookie);
> >  		return 0;
> >
> > +	case sizeof(u32):
> > +		if (!capable(CAP_NET_ADMIN))
> > +			return -EPERM;
> > +
> > +		fd = get_unused_fd_flags(O_CLOEXEC);
> > +		if (unlikely(fd < 0))
> > +			return fd;
> > +
> > +		read_lock_bh(&sk->sk_callback_lock);
> > +		file = get_file(sk->sk_socket->file);
> 
> I think this deserves a second look.
> 
> We don't lock the sock, so what if tcp_close orphans it before we enter
> this critical section? Looks like sk->sk_socket might be NULL.
> 
> I'd find a test that tries to trigger the race helpful, like:
> 
>   thread A: loop in lookup FD from map
>   thread B: loop in insert FD into map, close FD

Agreed, this was essentially my question above as well.

When the psock is created we call sock_hold() and will only do a sock_put()
after an RCU grace period when it's removed. So at least if you have the
sock here it should still hold an sk_refcnt reference. (Note the user data
is set to NULL, so if you do reference the psock you need to check it's
non-NULL.)

Is that enough to ensure sk_socket? Seems not to me. tcp_close(), for
example, will still happen and call sock_orphan(sk), based on my admittedly
quick look.

Further, even if you do check that sk->sk_socket is non-NULL, what does it
mean to return a file with a socket that is closed, deleted from the
sock_map, and has its psock removed? At this point is it just a dangling
reference?

I'm still a bit confused as well about what would or should happen when the
sock is closed after you have the file reference. I could probably dig up
what exactly would happen, but I think we need it in the commit message so
we understand it. I also didn't dig up the details here, but if the receiver
of the fd crashes or otherwise disappears, this hopefully all gets cleaned
up?

> 
> > +		read_unlock_bh(&sk->sk_callback_lock);
> > +
> > +		fd_install(fd, file);
> > +		*(u32 *)value = fd;
> > +		return 0;
> > +
> >  	default:
> >  		return -ENOSPC;
> >  	}