Date:   Fri, 26 Jan 2018 13:02:24 -0800
From:   John Fastabend <john.fastabend@...il.com>
To:     ast@...nel.org, daniel@...earbox.net, davejwatson@...com
Cc:     netdev@...r.kernel.org,
        Prashant Bhole <bhole_prashant_q7@....ntt.co.jp>
Subject: Re: [bpf-next PATCH v2 2/3] bpf: sockmap, add sock close() hook to
 remove socks

On 01/26/2018 10:14 AM, John Fastabend wrote:
> The selftests test_maps program was leaving dangling BPF sockmap
> programs around because not all psock elements were removed from
> the map. The elements in turn hold a reference on the BPF program
> they are attached to causing BPF programs to stay open even after
> test_maps has completed.
> 
> The original intent was that sk_state_change() would be called
> when TCP socks went through TCP_CLOSE state. However, because
> socks may be in SOCK_DEAD state or the sock may be a listening
> socket the event is not always triggered.
> 
> To resolve this use the ULP infrastructure and register our own
> proto close() handler. This fixes the above case.
> 
> Fixes: 174a79ff9515 ("bpf: sockmap with sk redirect support")
> Reported-by: Prashant Bhole <bhole_prashant_q7@....ntt.co.jp>
> Signed-off-by: John Fastabend <john.fastabend@...il.com>
> ---

[...]

v3 will be needed.

> +void bpf_tcp_close(struct sock *sk, long timeout)
> +{
> +	struct smap_psock_map_entry *e, *tmp;
> +	struct smap_psock *psock;
> +	struct sock *osk;
> +
> +	psock = smap_psock_sk(sk);
> +	if (unlikely(!psock))
> +		return sk->sk_prot->close(sk, timeout);
> +
> +	write_lock_bh(&sk->sk_callback_lock);
> +	list_for_each_entry_safe(e, tmp, &psock->maps, list) {
> +		osk = cmpxchg(e->entry, sk, NULL);
> +		if (osk == sk) {
> +			list_del(&e->list);
> +			smap_release_sock(psock, sk);
> +		}
> +	}
> +	write_unlock_bh(&sk->sk_callback_lock);
> +	return psock->save_close(sk, timeout);

We need this to run inside an RCU critical section, otherwise the release
op could presumably free the psock out from under us. I haven't actually
triggered this case, but it seems possible. Also, the save_close call
cannot be made inside the RCU section, so we need to cache the pointer
and call it at the end. This is OK because we hold the sock lock.

Probably like this,

void bpf_tcp_close(struct sock *sk, long timeout)
{
        struct smap_psock_map_entry *e, *tmp;
        struct smap_psock *psock;
        struct sock *osk;
        void (*close_fun)(struct sock *sk, long timeout);

        rcu_read_lock();
        psock = smap_psock_sk(sk);
        if (unlikely(!psock)) {
                rcu_read_unlock();
                return sk->sk_prot->close(sk, timeout);
        }

        /* Although the psock may be destroyed, after RCU grace period, the
         * sk will not because we are holding the sock lock here. So we can
         * call the original close routine outside the RCU critical section.
         */
        close_fun = psock->save_close;

        write_lock_bh(&sk->sk_callback_lock);
        list_for_each_entry_safe(e, tmp, &psock->maps, list) {
                osk = cmpxchg(e->entry, sk, NULL);
                if (osk == sk) {
                        list_del(&e->list);
                        smap_release_sock(psock, sk);
                }
        }
        write_unlock_bh(&sk->sk_callback_lock);
        rcu_read_unlock();
        close_fun(sk, timeout);
}
