Message-ID: <954e3b7f-9001-e528-c6ab-f8f69c84cfd8@iogearbox.net>
Date: Tue, 15 May 2018 21:19:15 +0200
From: Daniel Borkmann <daniel@...earbox.net>
To: John Fastabend <john.fastabend@...il.com>, ast@...nel.org
Cc: netdev@...r.kernel.org, davem@...emloft.net
Subject: Re: [PATCH bpf-next v6 1/4] bpf: sockmap, refactor sockmap routines to work with hashmap
On 05/14/2018 07:00 PM, John Fastabend wrote:
[...]
> +static int __sock_map_ctx_update_elem(struct bpf_map *map,
> + struct bpf_sock_progs *progs,
> + struct sock *sock,
> + struct sock **map_link,
> + void *key)
> {
[...]
> * sock being added. If the sock is already attached to BPF programs
> * this results in an error.
> */
> - verdict = READ_ONCE(stab->bpf_verdict);
> - parse = READ_ONCE(stab->bpf_parse);
> - tx_msg = READ_ONCE(stab->bpf_tx_msg);
> + verdict = READ_ONCE(progs->bpf_verdict);
> + parse = READ_ONCE(progs->bpf_parse);
> + tx_msg = READ_ONCE(progs->bpf_tx_msg);
>
> if (parse && verdict) {
> /* bpf prog refcnt may be zero if a concurrent attach operation
> @@ -1703,11 +1691,11 @@ static int sock_map_ctx_update_elem(struct bpf_sock_ops_kern *skops,
> * we increment the refcnt. If this is the case abort with an
> * error.
> */
> - verdict = bpf_prog_inc_not_zero(stab->bpf_verdict);
> + verdict = bpf_prog_inc_not_zero(progs->bpf_verdict);
> if (IS_ERR(verdict))
> return PTR_ERR(verdict);
>
> - parse = bpf_prog_inc_not_zero(stab->bpf_parse);
> + parse = bpf_prog_inc_not_zero(progs->bpf_parse);
> if (IS_ERR(parse)) {
> bpf_prog_put(verdict);
> return PTR_ERR(parse);
> @@ -1715,7 +1703,7 @@ static int sock_map_ctx_update_elem(struct bpf_sock_ops_kern *skops,
> }
>
> if (tx_msg) {
> - tx_msg = bpf_prog_inc_not_zero(stab->bpf_tx_msg);
> + tx_msg = bpf_prog_inc_not_zero(progs->bpf_tx_msg);
> if (IS_ERR(tx_msg)) {
> if (verdict)
> bpf_prog_put(verdict);
Not directly related to this patch since it doesn't change the logic, but it
feels like something is not quite right here (unless I'm missing something).
The code is as follows:
[...]
	verdict = READ_ONCE(progs->bpf_verdict);
	parse = READ_ONCE(progs->bpf_parse);
	tx_msg = READ_ONCE(progs->bpf_tx_msg);

	if (parse && verdict) {
		/* bpf prog refcnt may be zero if a concurrent attach operation
		 * removes the program after the above READ_ONCE() but before
		 * we increment the refcnt. If this is the case abort with an
		 * error.
		 */
		verdict = bpf_prog_inc_not_zero(progs->bpf_verdict);
		if (IS_ERR(verdict))
			return PTR_ERR(verdict);

		parse = bpf_prog_inc_not_zero(progs->bpf_parse);
		if (IS_ERR(parse)) {
			bpf_prog_put(verdict);
			return PTR_ERR(parse);
		}
	}

	if (tx_msg) {
		tx_msg = bpf_prog_inc_not_zero(progs->bpf_tx_msg);
		if (IS_ERR(tx_msg)) {
			if (verdict)
				bpf_prog_put(verdict);
			if (parse)
				bpf_prog_put(parse);
			return PTR_ERR(tx_msg);
		}
	}
[...]
As the locking comment above the function notes, __sock_map_ctx_update_elem()
can be called concurrently:

  All operations operate on sock_map using cmpxchg and xchg operations to ensure we
  do not get stale references. Any reads into the map must be done with READ_ONCE()
  because of this.
You initially use READ_ONCE() on verdict/parse/tx_msg, but later, when grabbing
the reference, you access progs->bpf_verdict/bpf_parse/bpf_tx_msg again, which
would refetch them. If updates to the three progs happen concurrently, they
could have become NULL in the meantime, no? bpf_prog_inc_not_zero() would then
crash. Why not take the ref on the pointers you previously fetched via
READ_ONCE()?
The second question I had is that verdict/parse/tx_msg are updated
independently of each other, and each can be NULL or non-NULL. What if, say,
parse is NULL while verdict and tx_msg are non-NULL, and
bpf_prog_inc_not_zero() on the tx_msg prog fails? Doesn't this cause a
use-after-free, since no ref on verdict was taken earlier, but the
bpf_prog_put() in the error path will misbalance the refcount and could free
the prog?
It would probably help to clarify the locking comment a bit more if the above
is indeed okay as is.
Thanks,
Daniel