netdev - Re: [PATCH bpf v3 1/2] bpf: don't return EINVAL from {get,set}sockopt when optlen > PAGE

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <20200613003356.sqp6zn3lnh4qeqyl@ast-mbp.dhcp.thefacebook.com>
Date:   Fri, 12 Jun 2020 17:33:56 -0700
From:   Alexei Starovoitov <alexei.starovoitov@...il.com>
To:     Stanislav Fomichev <sdf@...gle.com>
Cc:     netdev@...r.kernel.org, bpf@...r.kernel.org, davem@...emloft.net,
        ast@...nel.org, daniel@...earbox.net,
        David Laight <David.Laight@...LAB.COM>
Subject: Re: [PATCH bpf v3 1/2] bpf: don't return EINVAL from
 {get,set}sockopt when optlen > PAGE_SIZE

On Mon, Jun 08, 2020 at 11:27:47AM -0700, Stanislav Fomichev wrote:
> Attaching to these hooks can break iptables because its optval is
> usually quite big, or at least bigger than the current PAGE_SIZE limit.
> David also mentioned some SCTP options can be big (around 256k).
> 
> There are two possible ways to fix it:
> 1. Increase the limit to match iptables max optval. There is, however,
>    no clear upper limit. Technically, iptables can accept up to
>    512M of data (not sure how practical it is though).
> 
> 2. Bypass the value (don't expose to BPF) if it's too big and trigger
>    BPF only with level/optname so BPF can still decide whether
>    to allow/deny big sockopts.
> 
> The initial attempt was implemented using strategy #1. Due to
> listed shortcomings, let's switch to strategy #2. When there is
> legitimate a real use-case for iptables/SCTP, we can consider increasing
> the PAGE_SIZE limit.
> 
> v3:
> * don't increase the limit, bypass the argument
> 
> v2:
> * proper comments formatting (Jakub Kicinski)
> 
> Fixes: 0d01da6afc54 ("bpf: implement getsockopt and setsockopt hooks")
> Cc: David Laight <David.Laight@...LAB.COM>
> Signed-off-by: Stanislav Fomichev <sdf@...gle.com>
> ---
>  kernel/bpf/cgroup.c | 18 ++++++++++++++----
>  1 file changed, 14 insertions(+), 4 deletions(-)
> 
> diff --git a/kernel/bpf/cgroup.c b/kernel/bpf/cgroup.c
> index fdf7836750a3..758082853086 100644
> --- a/kernel/bpf/cgroup.c
> +++ b/kernel/bpf/cgroup.c
> @@ -1276,9 +1276,18 @@ static bool __cgroup_bpf_prog_array_is_empty(struct cgroup *cgrp,
>  
>  static int sockopt_alloc_buf(struct bpf_sockopt_kern *ctx, int max_optlen)
>  {
> -	if (unlikely(max_optlen > PAGE_SIZE) || max_optlen < 0)
> +	if (unlikely(max_optlen < 0))
>  		return -EINVAL;
>  
> +	if (unlikely(max_optlen > PAGE_SIZE)) {
> +		/* We don't expose optvals that are greater than PAGE_SIZE
> +		 * to the BPF program.
> +		 */
> +		ctx->optval = NULL;
> +		ctx->optval_end = NULL;
> +		return 0;
> +	}

It's probably ok, but makes me uneasy about verifier consequences.
ctx->optval is PTR_TO_PACKET and it's a valid pointer from verifier pov.
Do we have cases already where PTR_TO_PACKET == PTR_TO_PACKET_END ?
I don't think we have such tests. I guess bpf prog won't be able to read
anything and nothing will crash, but having PTR_TO_PACKET that is
actually NULL would be an odd special case to keep in mind for everyone
who will work on the verifier from now on.

Also consider bpf prog that simply reads something small like 4 bytes.
IP_FREEBIND sockopt (like your selftest in the patch 2) will have
those 4 bytes, so it's natural for the prog to assume that it can read it.
It will have
p = ctx->optval;
if (p + 4 > ctx->optval_end)
 /* goto out and don't bother logging, since that never happens */
*(u32*)p;

but 'clever' user space would pass long optlen and prog suddenly
'not seeing' the sockopt. It didn't crash, but debugging would be
surprising.

I feel it's better to copy the first 4k and let the program see it.