Date:   Mon, 8 Jan 2018 15:20:27 -0800
From:   Alexei Starovoitov <ast@...com>
To:     Mark Rutland <mark.rutland@....com>
CC:     "David S . Miller" <davem@...emloft.net>,
        Daniel Borkmann <daniel@...earbox.net>,
        Jann Horn <jannh@...gle.com>,
        Linus Torvalds <torvalds@...ux-foundation.org>,
        Dan Williams <dan.j.williams@...el.com>,
        Peter Zijlstra <peterz@...radead.org>,
        Elena Reshetova <elena.reshetova@...el.com>,
        Alan Cox <alan@...ux.intel.com>, <netdev@...r.kernel.org>,
        <kernel-team@...com>, <will.deacon@....com>
Subject: Re: [PATCH bpf] bpf: prevent out-of-bounds speculation

On 1/8/18 9:05 AM, Mark Rutland wrote:
> Hi Alexei,
>
> On Thu, Jan 04, 2018 at 08:28:11PM -0800, Alexei Starovoitov wrote:
>> From: Alexei Starovoitov <ast@...nel.org>
>>
>> Under speculation, CPUs may mis-predict branches in bounds checks. Thus,
>> memory accesses under a bounds check may be speculated even if the
>> bounds check fails, providing a primitive for building a side channel.
>>
>> To avoid leaking kernel data, round up the size of array-based maps and
>> mask the index after the bounds check, so a speculated load with an
>> out-of-bounds index will load either a valid value from the array or a
>> zero from the padded area.
>
> Thanks for putting this together, this certainly looks neat.
>
> I'm a little worried that in the presence of some CPU/compiler
> optimisations, the masking may effectively be skipped under speculation.
> So I'm not sure how robust this is going to be.
>
> More on that below.
>
>> To avoid duplicating map_lookup functions for root/unpriv, always generate
>> a sequence of BPF instructions equivalent to the map_lookup function for
>> the array and array_of_maps map types when the map was created by an
>> unprivileged user. And unconditionally mask the index for percpu_array,
>> since it's fast enough even when max_entries is not rounded to a power
>> of 2 for root users, because percpu_array doesn't have a map_gen_lookup
>> callback yet.
>
> Is there a noticeable slowdown from the masking? Can't we always have
> that in place?

Right. Please see the v3 version:
https://patchwork.ozlabs.org/patch/856645/
Daniel noticed that speculation can happen without a program being
loaded, so we need to tighten the syscall path as well.
v3 therefore does the masking for all array types unconditionally.
The perf cost is within the noise for the interpreter and is not seen
with JITed root code, since gen_lookup does not emit the AND for root.
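
As a rough illustration of the allocation side (a sketch only; names
like index_mask follow the patch, but the exact merged code may differ):

    /* Sketch, simplified from the patch -- not the exact kernel code.
     * Size the array to the next power of two and derive index_mask
     * from it, so (index & index_mask) always stays inside the
     * allocation: either a valid element or zeroed padding. */
    u32 max_entries = attr->max_entries;
    u32 index_mask = roundup_pow_of_two(max_entries) - 1;

    /* allocate index_mask + 1 elements instead of max_entries */
    u64 array_size = sizeof(struct bpf_array) +
                     (u64)(index_mask + 1) * elem_size;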

>> @@ -157,7 +175,7 @@ static void *percpu_array_map_lookup_elem(struct bpf_map *map, void *key)
>>  	if (unlikely(index >= array->map.max_entries))
>>  		return NULL;
>>
>> -	return this_cpu_ptr(array->pptrs[index]);
>> +	return this_cpu_ptr(array->pptrs[index & array->index_mask]);
>
> As above, I think this isn't necessarily robust, as CPU/compiler
> optimisations can break the dependency on the index_mask, allowing
> speculation without a mask.
>
> e.g. a compiler could re-write this as:
>
> 	if (array->index_mask != 0xffffffff)
> 		index &= array->index_mask;
> 	return this_cpu_ptr(array->pptrs[index]);
>
> ... which would allow an unmasked index to be used in speculated paths.

Prior to working on the kernel I worked on the Sun, GCC, and LLVM
compilers, and I've never seen such an optimization proposed for AND.
It makes no sense.

For heavy ALU operations like div/mod, and for calls, the compiler does
indeed try to predict the value. De-virtualization is an example of such
an optimization for indirect calls; the Intel compiler pioneered the
approach back in 2000.

Compilers can also optimize "div by X" into:

    if (x == const)
            /* unrolled div by const into something faster */
    else
            /* generic div by X */

Such optimizations are rarely done without profile feedback: since a
branch is costly, the compiler will introduce one only if there is a
clear win over simply doing the operation. For AND, OR, shift, add,
and sub there is never such a case. On the contrary, the compiler is
always trying to remove branches, not introduce them.
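
To make that concrete (a generic C example, not from the patch), the
usual direction is to lower a branch into branch-free code:

    /* Hypothetical example: compilers commonly emit this as a
     * conditional move or masking, i.e. they remove the branch;
     * they do not rewrite a plain AND into a branch. */
    static unsigned int select_min(unsigned int a, unsigned int b)
    {
            return a < b ? a : b;   /* typically compiled branch-free */
    }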

> Similar cases could occur with some CPU implementations. For example, HW
> value-prediction could result in the use of an all-ones mask under
> speculation.

Please see the paper that Alan mentioned.
HW value prediction predicts likely valid values. It makes no sense
for HW to continue speculative execution with a random value.
Consider array[index & index_mask]:
if the load of index_mask stalls and the CPU decides to continue
speculating with a random value (both zero and all-ones count as
random here), it will proceed through the AND, and the second load
will populate the precious cache with completely irrelevant data.
Such a CPU would be slower with speculative execution than without,
since it would populate the caches with random data.

> I think that we may need to be able to provide an arch-specific
> pointer sanitization sequence (though we could certainly have masking as
> the default).

I still don't understand where this paranoia is coming from.
The kernel doesn't need to kill speculation; it needs to manage it.

> I have a rough idea as to how that could be plumbed into the JIT. First
> I need to verify the sequence I have in mind for arm/arm64 is
> sufficient.

Hmm? The patch provided (both v2 and v3) doesn't need any JIT changes
on x64, arm, or elsewhere.
gen_lookup() emits a BPF_AND on the index, which the JIT converts into
an actual AND in the native instruction set.
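
For reference, a minimal sketch of the inlined sequence (simplified
from the v3 patch; register choices and the surrounding instructions
may differ in the actual code):

    /* The verifier pastes these BPF instructions in place of the
     * helper call, so every JIT lowers the BPF_AND to a native AND
     * with no per-arch JIT changes. */
    struct bpf_insn *insn = insn_buf;

    *insn++ = BPF_LDX_MEM(BPF_W, BPF_REG_0, BPF_REG_2, 0); /* r0 = *(u32 *)key */
    *insn++ = BPF_JMP_IMM(BPF_JGE, BPF_REG_0,
                          map->max_entries, 3);            /* bounds check */
    *insn++ = BPF_ALU32_IMM(BPF_AND, BPF_REG_0,
                            array->index_mask);            /* mask the index */
    /* ... compute the element pointer from r0, or return NULL ... */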
