linux-kernel - Re: [PATCH bpf-next 11/11] bpf: Fall back to nospec for spec path verification

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <87cyedie3w.fsf@fau.de>
Date: Wed, 19 Mar 2025 10:06:27 +0100
From: Luis Gerhorst <luis.gerhorst@....de>
To: Alexei Starovoitov <alexei.starovoitov@...il.com>
Cc: Alexei Starovoitov <ast@...nel.org>, Daniel Borkmann
 <daniel@...earbox.net>,
	Andrii Nakryiko <andrii@...nel.org>, Martin KaFai Lau
 <martin.lau@...ux.dev>,
	Eduard Zingerman <eddyz87@...il.com>, Song Liu <song@...nel.org>,
	Yonghong Song <yonghong.song@...ux.dev>, John Fastabend
 <john.fastabend@...il.com>,
	KP Singh <kpsingh@...nel.org>, Stanislav Fomichev <sdf@...ichev.me>, Hao
 Luo <haoluo@...gle.com>,
	Jiri Olsa <jolsa@...nel.org>, Puranjay Mohan <puranjay@...nel.org>,
	Xu Kuohai <xukuohai@...weicloud.com>, Catalin Marinas
 <catalin.marinas@....com>,
	Will Deacon <will@...nel.org>, Hari Bathini <hbathini@...ux.ibm.com>,
	Christophe Leroy <christophe.leroy@...roup.eu>, Naveen N Rao
 <naveen@...nel.org>,
	Madhavan Srinivasan <maddy@...ux.ibm.com>, Michael Ellerman
 <mpe@...erman.id.au>,
	Nicholas Piggin <npiggin@...il.com>, Mykola Lysenko <mykolal@...com>, Shuah
 Khan <shuah@...nel.org>,
	Henriette Herzog <henriette.herzog@....de>, Cupertino Miranda
 <cupertino.miranda@...cle.com>,
	Matan Shachnai <m.shachnai@...il.com>, Dimitar Kanaliev
 <dimitar.kanaliev@...eground.com>,
	Shung-Hsi Yu <shung-hsi.yu@...e.com>, Daniel Xu <dxu@...uu.xyz>, bpf
 <bpf@...r.kernel.org>,
	linux-arm-kernel <linux-arm-kernel@...ts.infradead.org>, LKML
 <linux-kernel@...r.kernel.org>,
	ppc-dev <linuxppc-dev@...ts.ozlabs.org>,
	"open list:KERNEL SELFTEST FRAMEWORK" <linux-kselftest@...r.kernel.org>,
 George Guo <guodongtai@...inos.cn>,
	WANG Xuerui <git@...0n.name>, Tiezhu Yang <yangtiezhu@...ngson.cn>,
 Maximilian Ott <ott@...fau.de>,
	Milan Stephan <milan.stephan@....de>
Subject: Re: [PATCH bpf-next 11/11] bpf: Fall back to nospec for spec path
 verification

Alexei Starovoitov <alexei.starovoitov@...il.com> writes:

> On Thu, Mar 13, 2025 at 10:57 AM Luis Gerhorst <luis.gerhorst@....de> wrote:
>> With increased limits this allows applying mitigations to large BPF
>> progs such as the Parca Continuous Profiler's prog. However, this
>> requires a jump-seq limit of 256k. In any case, the same principle
>> should apply to smaller programs therefore include it even if the limit
>> stays at 8k for now. Most programs in [1] only require a limit of 32k.
>
> Do you mean that without this change the verifier needs 256k
> jmp limit to load Parca's prog as unpriv due to speculative
> path exploration with push_stack ?
>

Only with this change Parca is loadable when manually enabling Spectre
defenses for privileged programs and setting the following limits:
- BPF_COMPLEXITY_LIMIT_JMP_SEQ=256k
- BPF_COMPLEXITY_LIMIT_SPEC_V1_VERIFICATION=128k
- BPF_COMPLEXITY_LIMIT_INSNS=32M

>
> And this change uses 4k as a trade-off between prog runtime
> and verification time ?
>

Yes, this change uses 4k to limited nested speculative path exploration.
At the top-level (i.e., on architectural paths), spawned speculative
paths are still explored.

>
> But tracing progs use bpf_probe_read_kernel(), so they're never going
> to be unpriv.
>

I'm sorry this was not clear. Parca is only used as an example here
to test whether this improves expressiveness in general.

If you do not see this as a favorable tradeoff (because of the code
complexity), I think it would be acceptable to drop the last patch for
now. The biggest improvements is already achieved with the other patches
as evident from the selftests. I can do a more exhaustive analysis on
patch 11 later.

>
>> @@ -2010,6 +2011,19 @@ static struct bpf_verifier_state *push_stack(struct bpf_verifier_env *env,
>>         struct bpf_verifier_stack_elem *elem;
>>         int err;
>>
>> +       if (!env->bypass_spec_v1 &&
>> +           cur->speculative &&
>
> Should this be
> (cur->speculative || speculative)
> ?

No, I think it will be unsafe to add `|| speculative` here. If we were
to return -EINVAL from push_stack() when pushing a speculative path from
a non-speculative path (e.g., in check_cond_jmp_op() through
sanitize_speculative_path()), this will cause do_check() to add an
lfence before the jump-op.

Here's a minimal example program:

	A = true
	B = true
	if A goto e
	f()
	if B goto e
	unsafe()
e:	exit

There are the following speculative and non-speculative paths
(`cur->speculative` and `speculative` referring to the value of the
push_stack() parameters):

- A = true
- B = true
- if A goto e
  - A && !cur->speculative && !speculative
    - exit
  - !A && !cur->speculative && speculative
    - f()
    - if B goto e
      - B && cur->speculative && !speculative
        - exit
      - !B && cur->speculative && speculative
        - unsafe()

`|| speculative` might cause us to only add an lfence before `if A goto
e`, which would not be sufficient. Intel recommends adding the lfence
after the jump [1].

>
> In general I'm not convinced that the approach is safe.
>
> This recoverable EINVAL means that exploration under speculation
> stops early, but there could be more branches and they won't be
> sanitized with extra lfence.
> So speculative execution can still happen at later insns.
>

`goto process_bpf_exit` only causes us to stop analyzing this particular
path, not the rest of the program.

This is based on the assumption, that the lfence stops the CPU from ever
reaching those branches (if they are not reachable through other means).

>
> Similar concern in patch 7:
> + if (state->speculative && cur_aux(env)->nospec)
> +   goto process_bpf_exit;
>
> One lfence at this insn doesn't stop speculation until the program end.
> Only at this insn. The rest of the code is free to speculate.
>

Taking the program above as an example, this allows us to stop before
f() if an lfence was inserted there.

Fully patched program would be:

	A = true
	B = true
	if A goto e
	lfence
	f()
	if B goto e
	unsafe()
e:	exit

In this example, all instructions after the lfence are dead code (and
with the lfence they are also dead code speculatively).

I believe this is in line with Intel's guidance [1]:

	An LFENCE instruction or a serializing instruction will ensure that
	no later instructions execute, even speculatively, until all prior
	instructions complete locally. Developers might prefer LFENCE over a
	serializing instruction because LFENCE may have lower latency.
	Inserting an LFENCE instruction after a bounds check prevents later
	operations from executing before the bound check completes.

With regards to the example, this implies that `if B goto e` will not
execute before `if A goto e` completes. Once `if A goto e` completes,
the CPU should find that the speculation was wrong and continue with
`exit`.

If there is any other path that leads to `if B goto e` (and therefore
`unsafe()`) without going through `if A goto e`, then an lfence might of
course still be needed there. However, I assume this other path will be
explored separately and therefore be discovered by the verifier even if
the exploration discussed here stops at the lfence. If this assumption
is wrong, please let me know.

>
> The refactoring in patches 1-3 is nice.
> Patches 4-5 are tricky and somewhat questionable, but make sense.
> Patch 7 without early goto process_bpf_exit looks correct too,
> Patch 8 is borderline. Feels like it's opening the door for
> new vulnerabilities and space to explore for security researchers.
> We disabled unpriv bpf by default and have no intentions to enable it.
> Even if we land the whole thing the unpriv will stay disabled.
> So trade offs don't appear favorable.
>

Thank you very much for having a look. Let me know whether the above
resolves your concern.

In any case, should I separate patches 1-3 into another series?

[1] https://www.intel.com/content/www/us/en/developer/articles/technical/software-security-guidance/technical-documentation/runtime-speculative-side-channel-mitigations.html
    ("Managed Runtime Speculative Execution Side Channel Mitigations")