Message-ID: <202210181020.79AF7F7@keescook>
Date: Tue, 18 Oct 2022 11:09:13 -0700
From: Kees Cook <keescook@...omium.org>
To: Peter Zijlstra <peterz@...radead.org>
Cc: x86@...nel.org, Sami Tolvanen <samitolvanen@...gle.com>,
Joao Moreira <joao@...rdrivepizza.com>,
linux-kernel@...r.kernel.org, Mark Rutland <mark.rutland@....com>,
Josh Poimboeuf <jpoimboe@...hat.com>
Subject: Re: [PATCH] x86/ibt: Implement FineIBT
On Tue, Oct 18, 2022 at 03:35:50PM +0200, Peter Zijlstra wrote:
> Implement an alternative CFI scheme that merges both the fine-grained
> nature of kCFI but also takes full advantage of the coarse grained
> hardware CFI as provided by IBT.
Very nice to have!
> To contrast:
>
> kCFI is a pure software CFI scheme and relies on being able to read
> text -- specifically the instruction *before* the target symbol, and
> does the hash validation *before* doing the call (otherwise control
> flow is compromised already).
>
> FineIBT is a software and hardware hybrid scheme; by ensuring every
> branch target starts with a hash validation it is possible to place
> the hash validation after the branch. This has several advantages:
>
> o the (hash) load is avoided; no memop; no RX requirement.
>
> o IBT WAIT-FOR-ENDBR state is a speculation stop; by placing
> the hash validation in the immediate instruction after
> the branch target there is a minimal speculation window
> and the whole is a viable defence against SpectreBHB.
I still think it's worth noting that this does technically weaken the
requirements for an "attacker-controlled executable memory content
injection" attack. An attacker still needs to place an ENDBR at the
start of their injected code, but they no longer need to learn and
inject the CFI hash as well, since the malicious code can simply skip
the check entirely. The difference in protection currently isn't much,
though.
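To make the asymmetry concrete, here is a toy model (hypothetical
names, not real codegen): under kCFI the *caller* performs the hash
comparison before branching, while under FineIBT a callee that simply
omits its own preamble check is never caught, hash or no hash:

```c
#include <stdint.h>
#include <stdbool.h>

/* Toy model only: each indirect-call target carries a preamble hash;
 * kCFI checks it on the caller side, FineIBT relies on the callee's
 * own preamble to do the validation. */
struct target {
	uint32_t preamble_hash;	/* hash stored at the entry point */
	bool runs_check;	/* does the callee preamble validate? */
};

/* kCFI: caller reads the hash before the target and compares first,
 * so the target's code cannot opt out of the check. */
static bool kcfi_call_allowed(const struct target *t, uint32_t expected)
{
	return t->preamble_hash == expected;
}

/* FineIBT: the caller just branches; validation only happens if the
 * callee's preamble actually performs it. Injected code that omits
 * the check (runs_check == false) is accepted regardless of hash. */
static bool fineibt_call_allowed(const struct target *t, uint32_t expected)
{
	return !t->runs_check || t->preamble_hash == expected;
}
```

(The ENDBR requirement is assumed satisfied and not modeled here.)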
It's not a very difficult requirement to get attacker-controlled bytes
into executable memory, as there are already existing APIs that provide
this to varying degrees of reachability, utility, and discoverability --
for example, BPF JIT when constant blinding isn't enabled (the unfortunate
default). And with the hashes currently being deterministic, there's no
secret that needs to be exposed first; an attacker can just calculate it.
An improvement for kCFI would be to mutate all the hashes both at build
time (perhaps using the same seed infrastructure that randstruct depends
on for sharing a seed across compilation units), and at boot time, so
an actual .text content exposure is needed to find the target hash value.
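A minimal sketch of that mutation idea (mutate_hash() and boot_secret
are hypothetical names, not existing kernel interfaces): as long as
every caller-side expected hash and every callee-side preamble hash is
rewritten with the same keyed bijection, matches are preserved but the
raw values stop being deterministic, so finding a valid hash requires
an actual .text exposure:

```c
#include <stdint.h>

/* Hypothetical per-boot secret, e.g. filled from get_random_u32()
 * early during boot before the hash patching pass runs. */
static uint32_t boot_secret;

/* Any keyed bijection preserves equality of matching pairs while
 * hiding the build-time value; XOR is the simplest possible choice. */
static inline uint32_t mutate_hash(uint32_t build_time_hash)
{
	return build_time_hash ^ boot_secret;
}
```

Applying the same transform to both sides of every check keeps
caller/callee pairs in agreement; XOR being an involution also means a
second application recovers the original, which is convenient for
re-patching.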
> Obviously this patch relies on kCFI (upstream), but additionally it also
> relies on the padding from the call-depth-tracking patches
> (tip/x86/core). It uses this padding to place the hash-validation while
> the call-sites are re-written to modify the indirect target to be 16
> bytes in front of the original target, thus hitting this new preamble.
>
> Notably, there is no hardware that needs call-depth-tracking (Skylake)
> and supports IBT (Tigerlake and onwards).
>
> Suggested-by: Joao Moreira (Intel) <joao@...rdrivepizza.com>
> Signed-off-by: Peter Zijlstra (Intel) <peterz@...radead.org>
> [...]
> --- a/arch/x86/Kconfig
> +++ b/arch/x86/Kconfig
> @@ -2464,13 +2464,23 @@ config FUNCTION_PADDING_BYTES
> default FUNCTION_PADDING_CFI if CFI_CLANG
> default FUNCTION_ALIGNMENT
>
> +config CALL_PADDING
> + def_bool n
> + depends on CC_HAS_ENTRY_PADDING && OBJTOOL
> + select FUNCTION_ALIGNMENT_16B
> +
> +config FINEIBT
> + def_bool y
> + depends on X86_KERNEL_IBT && CFI_CLANG
> + select CALL_PADDING
To that end, can we please make this a prompted choice?
And this is a good time to ping you about this patch as well:
https://lore.kernel.org/lkml/20220902234213.3034396-1-keescook@chromium.org/
> [...]
> +#ifdef CONFIG_FINEIBT
> +/*
> + * kCFI FineIBT
> + *
> + * __cfi_\func: __cfi_\func:
> + * movl $0x12345678,%eax endbr64 // 4
kCFI emits endbr64 here first too ...
> + * nop subl $0x12345678,%r10d // 7
> + * nop jz 1f // 2
> + * nop ud2 // 2
> + * nop 1: nop // 1
> + * nop
> + * nop
> + * nop
> + * nop
> + * nop
> + * nop
> + * nop
Tangent: why are these nop instead of 0xcc? These bytes aren't ever
executed, are they?
--
Kees Cook