Message-ID: <Ythv7NqofIAHp3bf@worktop.programming.kicks-ass.net>
Date: Wed, 20 Jul 2022 23:13:16 +0200
From: Peter Zijlstra <peterz@...radead.org>
To: Sami Tolvanen <samitolvanen@...gle.com>
Cc: Linus Torvalds <torvalds@...ux-foundation.org>,
Thomas Gleixner <tglx@...utronix.de>,
Joao Moreira <joao@...rdrivepizza.com>,
LKML <linux-kernel@...r.kernel.org>,
the arch/x86 maintainers <x86@...nel.org>,
Tim Chen <tim.c.chen@...ux.intel.com>,
Josh Poimboeuf <jpoimboe@...nel.org>,
"Cooper, Andrew" <andrew.cooper3@...rix.com>,
Pawan Gupta <pawan.kumar.gupta@...ux.intel.com>,
Johannes Wikner <kwikner@...z.ch>,
Alyssa Milburn <alyssa.milburn@...ux.intel.com>,
Jann Horn <jannh@...gle.com>, "H.J. Lu" <hjl.tools@...il.com>,
"Moreira, Joao" <joao.moreira@...el.com>,
"Nuzman, Joseph" <joseph.nuzman@...el.com>,
Steven Rostedt <rostedt@...dmis.org>,
"Gross, Jurgen" <jgross@...e.com>,
Masami Hiramatsu <mhiramat@...nel.org>,
Alexei Starovoitov <ast@...nel.org>,
Daniel Borkmann <daniel@...earbox.net>,
Peter Collingbourne <pcc@...gle.com>,
Kees Cook <keescook@...omium.org>
Subject: Re: [patch 00/38] x86/retbleed: Call depth tracking mitigation
On Tue, Jul 19, 2022 at 10:19:18AM -0700, Sami Tolvanen wrote:
> Clang's current CFI implementation is somewhat similar to this. It
> creates separate thunks for address-taken functions and changes
> function addresses in C code to point to the thunks instead.
>
> While this works, it creates painful situations when interacting with
> assembly (e.g. a function address taken in assembly cannot be used
> for indirect calls in C as it doesn't point to the thunk) and needs
> unpleasant hacks when we want take the actual function address in C
> (i.e. scattering the code with function_nocfi() calls).
>
> I have to agree with Peter on this, I would rather avoid messing with
> function pointers in KCFI to avoid these issues.
It is either this (and I think I can avoid the worst of it, see
below), or growing the indirect callsites to obscure the immediate (as
Linus suggested). There are around 16k indirect callsites in a
defconfig-ish kernel, so growing them isn't too horrible, but it isn't
nice either.
The prettiest option to obscure the immediate at the callsite I could
conjure up is something like:
kcfi_caller_linus:
	movl $0x12345600, %r10d		# high 3 bytes of the hash
	movb $0x78, %r10b		# low byte; the full hash never
					# appears as a single immediate
	cmpl %r10d, -OFFSET(%r11)
	je 1f
	ud2
1:	call __x86_thunk_indirect_r11
Which comes to around 22 bytes (+5 over the original).
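(My byte accounting for the above, assuming a disp8 OFFSET and a rel32
call: the movl into %r10d is 6 bytes, the movb 3, the cmpl 4, je 2,
ud2 2 and the call 5, so 22 total; the current sequence is 17 bytes
because its cmpl carries the full imm32 in 8 bytes, hence the +5.)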
Joao suggested putting part of that in the retpoline thunk like:
kcfi_caller_joao:
	movl $0x12345600, %r10d		# same split-immediate trick
	movb $0x78, %r10b
	call __x86_thunk_indirect_cfi

__x86_thunk_indirect_cfi:
	cmpl %r10d, -OFFSET(%r11)	# hash check moved into the thunk
	je 1f
	ud2
1:
	call 1f				# push &int3 as the return address
	int3				# speculation trap
1:
	mov %r11, (%rsp)		# overwrite return address with target
	ret				# architecturally jumps to %r11
	int3
The only down-side there is that eIBRS hardware doesn't need
retpolines (given we currently default to ignoring Spectre-BHB), and
as such this doesn't really work out nicely; we don't want to
re-introduce the funneling of all indirect calls through a single
thunk.
The other option I came up with, alluded to above, is below, and
having written it out, I'm pretty sure I favour just growing the
indirect callsite as per Linus' option above.
Suppose:
indirect_callsite:
	cmpl $0x12345678, -6(%r11)	# 8
	je 1f				# 2
	ud2				# 2
1:	call __x86_indirect_thunk_r11	# 5 (-> .retpoline_sites)

__cfi_\func:
	movl $0x12345678, %eax		# 5
	int3				# 1
	int3				# 1
\func:					# aligned 16
	endbr				# 4
	nop12				# 12
	call __fentry__			# 5
	...
And for functions that do not get their address taken:
\func:					# aligned 16
	nop16				# 16
	call __fentry__			# 5
	...
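To spell out the offsets (my accounting): __cfi_\func is 7 bytes (5
for the movl plus two int3s), so the hash immediate ends up at
\func-6, which is exactly the dword the caller's
cmpl $0x12345678, -6(%r11) loads:

	\func-7: b8 <hash32>	# __cfi_\func: movl $0x12345678, %eax
	\func-2: cc cc		# int3; int3
	\func+0:		# endbr/nops; 16 bytes of skippable preamble
	\func+16:		# where patched direct calls land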
Instead, extend the objtool .call_sites to also include tail-calls
and, for:

 - regular (!SKL, !IBT) systems;

   * patch all direct calls/jmps to +16 (.call_sites)
     (see the displacement sketch after these lists)
   * static_call/ftrace/etc.. can trivially add the +16
   * retpolines can do +16 for the indirect calls
   * return thunks are patched to ret;int3 (.return_sites)

     (indirect calls for eIBRS which don't use retpoline
      simply eat the nops)
 - SKL systems;

   * patch the first 16 bytes into:

	nop6
	sarq $5, PER_CPU_VAR(__x86_call_depth)

   * patch all direct calls to +6 (.call_sites)
   * patch all direct jumps to +16 (.call_sites)
   * static_call/ftrace adjust to +6/+16 depending on instruction type
   * retpolines are split between call/jmp and do +6/+16 resp.
   * return thunks are patched to x86_return_skl (.return_sites)
 - IBT systems;

   * patch the first 16 bytes to:

	endbr				# 4
	xorl $0x12345678, %r10d		# 7
	je 1f				# 2
	ud2				# 2
	nop				# 1
   1:

   * patch the callsites to: (.retpoline_sites)

	movl $0x12345678, %r10d		# 7
	call *%r11			# 3
	nop7				# 7

   * patch all the direct calls/jmps to +16 (.call_sites)
   * static_call/ftrace/etc.. add +16
   * return thunks are patched to ret;int3 (.return_sites)
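For reference, the +16/+6 retargeting above is no more than a
displacement fixup on the rel32 call/jmp instructions recorded in
.call_sites; a minimal sketch of the idea (user-space types and the
function name are mine, not the actual objtool/alternatives code):

	#include <stdint.h>
	#include <string.h>

	/*
	 * Move the destination of a 5-byte rel32 call (0xe8) or jmp
	 * (0xe9) forward by 'skip' bytes (+16, or +6 for calls on SKL);
	 * only the displacement changes, the instruction length doesn't.
	 */
	static void patch_call_site(uint8_t *insn, int32_t skip)
	{
		int32_t disp;

		memcpy(&disp, insn + 1, sizeof(disp));	/* rel32, little-endian */
		disp += skip;				/* dest = insn + 5 + disp */
		memcpy(insn + 1, &disp, sizeof(disp));
	}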
Yes, frobbing the address for static_call/ftrace/etc.. is a bit
horrible, but at least &sym remains exactly that address and not
something magical.
Note: It is possible to shift the __fentry__ call, but that would mean
that we lose alignment or get to carry .call_sites at runtime (and it
is *huge*).