Message-ID: <Ythv7NqofIAHp3bf@worktop.programming.kicks-ass.net>
Date: Wed, 20 Jul 2022 23:13:16 +0200
From: Peter Zijlstra <peterz@...radead.org>
To: Sami Tolvanen <samitolvanen@...gle.com>
Cc: Linus Torvalds <torvalds@...ux-foundation.org>,
Thomas Gleixner <tglx@...utronix.de>,
Joao Moreira <joao@...rdrivepizza.com>,
LKML <linux-kernel@...r.kernel.org>,
the arch/x86 maintainers <x86@...nel.org>,
Tim Chen <tim.c.chen@...ux.intel.com>,
Josh Poimboeuf <jpoimboe@...nel.org>,
"Cooper, Andrew" <andrew.cooper3@...rix.com>,
Pawan Gupta <pawan.kumar.gupta@...ux.intel.com>,
Johannes Wikner <kwikner@...z.ch>,
Alyssa Milburn <alyssa.milburn@...ux.intel.com>,
Jann Horn <jannh@...gle.com>, "H.J. Lu" <hjl.tools@...il.com>,
"Moreira, Joao" <joao.moreira@...el.com>,
"Nuzman, Joseph" <joseph.nuzman@...el.com>,
Steven Rostedt <rostedt@...dmis.org>,
"Gross, Jurgen" <jgross@...e.com>,
Masami Hiramatsu <mhiramat@...nel.org>,
Alexei Starovoitov <ast@...nel.org>,
Daniel Borkmann <daniel@...earbox.net>,
Peter Collingbourne <pcc@...gle.com>,
Kees Cook <keescook@...omium.org>
Subject: Re: [patch 00/38] x86/retbleed: Call depth tracking mitigation
On Tue, Jul 19, 2022 at 10:19:18AM -0700, Sami Tolvanen wrote:
> Clang's current CFI implementation is somewhat similar to this. It
> creates separate thunks for address-taken functions and changes
> function addresses in C code to point to the thunks instead.
>
> While this works, it creates painful situations when interacting with
> assembly (e.g. a function address taken in assembly cannot be used
> for indirect calls in C as it doesn't point to the thunk) and needs
> unpleasant hacks when we want take the actual function address in C
> (i.e. scattering the code with function_nocfi() calls).
>
> I have to agree with Peter on this, I would rather avoid messing with
> function pointers in KCFI to avoid these issues.
It is either this (and I think I can avoid the worst of it, see
below), or growing the indirect callsites to obscure the immediate (as
Linus suggested). There are around 16k indirect callsites in a
defconfig-ish kernel, so growing them isn't too horrible, but it isn't
nice either.
The prettiest option to obscure the immediate at the callsite I could
conjure up is something like:
kcfi_caller_linus:
	movl $0x12345600, %r10d		# high 3 bytes of the hash
	movb $0x78, %r10b		# low byte; the full hash never
					# appears as a single immediate
	cmpl %r10d, -OFFSET(%r11)
	je 1f
	ud2
1:	call __x86_thunk_indirect_r11
Which comes to around 22 bytes (+5 over the original).
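(My byte accounting for the above, assuming a disp8 OFFSET and a rel32
call: the movl into %r10d is 6 bytes, the movb 3, the cmpl 4, je 2,
ud2 2 and the call 5, so 22 total; the current sequence is 17 bytes
because its cmpl carries the full imm32 in 8 bytes, hence the +5.)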
Joao suggested putting part of that in the retpoline thunk like:
kcfi_caller_joao:
	movl $0x12345600, %r10d		# same split-immediate trick
	movb $0x78, %r10b
	call __x86_thunk_indirect_cfi

__x86_thunk_indirect_cfi:
	cmpl %r10d, -OFFSET(%r11)	# hash check moved into the thunk
	je 1f
	ud2
1:
	call 1f				# push &int3 as the return address
	int3				# speculation trap
1:
	mov %r11, (%rsp)		# overwrite return address with target
	ret				# architecturally jumps to %r11
	int3
The only down-side there is that eIBRS hardware doesn't need
retpolines (given we currently default to ignoring Spectre-BHB), and
as such this doesn't really work out nicely; we don't want to
re-introduce the funneling of all indirect calls through a single
thunk.
The other option I came up with, alluded to above, is below, and
having written it out, I'm pretty sure I favour just growing the
indirect callsite as per Linus' option above.
Suppose:
indirect_callsite:
	cmpl $0x12345678, -6(%r11)	# 8
	je 1f				# 2
	ud2				# 2
1:	call __x86_indirect_thunk_r11	# 5 (-> .retpoline_sites)

__cfi_\func:
	movl $0x12345678, %eax		# 5
	int3				# 1
	int3				# 1
\func:					# aligned 16
	endbr				# 4
	nop12				# 12
	call __fentry__			# 5
	...
And for functions that do not get their address taken:
\func:					# aligned 16
	nop16				# 16
	call __fentry__			# 5
	...
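To spell out the offsets (my accounting): __cfi_\func is 7 bytes (5
for the movl plus two int3s), so the hash immediate ends up at
\func-6, which is exactly the dword the caller's
cmpl $0x12345678, -6(%r11) loads:

	\func-7: b8 <hash32>	# __cfi_\func: movl $0x12345678, %eax
	\func-2: cc cc		# int3; int3
	\func+0:		# endbr/nops; 16 bytes of skippable preamble
	\func+16:		# where patched direct calls land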
Instead, extend the objtool .call_sites to also include tail-calls
and, for:

 - regular (!SKL, !IBT) systems;

   * patch all direct calls/jmps to +16 (.call_sites)
     (see the displacement sketch after these lists)
   * static_call/ftrace/etc.. can trivially add the +16
   * retpolines can do +16 for the indirect calls
   * return thunks are patched to ret;int3 (.return_sites)

     (indirect calls for eIBRS which don't use retpoline
      simply eat the nops)
 - SKL systems;

   * patch the first 16 bytes into:

	nop6
	sarq $5, PER_CPU_VAR(__x86_call_depth)

   * patch all direct calls to +6 (.call_sites)
   * patch all direct jumps to +16 (.call_sites)
   * static_call/ftrace adjust to +6/+16 depending on instruction type
   * retpolines are split between call/jmp and do +6/+16 resp.
   * return thunks are patched to x86_return_skl (.return_sites)
 - IBT systems;

   * patch the first 16 bytes to:

	endbr				# 4
	xorl $0x12345678, %r10d		# 7
	je 1f				# 2
	ud2				# 2
	nop				# 1
   1:

   * patch the callsites to: (.retpoline_sites)

	movl $0x12345678, %r10d		# 7
	call *%r11			# 3
	nop7				# 7

   * patch all the direct calls/jmps to +16 (.call_sites)
   * static_call/ftrace/etc.. add +16
   * return thunks are patched to ret;int3 (.return_sites)
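For reference, the +16/+6 retargeting above is no more than a
displacement fixup on the rel32 call/jmp instructions recorded in
.call_sites; a minimal sketch of the idea (user-space types and the
function name are mine, not the actual objtool/alternatives code):

	#include <stdint.h>
	#include <string.h>

	/*
	 * Move the destination of a 5-byte rel32 call (0xe8) or jmp
	 * (0xe9) forward by 'skip' bytes (+16, or +6 for calls on SKL);
	 * only the displacement changes, the instruction length doesn't.
	 */
	static void patch_call_site(uint8_t *insn, int32_t skip)
	{
		int32_t disp;

		memcpy(&disp, insn + 1, sizeof(disp));	/* rel32, little-endian */
		disp += skip;				/* dest = insn + 5 + disp */
		memcpy(insn + 1, &disp, sizeof(disp));
	}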
Yes, frobbing the address for static_call/ftrace/etc.. is a bit
horrible, but at least &sym remains exactly that address and not
something magical.
Note: It is possible to shift the __fentry__ call, but that would mean
that we lose alignment or get to carry .call_sites at runtime (and it
is *huge*).