linux-kernel - Re: [RFC][PATCH 6/6] objtool: Add IBT validation / fixups

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <20220211133803.GV23216@worktop.programming.kicks-ass.net>
Date:   Fri, 11 Feb 2022 14:38:03 +0100
From:   Peter Zijlstra <peterz@...radead.org>
To:     Joao Moreira <joao@...rdrivepizza.com>
Cc:     Kees Cook <keescook@...omium.org>, x86@...nel.org,
        hjl.tools@...il.com, jpoimboe@...hat.com,
        andrew.cooper3@...rix.com, linux-kernel@...r.kernel.org,
        ndesaulniers@...gle.com, samitolvanen@...gle.com,
        llvm@...ts.linux.dev
Subject: Re: [RFC][PATCH 6/6] objtool: Add IBT validation / fixups

On Tue, Feb 08, 2022 at 09:18:44PM -0800, Joao Moreira wrote:
> > Ah, excellent, thanks for the pointers. There's also this in the works:
> > https://reviews.llvm.org/D119296 (a new CFI mode, designed to play nice
> > to objtool, IBT, etc.)
> 
> Oh, great! Thanks for pointing it out. I guess I saw something with a
> similar name before ;) https://www.blackhat.com/docs/asia-17/materials/asia-17-Moreira-Drop-The-Rop-Fine-Grained-Control-Flow-Integrity-For-The-Linux-Kernel.pdf
> 
> Jokes aside (and perhaps questions more targeted to Sami), from a diagonal
> look it seems that this follows the good old tag approach proposed by
> PaX/grsecurity, right? If this is the case, should I assume it could also
> benefit from features like -mibt-seal? Also are you considering that perhaps
> we can use alternatives to flip different CFI instrumentation as suggested
> by PeterZ in another thread?

So, lets try and recap things from IRC yesterday. There's a whole bunch
of things intertwining making indirect branches 'interesting'. Most of
which I've not seen mentioned in Sami's KCFI proposal which makes it
rather pointless.

I think we'll end up with something related to KCFI, but with distinct
differences:

 - 32bit immediates for smaller code
 - __kcfi_check_fail() is out for smaller code
 - it must interact with IBT/BTI and retpolines
 - we must be very careful with speculation.

Right, so because !IBT-CFI needs the check at the call site, like:

caller:
	cmpl	$0xdeadbeef, -0x4(%rax)		# 7 bytes
	je	1f				# 2 bytes
	ud2					# 2 bytes
1:	call	__x86_indirect_thunk_rax	# 5 bytes


	.align 16
	.byte 0xef, 0xbe, 0xad, 0xde		# 4 bytes
func:
	...
	ret


While FineIBT has them at the landing site:

caller:
	movl	$0xdeadbeef, %r11d		# 6 bytes
	call	__x86_indirect_thunk_rax	# 5 bytes


	.align 16
func:
	endbr					# 4 bytes
	cmpl	$0xdeadbeef, %r11d		# 7 bytes
	je	1f				# 2 bytes
	ud2					# 2 bytes
1:	...
	ret


It seems to me that always doing the check at the call-site is simpler,
since it avoids code-bloat and patching work. That is, if we allow both
we'll effectivly blow up the code by 11 + 13 bytes (11 at the call site,
13 at function entry) as opposed to 11+4 or 6+13.

Which then yields:

caller:
	cmpl	$0xdeadbeef, -0x4(%rax)		# 7 bytes
	je	1f				# 2 bytes
	ud2					# 2 bytes
1:	call	__x86_indirect_thunk_rax	# 5 bytes


	.align 16
	.byte 0xef, 0xbe, 0xad, 0xde		# 4 bytes
func:
	endbr					# 4 bytes
	...
	ret

For a combined 11+8 bytes overhead :/

Now, this setup provides:

 - minimal size (please yell if there's a smaller option I missed;
   s/ud2/int3/ ?)
 - since the retpoline handles speculation from stuff before it, the
   load-miss induced speculation is covered.
 - the 'je' branch is binary, leading to either the retpoline or the
   ud2, both which are a speculation stop.
 - the ud2 is placed such that if the exception is non-fatal, code
   execution can recover
 - when IBT is present we can rewrite the thunk call to:

	lfence
	call *(%rax)

   and rely on the WAIT-FOR-ENDBR speculation stop (also 5 bytes).
 - can disable CFI by replacing the cmpl with:

	jmp	1f

   (or an 11 byte nop, which is just about possible). And since we
   already have all retpoline thunk callsites in a section, we can
   trivially find all CFI bits that are always in front it them.
 - function pointer sanity


Additionally, if we ensure all direct call are +4 and only indirect
calls hit the ENDBR -- as it optimal anyway, saves on decoding ENDBR. We
can replace those ENDBR instructions of functions that should never be
indirectly called with:

	ud1    0x0(%rax),%eax

which is a 4 byte #UD. This gives us the property that even on !IBT
hardware such a call will go *splat*.

Further, Andrew put in the request for __attribute__((cfi_seed(blah)))
to allow distinguishing indirect functions with otherwise identical
signature; eg. cookie = hash32(blah##signature).


Did I miss anything? Got anything wrong?