linux-hardening - Re: [RFC PATCH v2 20/21] x86: Add support for CONFIG_CFI

Message-ID: <202205161531.3339CA95@keescook>
Date:   Mon, 16 May 2022 15:59:41 -0700
From:   Kees Cook <keescook@...omium.org>
To:     Peter Zijlstra <peterz@...radead.org>
Cc:     Sami Tolvanen <samitolvanen@...gle.com>,
        linux-kernel@...r.kernel.org, Josh Poimboeuf <jpoimboe@...hat.com>,
        x86@...nel.org, Catalin Marinas <catalin.marinas@....com>,
        Will Deacon <will@...nel.org>,
        Mark Rutland <mark.rutland@....com>,
        Nathan Chancellor <nathan@...nel.org>,
        Nick Desaulniers <ndesaulniers@...gle.com>,
        Joao Moreira <joao@...rdrivepizza.com>,
        Sedat Dilek <sedat.dilek@...il.com>,
        Steven Rostedt <rostedt@...dmis.org>,
        linux-hardening@...r.kernel.org,
        linux-arm-kernel@...ts.infradead.org, llvm@...ts.linux.dev
Subject: Re: [RFC PATCH v2 20/21] x86: Add support for CONFIG_CFI_CLANG

On Mon, May 16, 2022 at 08:30:47PM +0200, Peter Zijlstra wrote:
> On Mon, May 16, 2022 at 10:15:00AM -0700, Sami Tolvanen wrote:
> > On Mon, May 16, 2022 at 2:54 AM Peter Zijlstra <peterz@...radead.org> wrote:
> > >
> > > On Fri, May 13, 2022 at 01:21:58PM -0700, Sami Tolvanen wrote:
> > > > With CONFIG_CFI_CLANG, the compiler injects a type preamble
> > > > immediately before each function and a check to validate the target
> > > > function type before indirect calls:
> > > >
> > > >   ; type preamble
> > > >   __cfi_function:
> > > >     int3
> > > >     int3
> > > >     mov <id>, %eax
> > > >     int3
> > > >     int3
> > > >   function:
> > > >     ...
> > >
> > > When I enable CFI_CLANG and X86_KERNEL_IBT I get:
> > >
> > > 0000000000000c80 <__cfi_io_schedule_timeout>:
> > > c80:   cc                      int3
> > > c81:   cc                      int3
> > > c82:   b8 b5 b1 39 b3          mov    $0xb339b1b5,%eax
> > > c87:   cc                      int3
> > > c88:   cc                      int3
> > >
> > > 0000000000000c89 <io_schedule_timeout>:
> > > c89:   f3 0f 1e fa             endbr64
> > >
> > >
> > > That seems unfortunate. Would it be possible to get an additional
> > > compiler option to suppress the endbr for all symbols that get a __cfi_
> > > preamble?
> > 
> > What's the concern with the endbr? Dropping it would currently break
> > the CFI+IBT combination on newer hardware, no?
> 
> Well, yes, but also that combination isn't very interesting. See,
> 
>   https://lore.kernel.org/all/20220420004241.2093-1-joao@overdrivepizza.com/T/#m5d67fb010d488b2f8eee33f1eb39d12f769e4ad2
> 
> and the patch I did down-thread:
> 
>   https://lkml.kernel.org/r/YoJKhHluN4n0kZDm@hirez.programming.kicks-ass.net
> 
> If we have IBT, then FineIBT is a much better option than kCFI+IBT.

I'm still not convinced about this, but I'm on the fence.

Cons:
- FineIBT does callee-based hash verification, which means any
  attacker-constructed memory region just has to have an endbr and nops at
  "shellcode - 9". KCFI would need the region to have the hash at
  "shellcode - 6" and an endbr at "shellcode". However, that hash is well
  known, so it's not much protection.
- Potential performance hit due to making an additional "call" outside
  the cache lines of both caller and callee.

Pros:
- FineIBT can be done without read access to the kernel text, which will
  be nice in the exec-only future.
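
To make the first con concrete, here is a toy Python model of what a forged memory region needs under each scheme. It is only a sketch: the function names are made up for illustration, memory is a flat bytes object, and "reaches" just means control arrives at the target without tripping the check described above. The 9 and 6 offsets come from the preamble layout (int3, int3, 5-byte mov, int3, int3) and the caller's -6(%r11) compare.

```python
ENDBR64 = bytes.fromhex("f30f1efa")  # endbr64 encoding, as in the quoted objdump
NOP = b"\x90"
PREAMBLE_LEN = 9  # int3 int3 / mov $hash,%eax / int3 int3
HASH_OFFSET = 6   # the kCFI caller compares against -6(%r11)


def kcfi_ibt_reaches(mem: bytes, target: int, expected_hash: int) -> bool:
    """kCFI+IBT: the caller checks the hash at target-6, then IBT wants endbr at target."""
    if target < HASH_OFFSET:
        return False
    start = target - HASH_OFFSET
    stored = int.from_bytes(mem[start:start + 4], "little")
    return stored == expected_hash and mem[target:target + 4] == ENDBR64


def fineibt_reaches_unchecked(mem: bytes, target: int) -> bool:
    """FineIBT: the caller enters at target-9. An endbr followed by nops there
    falls through to target without ever calling a hash-checking thunk."""
    entry = target - PREAMBLE_LEN
    if entry < 0:
        return False
    return mem[entry:entry + 4] == ENDBR64 and mem[entry + 4:target] == NOP * 5


def forge_for_fineibt(shellcode: bytes) -> bytes:
    # No hash knowledge needed: endbr plus padding in front of the payload.
    return ENDBR64 + NOP * 5 + shellcode


def forge_for_kcfi_ibt(shellcode: bytes, known_hash: int) -> bytes:
    # Needs the (well known) hash in the preamble and an endbr at the target.
    preamble = b"\xcc\xcc\xb8" + known_hash.to_bytes(4, "little") + b"\xcc\xcc"
    return preamble + ENDBR64 + shellcode
```

In both forged layouts the call target is offset 9. The FineIBT forgery is hash-independent, while the kCFI+IBT one has to embed the right hash, which is the (small) difference the con describes.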

I'd kind of like the "dynamic FineIBT conversion" to be a config option,
at least at first. We could at least do performance comparisons between
them.

> Removing that superfluous endbr also shrinks the whole thing by 4 bytes.
> 
> So I'm fine with the compiler generating working code for that
> combination; but please get me an option to suppress it in order to save
> those pointless bytes. All this CFI stuff is enough bloat as it is.

So, in the case of "built for IBT but running on a system without IBT",
no rewrite happens, and no endbr is present (i.e. address-taken
functions have endbr emission suppressed)?

Stock kernel build:
	function:
		[normal code]
	caller:
		call    __x86_indirect_thunk_r11

IBT kernel build:
	function:
		endbr
		[normal code]
	caller:
		call    __x86_indirect_thunk_r11

CFI kernel build:

	__cfi_function:
		[int3/mov/int3 preamble]
	function:
		[normal code]
	caller:
		cmpl    \hash, -6(%r11)
		je      .Ltmp1
		ud2
	.Ltmp1:
		call    __x86_indirect_thunk_r11

CFI+IBT kernel build:

	__cfi_function:
		[int3/mov/int3 preamble]
	function:
		endbr
		[normal code]
	caller:
		cmpl    \hash, -6(%r11)
		je      .Ltmp1
		ud2
	.Ltmp1:
		call    __x86_indirect_thunk_r11

CFI+IBT+FineIBT kernel build:

	__cfi_function:
		[int3/mov/int3 preamble]
	function:
		/* no endbr emitted */
		[normal code]
	caller:
		cmpl    \hash, -6(%r11)
		je      .Ltmp1
		ud2
	.Ltmp1:
		call    __x86_indirect_thunk_r11

	at boot, if IBT is detected:
	- replace __cfi_function with:
		endbr
		call __fineibt_\hash
	- replace caller with:
		movl    \hash, %r10d
		sub     $9, %r11
		nop2
		call    *%r11
	- inject all the __fineibt_\hash elements via module_alloc()
		__fineibt_\hash:
			xor     \hash, %r10d
			jz      1f
			ud2
		1:	ret
			int3
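
As a sanity check on the byte budget for the boot-time rewrite, here is a small Python sketch that builds the 9-byte kCFI preamble and the 9-byte endbr + call replacement. The helper names and the addresses in the usage note are hypothetical; the only assumptions are the standard encodings (b8 for mov imm32 to %eax, e8 for call rel32) and that the thunk sits within rel32 range of the preamble. The thunk comparison is modeled on the low 32 bits only: with a 64-bit destination register, x86-64 sign-extends a 32-bit immediate, so a hash with bit 31 set (such as 0xb339b1b5) would not cancel against the zero-extended value left by movl.

```python
import struct

ENDBR64 = bytes.fromhex("f30f1efa")


def kcfi_preamble(hash32: int) -> bytes:
    """int3; int3; mov $hash,%eax; int3; int3 (9 bytes)."""
    return b"\xcc\xcc" + b"\xb8" + struct.pack("<I", hash32) + b"\xcc\xcc"


def preamble_hash(preamble: bytes) -> int:
    """Recover the hash from a kCFI preamble, rejecting anything else."""
    if len(preamble) != 9 or preamble[:3] != b"\xcc\xcc\xb8" or preamble[7:] != b"\xcc\xcc":
        raise ValueError("not a kCFI preamble")
    return struct.unpack_from("<I", preamble, 3)[0]


def fineibt_preamble(preamble_addr: int, thunk_addr: int) -> bytes:
    """endbr64; call thunk (4 + 5 = 9 bytes), so it fits the same slot."""
    call_end = preamble_addr + len(ENDBR64) + 5  # rel32 is relative to the next insn
    rel32 = thunk_addr - call_end
    return ENDBR64 + b"\xe8" + struct.pack("<i", rel32)  # raises if out of range


def thunk_passes(caller_hash: int, thunk_hash: int) -> bool:
    """32-bit xor-and-test, as the thunk would do on the caller-supplied hash."""
    return ((caller_hash ^ thunk_hash) & 0xFFFFFFFF) == 0
```

For example, kcfi_preamble(0xb339b1b5) reproduces the bytes in the objdump quoted earlier, and fineibt_preamble(0x1000, 0x2000) has the same length, so the rewrite can be done in place.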



-- 
Kees Cook