lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <202105061416.3CB40BE5@keescook>
Date:   Thu, 6 May 2021 14:24:18 -0700
From:   Kees Cook <keescook@...omium.org>
To:     Mark Rutland <mark.rutland@....com>
Cc:     linux-hardening@...r.kernel.org, Qing Zhao <qing.zhao@...cle.com>,
        Masahiro Yamada <masahiroy@...nel.org>,
        Michal Marek <michal.lkml@...kovi.net>,
        linux-kernel@...r.kernel.org, linux-kbuild@...r.kernel.org,
        linux-security-module@...r.kernel.org
Subject: Re: [PATCH] Makefile: Introduce CONFIG_ZERO_CALL_USED_REGS

On Thu, May 06, 2021 at 01:54:57PM +0100, Mark Rutland wrote:
> Hi Kees,
> 
> On Wed, May 05, 2021 at 12:18:04PM -0700, Kees Cook wrote:
> > When CONFIG_ZERO_CALL_USED_REGS is enabled, build the kernel with
> > "-fzero-call-used-regs=used-gpr" (in GCC 11). This option will zero any
> > caller-used register contents just before returning from a function,
> > ensuring that temporary values are not leaked beyond the function
> > boundary. This means that register contents are less likely to be
> > available for side channel attacks and information exposures.
> > 
> > Additionally this helps reduce the number of useful ROP gadgets in the
> > kernel image by about 20%:
> > 
> > $ ROPgadget.py --nosys --nojop --binary vmlinux.stock | tail -n1
> > Unique gadgets found: 337245
> > 
> > $ ROPgadget.py --nosys --nojop --binary vmlinux.zero-call-regs | tail -n1
> > Unique gadgets found: 267175
> > 
> > and more notably removes simple "write-what-where" gadgets:
> > 
> > $ ROPgadget.py --ropchain --binary vmlinux.stock | sed -n '/Step 1/,/Step 2/p'
> > - Step 1 -- Write-what-where gadgets
> > 
> >         [+] Gadget found: 0xffffffff8102d76c mov qword ptr [rsi], rdx ; ret
> >         [+] Gadget found: 0xffffffff81000cf5 pop rsi ; ret
> >         [+] Gadget found: 0xffffffff8104d7c8 pop rdx ; ret
> >         [-] Can't find the 'xor rdx, rdx' gadget. Try with another 'mov [reg], reg'
> > 
> >         [+] Gadget found: 0xffffffff814c2b4c mov qword ptr [rsi], rdi ; ret
> >         [+] Gadget found: 0xffffffff81000cf5 pop rsi ; ret
> >         [+] Gadget found: 0xffffffff81001e51 pop rdi ; ret
> >         [-] Can't find the 'xor rdi, rdi' gadget. Try with another 'mov [reg], reg'
> > 
> >         [+] Gadget found: 0xffffffff81540d61 mov qword ptr [rsi], rdi ; pop rbx ; pop rbp ; ret
> >         [+] Gadget found: 0xffffffff81000cf5 pop rsi ; ret
> >         [+] Gadget found: 0xffffffff81001e51 pop rdi ; ret
> >         [-] Can't find the 'xor rdi, rdi' gadget. Try with another 'mov [reg], reg'
> > 
> >         [+] Gadget found: 0xffffffff8105341e mov qword ptr [rsi], rax ; ret
> >         [+] Gadget found: 0xffffffff81000cf5 pop rsi ; ret
> >         [+] Gadget found: 0xffffffff81029a11 pop rax ; ret
> >         [+] Gadget found: 0xffffffff811f1c3b xor rax, rax ; ret
> > 
> > - Step 2 -- Init syscall number gadgets
> > 
> > $ ROPgadget.py --ropchain --binary vmlinux.zero* | sed -n '/Step 1/,/Step 2/p'
> > - Step 1 -- Write-what-where gadgets
> > 
> >         [-] Can't find the 'mov qword ptr [r64], r64' gadget
> > 
> > In parallel build tests, this has a less than 1% performance impact,
> > and grows the image size less than 1%:
> > 
> > $ size vmlinux.stock vmlinux.zero-call-regs
> >    text    data     bss     dec     hex filename
> > 22437676   8559152 14127340 45124168 2b08a48 vmlinux.stock
> > 22453184   8563248 14110956 45127388 2b096dc vmlinux.zero-call-regs
> 
> FWIW, I gave this a go on arm64, and the size increase is a fair bit
> larger:
> 
> | [mark@...rids:~/src/linux]% ls -l Image* 
> | -rw-r--r-- 1 mark mark 31955456 May  6 13:36 Image.stock
> | -rw-r--r-- 1 mark mark 33724928 May  6 13:23 Image.zero-call-regs
> 
> | [mark@...rids:~/src/linux]% size vmlinux.stock vmlinux.zero-call-regs 
> |    text    data     bss     dec     hex filename
> | 20728552        11086474         505540 32320566        1ed2c36 vmlinux.stock
> | 22500688        11084298         505540 34090526        2082e1e vmlinux.zero-call-regs
> 
> The Image is ~5.5% bigger, and the .text in the vmlinux is ~8.5% bigger

Woo, that's quite a bit larger! So much so that I struggle to imagine
the delta. That's almost 1 extra instruction for every 10. I don't
imagine functions are that short. There seem to be only r9..r15 as
call-used. Even if every one was cleared at every function exit (28
bytes), that implies 63,290 functions, with an average function size of
40 instructions?

> The resulting Image appears to work, but I haven't done anything beyond
> booting, and I wasn't able to get ROPgadget.py going to quantify the
> number of gadgets.

Does it not like arm64 machine code? I can go check and see if I can get
numbers...

Thanks for looking at this!

-Kees

> 
> > Signed-off-by: Kees Cook <keescook@...omium.org>
> > ---
> >  Makefile                   |  5 +++++
> >  security/Kconfig.hardening | 17 +++++++++++++++++
> >  2 files changed, 22 insertions(+)
> > 
> > diff --git a/Makefile b/Makefile
> > index 31dcdb3d61fa..810600618490 100644
> > --- a/Makefile
> > +++ b/Makefile
> > @@ -811,6 +811,11 @@ KBUILD_CFLAGS	+= -ftrivial-auto-var-init=zero
> >  KBUILD_CFLAGS	+= -enable-trivial-auto-var-init-zero-knowing-it-will-be-removed-from-clang
> >  endif
> >  
> > +# Clear used registers at func exit (to reduce data lifetime and ROP gadgets).
> > +ifdef CONFIG_ZERO_CALL_USED_REGS
> > +KBUILD_CFLAGS	+= -fzero-call-used-regs=used-gpr
> > +endif
> > +
> >  DEBUG_CFLAGS	:=
> >  
> >  # Workaround for GCC versions < 5.0
> > diff --git a/security/Kconfig.hardening b/security/Kconfig.hardening
> > index 269967c4fc1b..85f7f2036725 100644
> > --- a/security/Kconfig.hardening
> > +++ b/security/Kconfig.hardening
> > @@ -217,6 +217,23 @@ config INIT_ON_FREE_DEFAULT_ON
> >  	  touching "cold" memory areas. Most cases see 3-5% impact. Some
> >  	  synthetic workloads have measured as high as 8%.
> >  
> > +config CC_HAS_ZERO_CALL_USED_REGS
> > +	def_bool $(cc-option,-fzero-call-used-regs=used-gpr)
> > +
> > +config ZERO_CALL_USED_REGS
> > +	bool "Enable register zeroing on function exit"
> > +	depends on CC_HAS_ZERO_CALL_USED_REGS
> > +	help
> > +	  At the end of functions, always zero any caller-used register
> > +	  contents. This helps ensure that temporary values are not
> > +	  leaked beyond the function boundary. This means that register
> > +	  contents are less likely to be available for side channels
> > +	  and information exposures. Additionally, this helps reduce the
> > +	  number of useful ROP gadgets by about 20% (and removes compiler
> > +	  generated "write-what-where" gadgets) in the resulting kernel
> > +	  image. This has a less than 1% performance impact on most
> > +	  workloads, and grows the image size less than 1%.
> 
> I think the numbers need an "on x86" caveat, since they're not
> necessarily representative of other architectures.
> 
> This shows up under the "Memory initialization" sub-menu, but I assume
> it was meant to be directly under the "Kernel hardening options" menu...
> 
> > +
> >  endmenu
> 
> ... and should presumably be here?
> 
> Thanks,
> Mark.
> 
> >  
> >  endmenu
> > -- 
> > 2.25.1
> > 

-- 
Kees Cook

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ