lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <202105101458.EC466299@keescook>
Date:   Mon, 10 May 2021 15:01:48 -0700
From:   Kees Cook <keescook@...omium.org>
To:     Mark Rutland <mark.rutland@....com>
Cc:     linux-hardening@...r.kernel.org, Qing Zhao <qing.zhao@...cle.com>,
        Masahiro Yamada <masahiroy@...nel.org>,
        Michal Marek <michal.lkml@...kovi.net>,
        linux-kernel@...r.kernel.org, linux-kbuild@...r.kernel.org,
        linux-security-module@...r.kernel.org
Subject: Re: [PATCH] Makefile: Introduce CONFIG_ZERO_CALL_USED_REGS

On Mon, May 10, 2021 at 02:45:03PM +0100, Mark Rutland wrote:
> About 31% of this seems to be due to GCC (almost) always clearing x16
> and x17 (see further down for numbers). I suspect that's because GCC has
> to assume that any (non-static) functions might be reached via a PLT
> which would clobber x16 and x17 with specific values.

Wheee.

> We also have a bunch of small functions with multiple returns, where
> each return path gets the full complement of zeroing instructions, e.g.
> 
> Stock:
> 
> | <fpsimd_sync_to_sve>:
> |        d503245f        bti     c
> |        f9400001        ldr     x1, [x0]
> |        7209003f        tst     w1, #0x800000
> |        54000040        b.eq    ffff800010014cc4 <fpsimd_sync_to_sve+0x14>  // b.none
> |        d65f03c0        ret
> |        d503233f        paciasp
> |        a9bf7bfd        stp     x29, x30, [sp, #-16]!
> |        910003fd        mov     x29, sp
> |        97fffdac        bl      ffff800010014380 <fpsimd_to_sve>
> |        a8c17bfd        ldp     x29, x30, [sp], #16
> |        d50323bf        autiasp
> |        d65f03c0        ret
> 
> With zero-call-regs:
> 
> | <fpsimd_sync_to_sve>:
> |        d503245f        bti     c
> |        f9400001        ldr     x1, [x0]
> |        7209003f        tst     w1, #0x800000
> |        540000c0        b.eq    ffff8000100152a8 <fpsimd_sync_to_sve+0x24>  // b.none
> |        d2800000        mov     x0, #0x0                        // #0
> |        d2800001        mov     x1, #0x0                        // #0
> |        d2800010        mov     x16, #0x0                       // #0
> |        d2800011        mov     x17, #0x0                       // #0
> |        d65f03c0        ret
> |        d503233f        paciasp
> |        a9bf7bfd        stp     x29, x30, [sp, #-16]!
> |        910003fd        mov     x29, sp
> |        97fffd17        bl      ffff800010014710 <fpsimd_to_sve>
> |        a8c17bfd        ldp     x29, x30, [sp], #16
> |        d50323bf        autiasp
> |        d2800000        mov     x0, #0x0                        // #0
> |        d2800001        mov     x1, #0x0                        // #0
> |        d2800010        mov     x16, #0x0                       // #0
> |        d2800011        mov     x17, #0x0                       // #0
> |        d65f03c0        ret
> 
> ... where we go from 12 instructions to 20, which is a ~67% bloat.

Yikes. Yeah, so that is likely a good example of missed optimization
opportunity.

> We have a bunch of cases like the above. Also note that per the AAPCS a
> function can clobber x0-17 (and x18 if it's not reserved for something
> like SCS), and I see a few places that clobber x1-x17.

Ah, gotcha. I wasn't quite sure which registers might qualify.

> [...]
> That's 441301 new MOVs, and the equivalent of 442511 new instructions
> overall. There are 135728 new MOVs to x16 and x17 specifically, which
> account for ~31% of that.

I assume the x16/x17 case could be addressed by the compiler if it
examined the need for PLTs, or is that too late (in the sense that the
linker is doing that phase)?

Regardless, I will update the documentation on this feature. :)

-- 
Kees Cook

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ