linux-kernel - Re: x86: dynamic pt_regs pointer and kernel stack randomization

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <4994087F-1CB6-40ED-BA6D-B0545AD754DD@zytor.com>
Date: Mon, 29 Apr 2024 12:10:03 -0700
From: "H. Peter Anvin" <hpa@...or.com>
To: Kees Cook <keescook@...omium.org>
CC: Xin Li <xin@...or.com>, Andrew Lutomirski <luto@...nel.org>,
        Thomas Gleixner <tglx@...utronix.de>, Ingo Molnar <mingo@...hat.com>,
        Borislav Petkov <bp@...en8.de>,
        Dave Hansen <dave.hansen@...ux.intel.com>,
        LKML <linux-kernel@...r.kernel.org>, "x86@...nel.org" <x86@...nel.org>
Subject: Re: x86: dynamic pt_regs pointer and kernel stack randomization

On April 29, 2024 9:09:02 AM PDT, Kees Cook <keescook@...omium.org> wrote:
>On Sun, Apr 28, 2024 at 03:28:44PM -0700, H. Peter Anvin wrote:
>> So the issue of having an explicit pointer to the user space pt_regs
>> structure has come up in the context of future FRED extensions, which may
>> increase the size of the exception stack frame under some circumstances,
>> which may not be constant in the first place.
>> 
>> It struck me that this can be combined with kernel stack randomization in
>> such a way that it mostly amortizes the cost of both.
>
>I've tried to go get up to speed with what FRED changes in x86, and IIUC
>it effectively redefines the interrupt handling? Does it also change
>syscall entry? (I got the impression it does not, but the mention of
>syscalls later in your email make me think I've gotten this wrong.)
>
>I think currently kstack randomization isn't applied to interrupts (but
>it should be), so I'm all for finding a way to make this happen.
>
>> Downside: for best performance, it should be pushed into assembly entry/exit
>> code, although that is technically not *required* (and it is of course
>> possible to do it in C on IDT but in the one single assembly entry stub for
>> FRED.)
>> 
>> In the FRED code it would look like [simplified]:
>> 
>> asm_fred_entrypoint_user:
>> 	/* Current code */
>> 	/* ... */
>> 	movq %rsp,%rdi		/* part of FRED_ENTER */
>> 
>> 	/* New code */
>> 	movq %rdi,PER_CPU_VAR(user_pt_regs_ptr)	/* [1] */
>> #ifdef CONFIG_RANDOMIZE_KSTACK_OFFSET
>> 	subq PER_CPU_VAR(kstack_offset),%rsp	/* [2] */
>> #endif
>> 	/* CFI annotation */
>> 
>> 	/* Current code */
>> 	call fred_entry_from_user
>> 
>> asm_fred_exit_uTser:
>> 	/* New code */
>> 	movq PER_CPU_VAR(user_pt_regs_ptr),%rsp
>> 	/* CFI annotation */
>> 
>> 	/* Current code */
>> 	/* ... */
>> 	ERETU
>
>If I'm understanding FRED correctly, I think this exit path
>would need to call choose_random_kstack_offset() as well. (For
>syscalls, add_random_kstack_offset() is used on entry and
>choose_random_kstack_offset() is used on exit, so I think we'd want the
>same pattern for interrupt handling.)
>
>> [1] - This instruction can be pushed into the C code in
>> fred_entry_from_user() without changing the functionality in any way.
>> 
>> [2] - This is the ONLY instruction in this sequence that would be specific
>> to CONFIG_RANDOMIZE_KSTACK_OFFSET, and it probably isn't even worth patching
>> out.
>> 
>> This requires a 64-bit premasked value to be generated byc
>> choose_random_kstack_offset() which would seem to be a better option for
>> performance, especially since there is already arithmetic done at that time.
>> Otherwise it requires three instructions. This means the randomness
>> accumulator ends up being in a separate variable from the premasked value.
>> This could be further very slightly optimized by adding the actual stack
>> location and making this a movq, but then that value has to be
>> context-switched; this is probably not all that useful.
>
>Yeah, having a pre-masked value ready to go seems fine to me. It's
>simply a space trade-off.
>
>> The masking needs to consider alignment, which the current code doesn't;
>> that by itself adds additional code to the current code sequences.
>> 
>> 
>> That is literally *all* the code that is needed to replace
>> add_random_kstack_offset(). It doesn't block tailcall optimization anywhere.
>> If user_pt_regs_ptr and kstack_offset share a cache line it becomes even
>> cheaper.
>> 
>> Note that at least in the FRED case this code would be invoked even on
>> events other than system calls, some of which may be controllable by the
>> user, like page faults. I am guessing that this is actually a plus.
>
>Yeah, I'd like greater coverage for ring 3->0 transitions. We do want to
>double-check the original design choices, though. I *think* they still
>stand (see the comments for choose_random_kstack_offset() about entropy
>location (per-cpu area), and lifetime (across userspace execution).
>
>-Kees
>

FRED replaces ALL exception handling and ring transitions, including system calls and call gates. 

You can't enter ring 0 without going through the FRED entry point (and you can't enter rings 1 or 2 at all, officially deprecating them from the architecture.)