Message-ID: <202404290859.99763AA1@keescook>
Date: Mon, 29 Apr 2024 09:09:02 -0700
From: Kees Cook <keescook@...omium.org>
To: "H. Peter Anvin" <hpa@...or.com>
Cc: Xin Li <xin@...or.com>, Andrew Lutomirski <luto@...nel.org>,
Thomas Gleixner <tglx@...utronix.de>,
Ingo Molnar <mingo@...hat.com>, Borislav Petkov <bp@...en8.de>,
Dave Hansen <dave.hansen@...ux.intel.com>,
LKML <linux-kernel@...r.kernel.org>,
"x86@...nel.org" <x86@...nel.org>
Subject: Re: x86: dynamic pt_regs pointer and kernel stack randomization
On Sun, Apr 28, 2024 at 03:28:44PM -0700, H. Peter Anvin wrote:
> So the issue of having an explicit pointer to the user space pt_regs
> structure has come up in the context of future FRED extensions, which may
> increase the size of the exception stack frame under some circumstances,
> which may not be constant in the first place.
>
> It struck me that this can be combined with kernel stack randomization in
> such a way that it mostly amortizes the cost of both.
I've tried to get up to speed with what FRED changes in x86, and IIUC
it effectively redefines interrupt handling? Does it also change
syscall entry? (I got the impression it does not, but the mention of
syscalls later in your email makes me think I've gotten this wrong.)
I think currently kstack randomization isn't applied to interrupts (but
it should be), so I'm all for finding a way to make this happen.
> Downside: for best performance, it should be pushed into assembly entry/exit
> code, although that is technically not *required* (and it is of course
> possible to do it in C on IDT but in the one single assembly entry stub for
> FRED.)
>
> In the FRED code it would look like [simplified]:
>
> asm_fred_entrypoint_user:
> /* Current code */
> /* ... */
> movq %rsp,%rdi /* part of FRED_ENTER */
>
> /* New code */
> movq %rdi,PER_CPU_VAR(user_pt_regs_ptr) /* [1] */
> #ifdef CONFIG_RANDOMIZE_KSTACK_OFFSET
> subq PER_CPU_VAR(kstack_offset),%rsp /* [2] */
> #endif
> /* CFI annotation */
>
> /* Current code */
> call fred_entry_from_user
>
> asm_fred_exit_user:
> /* New code */
> movq PER_CPU_VAR(user_pt_regs_ptr),%rsp
> /* CFI annotation */
>
> /* Current code */
> /* ... */
> ERETU
If I'm understanding FRED correctly, I think this exit path
would need to call choose_random_kstack_offset() as well. (For
syscalls, add_random_kstack_offset() is used on entry and
choose_random_kstack_offset() is used on exit, so I think we'd want the
same pattern for interrupt handling.)
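For reference, the existing syscall-path pairing can be modeled in
userspace C roughly like this. The names mirror the real helpers in
include/linux/randomize_kstack.h, but this is a simplified sketch, not
the kernel code: the real versions use per-cpu storage, a static
branch, and alloca() in the entry path.

```c
#include <stdint.h>

/* Models the per-cpu entropy accumulator (kstack_offset). */
static uint32_t kstack_offset_model;

/* The kernel masks the offset to 10 bits by default on x86. */
#define KSTACK_OFFSET_MAX_MODEL(x) ((x) & 0x3FF)

/* Entry side (add_random_kstack_offset): consume the offset chosen
 * on the *previous* exit; the kernel alloca()s this many bytes. */
static uint32_t entry_offset_model(void)
{
	return KSTACK_OFFSET_MAX_MODEL(kstack_offset_model);
}

/* Exit side (choose_random_kstack_offset): rotate and xor fresh
 * entropy into the accumulator so the next entry sees a new offset. */
static void exit_choose_model(uint32_t rand)
{
	kstack_offset_model = ((kstack_offset_model >> 5) |
			       (kstack_offset_model << 27)) ^ rand;
}
```

The point of the entry/exit split is lifetime: the offset consumed on
entry N was fixed back at exit N-1, so userspace running in between
can't probe the offset its next entry will use. An interrupt path
would want that same split.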
> [1] - This instruction can be pushed into the C code in
> fred_entry_from_user() without changing the functionality in any way.
>
> [2] - This is the ONLY instruction in this sequence that would be specific
> to CONFIG_RANDOMIZE_KSTACK_OFFSET, and it probably isn't even worth patching
> out.
>
> This requires a 64-bit premasked value to be generated by
> choose_random_kstack_offset() which would seem to be a better option for
> performance, especially since there is already arithmetic done at that time.
> Otherwise it requires three instructions. This means the randomness
> accumulator ends up being in a separate variable from the premasked value.
> This could be further very slightly optimized by adding the actual stack
> location and making this a movq, but then that value has to be
> context-switched; this is probably not all that useful.
Yeah, having a pre-masked value ready to go seems fine to me. It's
simply a space trade-off.
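A sketch of what the pre-masked variant could look like, in userspace
C with hypothetical names and constants: the idea, as you describe it,
is that all the arithmetic (mix, mask, align) happens on the exit
path, leaving the entry stub a single subq from %rsp.

```c
#include <stdint.h>

/* Hypothetical split into two per-cpu variables: the raw entropy
 * accumulator, plus a pre-masked 64-bit value the asm entry stub can
 * subtract from %rsp directly. */
static uint32_t kstack_rand_accum;  /* entropy pool */
static uint64_t kstack_sub_value;   /* ready for "subq ...,%rsp" */

#define OFFSET_MASK  0x3FFULL       /* 10 bits of offset, as today */
#define STACK_ALIGN  16ULL          /* x86-64 ABI stack alignment */

/* Exit-path work: mix in new entropy, then precompute the masked,
 * aligned subtrahend for the next entry. */
static void choose_premasked_offset(uint32_t rand)
{
	kstack_rand_accum = ((kstack_rand_accum >> 5) |
			     (kstack_rand_accum << 27)) ^ rand;
	kstack_sub_value = (uint64_t)kstack_rand_accum
			   & OFFSET_MASK & ~(STACK_ALIGN - 1);
}
```

The & ~(STACK_ALIGN - 1) step keeps %rsp 16-byte aligned after the
subtraction, which would address the alignment concern you raise.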
> The masking needs to consider alignment, which the current code doesn't;
> that by itself adds additional code to the current code sequences.
>
>
> That is literally *all* the code that is needed to replace
> add_random_kstack_offset(). It doesn't block tailcall optimization anywhere.
> If user_pt_regs_ptr and kstack_offset share a cache line it becomes even
> cheaper.
>
> Note that at least in the FRED case this code would be invoked even on
> events other than system calls, some of which may be controllable by the
> user, like page faults. I am guessing that this is actually a plus.
Yeah, I'd like greater coverage for ring 3->0 transitions. We do want to
double-check the original design choices, though. I *think* they still
stand (see the comments for choose_random_kstack_offset() about entropy
location (per-cpu area) and lifetime (across userspace execution)).
-Kees
--
Kees Cook