Message-ID: <alpine.DEB.2.20.1711241451350.1757@nanos>
Date: Fri, 24 Nov 2017 14:52:49 +0100 (CET)
From: Thomas Gleixner <tglx@...utronix.de>
To: Ingo Molnar <mingo@...nel.org>
cc: linux-kernel@...r.kernel.org,
Dave Hansen <dave.hansen@...ux.intel.com>,
Andy Lutomirski <luto@...capital.net>,
"H . Peter Anvin" <hpa@...or.com>,
Peter Zijlstra <peterz@...radead.org>,
Borislav Petkov <bp@...en8.de>,
Linus Torvalds <torvalds@...ux-foundation.org>
Subject: Re: [PATCH 15/43] x86/entry/64: Create a percpu SYSCALL entry trampoline
On Fri, 24 Nov 2017, Ingo Molnar wrote:
> From: Andy Lutomirski <luto@...nel.org>
>
> Handling SYSCALL is tricky: the SYSCALL handler is entered with every
> single register (except FLAGS), including RSP, live. It somehow needs
> to set RSP to point to a valid stack, which means it needs to save the
> user RSP somewhere and find its own stack pointer. The canonical way
> to do this is with SWAPGS, which lets us access percpu data using the
> %gs prefix.
>
> With KAISER-like pagetable switching, this is problematic. Without a
> scratch register, switching CR3 is impossible, so %gs-based percpu
> memory would need to be mapped in the user pagetables. Doing that
> without information leaks is difficult or impossible.
>
> Instead, use a different sneaky trick. Map a copy of the first part
> of the SYSCALL asm at a different address for each CPU. Now RIP
> varies depending on the CPU, so we can use RIP-relative memory access
> to access percpu memory. By putting the relevant information (one
> scratch slot and the stack address) at a constant offset relative to
> RIP, we can make SYSCALL work without relying on %gs.
Smart!
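
For anyone following along, here is a minimal user-space C sketch of the idea described above. It is not the asm from the patch: the struct, the sizes and all function names are made up for illustration. Each "CPU" gets its own copy of the stub at a different address, and the scratch slot and stack address sit at the same constant offset from that address, so code that only knows its own location can find its percpu data without %gs.

```c
#include <stddef.h>
#include <stdint.h>

#define NR_CPUS        4	/* hypothetical CPU count for this sketch */
#define STUB_TEXT_SIZE 64	/* hypothetical size of the copied entry stub */

/* One instance per CPU, each living at a different address. */
struct syscall_trampoline {
	unsigned char text[STUB_TEXT_SIZE];	/* stands in for the copied asm */
	uint64_t scratch;			/* the one scratch slot */
	uint64_t stack_top;			/* this CPU's kernel stack address */
};

static struct syscall_trampoline trampolines[NR_CPUS];

/* The offsets are the same on every CPU; only the base ("RIP") differs. */
#define SCRATCH_OFFSET offsetof(struct syscall_trampoline, scratch)
#define STACK_OFFSET   offsetof(struct syscall_trampoline, stack_top)

static void trampoline_init(unsigned int cpu, uint64_t stack_top)
{
	trampolines[cpu].scratch = 0;
	trampolines[cpu].stack_top = stack_top;
}

/* The address SYSCALL would be pointed at on this CPU. */
static unsigned char *trampoline_entry(unsigned int cpu)
{
	return (unsigned char *)&trampolines[cpu];
}

/*
 * Model of the entry stub: it knows nothing but its own address, stashes
 * the user RSP in the scratch slot and returns the kernel stack to use.
 */
static uint64_t syscall_entry(unsigned char *rip, uint64_t user_rsp)
{
	uint64_t *scratch = (uint64_t *)(rip + SCRATCH_OFFSET);
	uint64_t *stack_top = (uint64_t *)(rip + STACK_OFFSET);

	*scratch = user_rsp;
	return *stack_top;
}
```

Calling syscall_entry() with trampoline_entry(0) and trampoline_entry(1) lands on different scratch slots and stacks even though the code path is identical, which is the property the real trampoline gets from RIP-relative addressing.
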
> A nice thing about this approach is that we can easily switch it on
> and off if we want pagetable switching to be configurable.
>
> The compat variant of SYSCALL doesn't have this problem in the first
> place -- there are plenty of scratch registers, since we don't care
> about preserving r8-r15. This patch therefore doesn't touch SYSCALL32
> at all.
>
> XXX: Whenever we settle how KAISER gets turned on and off, we should do
> the same to this.
>
> Signed-off-by: Andy Lutomirski <luto@...nel.org>
Reviewed-by: Thomas Gleixner <tglx@...utronix.de>