Message-ID: <20180904070455.GX24124@hirez.programming.kicks-ass.net>
Date: Tue, 4 Sep 2018 09:04:55 +0200
From: Peter Zijlstra <peterz@...radead.org>
To: Andy Lutomirski <luto@...nel.org>
Cc: x86@...nel.org, Borislav Petkov <bp@...en8.de>,
LKML <linux-kernel@...r.kernel.org>,
Dave Hansen <dave.hansen@...ux.intel.com>,
Adrian Hunter <adrian.hunter@...el.com>,
Alexander Shishkin <alexander.shishkin@...ux.intel.com>,
Arnaldo Carvalho de Melo <acme@...nel.org>,
Linus Torvalds <torvalds@...ux-foundation.org>,
Josh Poimboeuf <jpoimboe@...hat.com>,
Joerg Roedel <joro@...tes.org>, Jiri Olsa <jolsa@...hat.com>,
Andi Kleen <ak@...ux.intel.com>
Subject: Re: [PATCH v2 3/3] x86/pti/64: Remove the SYSCALL64 entry trampoline

On Mon, Sep 03, 2018 at 03:59:44PM -0700, Andy Lutomirski wrote:
> The SYSCALL64 trampoline has a couple of nice properties:
>
> - The usual sequence of SWAPGS followed by two GS-relative accesses to
> set up RSP is somewhat slow because the GS-relative accesses need
> to wait for SWAPGS to finish. The trampoline approach allows
> RIP-relative accesses to set up RSP, which avoids the stall.
>
> - The trampoline avoids any percpu access before CR3 is set up,
> which means that no percpu memory needs to be mapped in the user
> page tables. This prevents using Meltdown to read any percpu memory
> outside the cpu_entry_area and prevents using timing leaks
> to directly locate the percpu areas.
>
> The downsides of using a trampoline may outweigh the upsides, however.
> It adds an extra non-contiguous I$ cache line to system calls, and it
> forces an indirect jump to transfer control back to the normal kernel
> text after CR3 is set up. The latter is because x86 lacks a 64-bit
> direct jump instruction that could jump from the trampoline to the entry
> text. With retpolines enabled, the indirect jump is extremely slow.
>
> This patch changes the code to map the percpu TSS into the user page
> tables to allow the non-trampoline SYSCALL64 path to work under PTI.
> This does not add a new direct information leak, since the TSS is
> readable by Meltdown from the cpu_entry_area alias regardless. It
> does allow a timing attack to locate the percpu area, but KASLR is
> more or less a lost cause against local attack on CPUs vulnerable to
> Meltdown regardless. As far as I'm concerned, on current hardware,
> KASLR is only useful to mitigate remote attacks that try to attack
> the kernel without first gaining RCE against a vulnerable user
> process.
>
> On Skylake, with CONFIG_RETPOLINE=y and KPTI on, this reduces
> syscall overhead from ~237ns to ~228ns.
>
> There is a possible alternative approach: we could instead move the
> trampoline within 2G of the entry text and make a separate copy for
> each CPU. Then we could use a direct jump to rejoin the normal
> entry path.

Can we have a few words on why this solution and not the alternative? I
mean, you raise the possibility, but you evidently chose not to
implement it. Might as well share the reasoning with us.