Message-ID: <CAPM31RKp=tAz8TQ=tCGQRNHUKWvrC9B4LV3wG+hBUr+rG_FMsQ@mail.gmail.com>
Date: Thu, 4 Jan 2018 01:24:41 -0800
From: Paul Turner <pjt@...gle.com>
To: LKML <linux-kernel@...r.kernel.org>,
Linus Torvalds <torvalds@...ux-foundation.org>,
Greg Kroah-Hartman <gregkh@...ux-foundation.org>,
"Woodhouse, David" <dwmw@...zon.co.uk>,
Tim Chen <tim.c.chen@...ux.intel.com>,
Dave Hansen <dave.hansen@...el.com>, tglx@...uxtronix.de,
Kees Cook <keescook@...gle.com>,
Rik van Riel <riel@...hat.com>,
Peter Zijlstra <peterz@...radead.org>,
Andy Lutomirski <luto@...capital.net>,
Jiri Kosina <jikos@...nel.org>, gnomes@...rguk.ukuu.org.uk
Subject: Re: [RFC] Retpoline: Binary mitigation for branch-target-injection
(aka "Spectre")
On Thu, Jan 4, 2018 at 1:10 AM, Paul Turner <pjt@...gle.com> wrote:
> Apologies for the discombobulation around today's disclosure. Obviously the
> original goal was to communicate this a little more coherently, but the
> unscheduled advances in the disclosure disrupted the efforts to pull this
> together more cleanly.
>
> I wanted to open discussion of the "retpoline" approach and define its
> requirements so that we can separate the core
> details from questions regarding any particular implementation thereof.
>
> As a starting point, a full write-up describing the approach is available at:
> https://support.google.com/faqs/answer/7625886
>
> The 30 second version is:
> Returns are a special type of indirect branch. As function returns are intended
> to pair with function calls, processors often implement dedicated return stack
> predictors. This choice of predictor allows us to generate an indirect branch
> in which speculative execution is intentionally redirected into a controlled
> location by a return stack target that we control. This prevents branch
> target injection (also known as "Spectre") against these binaries.
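To make the 30-second version concrete, here is a minimal user-space sketch for x86-64 Linux (GCC or Clang) of a retpoline standing in for `jmp *%r11`. It follows the general call / overwrite-return-address / ret structure described in the write-up, but the names `demo_retpoline_thunk_r11` and `call_via_retpoline` are hypothetical, using %r11 as the scratch register is an assumption (it is not a SysV argument register), and the `lfence` in the capture loop is an assumption as well (some variants use `pause` alone). On other targets, or when the build is marked shadow-stack compatible (`__CET__` bit 1), it falls back to a plain indirect call.

```c
#include <assert.h>

/* An ordinary C function used as the indirect-branch target. */
static long add_one(long x) { return x + 1; }

#if defined(__x86_64__) && defined(__linux__) && \
    !(defined(__CET__) && (__CET__ & 2))
/* Hypothetical helper implemented in assembly below: calls fn(arg), but
 * reaches fn through a retpoline instead of an indirect call. */
long call_via_retpoline(long (*fn)(long), long arg);

__asm__(
    ".pushsection .text\n"
    ".p2align 4\n"
    "demo_retpoline_thunk_r11:\n"     /* behaves like 'jmp *%r11' */
    "    call 2f\n"                   /* push &1f; the RSB now predicts a return to 1f */
    "1:  pause\n"                     /* speculative execution is captured here... */
    "    lfence\n"
    "    jmp 1b\n"                    /* ...and spins until it is squashed */
    "2:  mov %r11, (%rsp)\n"          /* overwrite the pushed return address with the target */
    "    ret\n"                       /* architecturally 'returns' to the target in r11 */
    ".globl call_via_retpoline\n"
    ".type call_via_retpoline, @function\n"
    "call_via_retpoline:\n"
    "    mov %rdi, %r11\n"            /* fn: the real branch target */
    "    mov %rsi, %rdi\n"            /* arg becomes fn's first argument */
    "    jmp demo_retpoline_thunk_r11\n" /* tail call: fn returns straight to our caller */
    ".size call_via_retpoline, .-call_via_retpoline\n"
    ".popsection\n");
#else
/* Fallback for other targets, or when the object is marked shadow-stack
 * compatible: a retpoline returns to an address its call never pushed,
 * which a hardware shadow stack would reject. Plain indirect call. */
static long call_via_retpoline(long (*fn)(long), long arg) { return fn(arg); }
#endif
```

`call_via_retpoline(add_one, 41)` returns 42 on either path; only the x86-64 path actually goes through the thunk. Because the helper tail-jumps into the thunk, the target sees the caller's original return address and correctly aligned stack, so it returns directly to the C caller.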
>
> On the targets (Intel Xeon) we have measured so far, the cost is within a few
> cycles of a "native" indirect branch for which branch prediction hardware has
> been disabled. This is unfortunately measurable: from 3 cycles on average to
> about 30. However, the cost is largely mitigated for many workloads since the
> kernel uses comparatively few indirect branches (versus, say, a C++ binary).
> With some effort we have the average overall overhead within the 0-1.5% range
> for our internal workloads, including some particularly demanding packet
> processing engines.
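Why few indirect branches keep the overhead small can be seen from a back-of-envelope model. The sketch below assumes every retpolined indirect branch costs a fixed 27 extra cycles (the 30 versus 3 averages above) and that nothing else in the workload changes; it is an illustration only, not how the 0-1.5% figures were measured, and the function names are hypothetical.

```c
#include <assert.h>

/* Average costs quoted above; treating them as fixed is an assumption. */
#define NATIVE_INDIRECT_CYCLES    3.0   /* predicted native indirect branch */
#define RETPOLINE_INDIRECT_CYCLES 30.0  /* same branch through a retpoline */
#define EXTRA_CYCLES (RETPOLINE_INDIRECT_CYCLES - NATIVE_INDIRECT_CYCLES)

/* Fractional slowdown if one indirect branch executes every
 * cycles_between cycles and nothing else about the workload changes. */
static double retpoline_overhead(double cycles_between)
{
    return EXTRA_CYCLES / cycles_between;
}

/* Smallest average spacing between indirect branches, in cycles, that keeps
 * the slowdown at or below budget (e.g. 0.015 for 1.5%). */
static double min_cycles_between(double budget)
{
    return EXTRA_CYCLES / budget;
}
```

Under this model, staying within 1.5% allows roughly one indirect branch per 1,800 cycles, while one every 10,000 cycles costs about 0.27%. Code dominated by indirect calls, such as the C++ binary mentioned above, sits far to the wrong side of that threshold.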
>
> There are several components, the majority of which are independent of kernel
> modifications:
>
> (1) A compiler supporting retpoline transformations.
An implementation for LLVM is available at:
https://reviews.llvm.org/D41723
> (1a) Optionally: annotations for hand-coded indirect jmps, so that they may be
> made compatible with (1).
> [ Note: The only known indirect jmp which is not safe to convert is the
> early virtual address check in head entry. ]
> (2) Kernel modifications for preventing return-stack underflow (see document
> above).
> The key points where this occurs are:
> - context switches (into protected targets)
> - interrupt return (we return into potentially unwinding execution)
> - sleep state exit (flushes caches)
> - guest exit
> (These can be run-time gated; a full refill costs 30-45 cycles.)
> (3) Optional: Optimizations so that direct branches can be used for hot kernel
> indirects. While, as discussed above, kernel execution generally depends on
> fewer indirect branches, there are a few places (in particular, the
> networking stack) where we have chained sequences of indirects on hot paths.
> (4) More general support for guarding against RSB underflow in an affected
> target. While this is harder to exploit and may not be required for many
> users, the approaches we have used here are not generally applicable.
> Further discussion is required.
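The underflow problem behind item (2) is easier to see with a toy software model of the return stack buffer. This is a simulation, not kernel code: the 16-entry depth, the `RSB_TRAP` placeholder address, and all names are assumptions for illustration. It shows that a task unwinding a call chain the predictor never saw underflows the RSB, and that refilling it with benign entries at a switch point (context switch, guest exit, and so on) absorbs those returns.

```c
#include <assert.h>
#include <stdbool.h>

#define RSB_DEPTH 16   /* assumption: depth chosen for illustration */
#define RSB_TRAP  0UL  /* hypothetical address of a benign speculation trap */

struct rsb {
    unsigned long entry[RSB_DEPTH];
    unsigned top;        /* next slot to write, modulo RSB_DEPTH */
    unsigned valid;      /* number of valid entries, at most RSB_DEPTH */
    unsigned underflows; /* returns predicted with no valid entry */
};

/* A call pushes its return address, overwriting the oldest entry when full. */
static void rsb_call(struct rsb *r, unsigned long ret_addr)
{
    r->entry[r->top] = ret_addr;
    r->top = (r->top + 1) % RSB_DEPTH;
    if (r->valid < RSB_DEPTH)
        r->valid++;
}

/* A return pops a prediction; false means underflow, where real hardware
 * may fall back to another (possibly attacker-trained) predictor. */
static bool rsb_ret(struct rsb *r, unsigned long *predicted)
{
    if (r->valid == 0) {
        r->underflows++;
        return false;
    }
    r->top = (r->top + RSB_DEPTH - 1) % RSB_DEPTH;
    r->valid--;
    *predicted = r->entry[r->top];
    return true;
}

/* Refill: fill every slot with an entry pointing at the benign trap. */
static void rsb_refill(struct rsb *r)
{
    for (unsigned i = 0; i < RSB_DEPTH; i++)
        rsb_call(r, RSB_TRAP);
}

/* Start from an empty RSB (as after a switch), optionally refill, then
 * perform 'depth' returns. Returns the number of underflows. */
static unsigned simulate_unwind(bool refill_first, unsigned depth)
{
    struct rsb r = {0};
    unsigned long pred;

    if (refill_first)
        rsb_refill(&r);
    for (unsigned i = 0; i < depth; i++)
        (void)rsb_ret(&r, &pred);
    return r.underflows;
}
```

For example, `simulate_unwind(false, 8)` reports 8 underflows and `simulate_unwind(true, 8)` reports none. One refill covers at most `RSB_DEPTH` returns in this model, so `simulate_unwind(true, 20)` still underflows 4 times.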
>
> With respect to what these deltas mean for an unmodified kernel:
Sorry, this should have read: a kernel that does not care about this protection.
It has been a long day :-).
> (1a) At minimum, annotation only. More complicated config- and run-time-gated
> options are also possible.
> (2) Trivially run-time & config gated.
> (3) The de-virtualizing of these branches improves performance in both the
> retpoline and non-retpoline cases.
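The de-virtualization in (3) can be pictured as comparing a function pointer against its expected hot target and calling that target directly, keeping the indirect call only as the fallback. The sketch below is hypothetical: `DEVIRT_CALL1` and the handler names are invented for illustration and are not an existing kernel interface. On the hot path it executes a conditional direct branch and a direct call, so no retpoline (and no indirect-predictor lookup) is involved, which is consistent with the point above that it helps in both configurations.

```c
#include <assert.h>

/* Hypothetical helper: call hot_fn directly when fp matches it, otherwise
 * fall back to the (possibly retpolined) indirect call through fp.
 * Note: fp is evaluated twice, so pass only side-effect-free expressions. */
#define DEVIRT_CALL1(fp, hot_fn, ...) \
    ((fp) == (hot_fn) ? (hot_fn)(__VA_ARGS__) : (fp)(__VA_ARGS__))

typedef int (*rx_handler_fn)(int len);

/* Hypothetical handlers: one expected to dominate, one rarely used. */
static int hot_rx_handler(int len)  { return len + 4; }
static int cold_rx_handler(int len) { return len + 100; }

static int deliver(rx_handler_fn handler, int len)
{
    /* Common case: compare plus direct call; rare case: indirect call. */
    return DEVIRT_CALL1(handler, hot_rx_handler, len);
}
```

The design only pays off when the hot target is known at build time and dominates at run time; for a mostly cold pointer the extra compare is pure overhead on top of the indirect call.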
>
> For an out-of-the-box kernel that is reasonably protected, (1)-(3) are required.
>
> I apologize that this does not come with a clean set of patches, merging the
> things that we and Intel have looked at here. That was one of the original
> goals for this week. Strictly speaking, I think that Andi, David, and I have
> a fair amount of merging and clean-up to do here. This is an attempt
> to keep discussion of the fundamentals at least independent of that.
>
> I'm trying to keep the above reasonably compact/dense. I'm happy to expand on
> any details in sub-threads. I'll also link back some of the other compiler work
> which is landing for (1).
>
> Thanks,
>
> - Paul