Message-ID: <20181018005420.82993-1-namit@vmware.com>
Date:   Wed, 17 Oct 2018 17:54:15 -0700
From:   Nadav Amit <namit@...are.com>
To:     Ingo Molnar <mingo@...hat.com>
CC:     Andy Lutomirski <luto@...nel.org>,
        Peter Zijlstra <peterz@...radead.org>,
        "H. Peter Anvin" <hpa@...or.com>,
        Thomas Gleixner <tglx@...utronix.de>,
        <linux-kernel@...r.kernel.org>, Nadav Amit <nadav.amit@...il.com>,
        <x86@...nel.org>, Borislav Petkov <bp@...en8.de>,
        David Woodhouse <dwmw@...zon.co.uk>,
        Nadav Amit <namit@...are.com>
Subject: [RFC PATCH 0/5] x86: dynamic indirect call promotion

This RFC introduces runtime indirect call promotion, which for
simplicity (and branding) is called here "relpolines" (relative call +
trampoline). Relpolines are mainly intended to reduce the retpoline
overheads incurred due to Spectre v2 mitigations.

Unlike indirect call promotion through profile-guided optimization, the
proposed approach does not require a profiling stage, works well with
modules whose addresses are unknown at build time, and can adapt to
changing workloads.

The main idea is simple: for every indirect call, we inject a piece of
code with fast- and slow-path calls. The fast path is used if the target
matches the expected (hot) target; the slow path uses a retpoline.
During training, the slow path is set to call a function that records
the call source and target in a hash table and keeps a count of call
frequency. The most common target is then patched into the fast path.

The patching is done on the fly by modifying the conditional branch
(opcode and offset) that compares the target to the hot target. This
makes it possible to direct all cores to the fast path while patching
the slow path, and vice versa. Patching follows two more rules: (1)
only patch a single byte when the code might be executed by any core;
(2) when patching more than one byte, ensure that no core runs the
to-be-patched code, by preventing this code from being preempted and
using synchronize_sched() after patching the branch that jumps over it.

Changing all the indirect calls to use relpolines is done with assembly
macro magic. There are alternative solutions, but this one is
relatively simple and transparent. There is also logic for retraining
the software predictor, but the policy it uses may need to be refined.

The results so far are not bad (2-vCPU VM, throughput reported):

		base		relpoline
		----		---------
nginx 		22898 		25178 (+10%)
redis-ycsb	24523		25486 (+4%)
dbench		2144		2103 (-2%)

When retpolines are disabled and retraining is off, the performance
benefit is up to 2% (nginx), but much less impressive overall.

There are several open issues: retraining should be done when modules
are removed; CPU hotplug is not supported; x86-32 is probably broken;
and the Makefile does not rebuild when the relpoline code changes.
Having said that, I am worried that some of the approaches I took would
challenge the new code of conduct, so I thought of getting some
feedback before putting more effort into it.

Nadav Amit (5):
  x86: introduce preemption disable prefix
  x86: patch indirect branch promotion
  x86: interface for accessing indirect branch locations
  x86: learning and patching indirect branch targets
  x86: relpoline: disabling interface

 arch/x86/entry/entry_64.S            |  10 +
 arch/x86/include/asm/nospec-branch.h | 158 +++++
 arch/x86/include/asm/sections.h      |   2 +
 arch/x86/kernel/Makefile             |   1 +
 arch/x86/kernel/asm-offsets.c        |   6 +
 arch/x86/kernel/macros.S             |   1 +
 arch/x86/kernel/nospec-branch.c      | 899 +++++++++++++++++++++++++++
 arch/x86/kernel/vmlinux.lds.S        |   7 +
 arch/x86/lib/retpoline.S             |  75 +++
 include/linux/module.h               |   5 +
 kernel/module.c                      |   8 +
 kernel/seccomp.c                     |   2 +
 12 files changed, 1174 insertions(+)
 create mode 100644 arch/x86/kernel/nospec-branch.c

-- 
2.17.1
