Message-Id: <1515502580-12261-1-git-send-email-w@1wt.eu>
Date:   Tue,  9 Jan 2018 13:56:14 +0100
From:   Willy Tarreau <w@....eu>
To:     linux-kernel@...r.kernel.org, x86@...nel.org
Cc:     Willy Tarreau <w@....eu>, Andy Lutomirski <luto@...nel.org>,
        Borislav Petkov <bp@...en8.de>,
        Brian Gerst <brgerst@...il.com>,
        Dave Hansen <dave.hansen@...ux.intel.com>,
        Ingo Molnar <mingo@...nel.org>,
        Linus Torvalds <torvalds@...ux-foundation.org>,
        Peter Zijlstra <peterz@...radead.org>,
        Thomas Gleixner <tglx@...utronix.de>,
        Josh Poimboeuf <jpoimboe@...hat.com>,
        "H. Peter Anvin" <hpa@...or.com>, Kees Cook <keescook@...omium.org>
Subject: [RFC PATCH v2 0/6] Per process PTI activation

So here comes the second version after the first round of comments.

As suggested, I dropped the thread_info flag and placed it in the
mm_struct instead. There's now a per_cpu variable that can be checked
in the entry code to decide whether or not to switch CR3.

It's important to note that the new flag is lost upon execve(). I think
that this provides a better guarantee against any accidental use (e.g. a
program calling some external helpers once in a while), but it also
means we can't use a wrapper anymore and have to modify the executable.

I still think that a mixed approach, with a specific flag that is only
applied upon the next execve() call and then dropped, could be nice,
but for now I'm not really sure how to do this cleanly.

Regarding the _PAGE_NX change, I haven't touched it for now. I like
Andy's approach of changing it dynamically after the first page fault
caused by the return to userspace. I just don't know how to do that
yet.

I've split the entry code changes in two. The first part only updates the
kernel entry code to avoid updating CR3 if it already points to a kernel
PGD. The second one adds the flag check when going back to userspace.
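
To make the split concrete, here is a minimal userspace model in C of
the two decisions described above. It is only a sketch: the real entry
code is assembly in arch/x86/entry/calling.h, and every name below
(MODEL_USER_PGD_BIT, struct model_cpu, the model_* functions) is
hypothetical, as is the choice of marker bit.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical marker bit: set while CR3 points at the user PGD. */
#define MODEL_USER_PGD_BIT (UINT64_C(1) << 12)

struct model_cpu {
    uint64_t cr3;        /* modelled CR3 value */
    bool pti_disable;    /* modelled per-CPU copy of the mm flag */
    unsigned cr3_writes; /* number of modelled CR3 updates */
};

/* First part: on kernel entry, skip the CR3 write when CR3 already
 * points at a kernel PGD. */
static void model_kernel_entry(struct model_cpu *cpu)
{
    if (!(cpu->cr3 & MODEL_USER_PGD_BIT))
        return;
    cpu->cr3 &= ~MODEL_USER_PGD_BIT;
    cpu->cr3_writes++;
}

/* Second part: on return to userspace, keep the kernel PGD when the
 * per-CPU flag says PTI is disabled for the current mm. */
static void model_return_to_user(struct model_cpu *cpu)
{
    if (cpu->pti_disable)
        return;
    cpu->cr3 |= MODEL_USER_PGD_BIT;
    cpu->cr3_writes++;
}

/* Count the modelled CR3 writes caused by n syscall round trips. */
static unsigned model_count_cr3_writes(bool pti_disable, unsigned n)
{
    struct model_cpu cpu = { 0x1000000, pti_disable, 0 };

    model_return_to_user(&cpu); /* start out in userspace */
    cpu.cr3_writes = 0;
    for (unsigned i = 0; i < n; i++) {
        model_kernel_entry(&cpu);
        model_return_to_user(&cpu);
    }
    return cpu.cr3_writes;
}
```

In this model a task with the flag set never leaves the kernel PGD, so
both checks turn into no-ops for it, while a regular task pays two CR3
writes per round trip.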

This allowed me to check if the CR3-only changes brought any benefit, but
I failed to detect any improvement with that alone for now, including on
a preempt kernel.

With this patch, when haproxy starts with "arch_prctl(0x1022, 1)", the
performance drop compared to booting with "pti=off" is only ~1% and more
or less within measurement noise.
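
For reference, a program could issue that call roughly as follows. This
is a minimal sketch, not code from the series: the value 0x1022 is taken
from the example above and only has this meaning on a kernel carrying
these patches, and the names RFC_ARCH_SET_NOPTI and try_disable_pti()
are made up for illustration. On any other kernel the call is expected
to fail, which the caller should treat as non-fatal.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Assumption: request code copied from the cover letter's example;
 * it is only meaningful on a kernel with this RFC series applied. */
#define RFC_ARCH_SET_NOPTI 0x1022

/* Ask the kernel to disable PTI for the calling process.
 * Returns 0 on success, -1 with errno set on failure. */
static int try_disable_pti(void)
{
#ifdef SYS_arch_prctl
    return (int)syscall(SYS_arch_prctl, RFC_ARCH_SET_NOPTI, 1UL);
#else
    /* arch_prctl() is not available on this architecture. */
    errno = ENOSYS;
    return -1;
#endif
}
```

Since the flag is lost upon execve(), the call has to be made by the
final executable itself, e.g. early in main(), and not by a wrapper.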

For now I've left the prctl to retrieve the current value as it helped
during debugging, though I think it should disappear before the final
version as it provides very little value.

Here are the numbers I'm seeing in the various situations for a few
tests on a physical machine (Core i7-4790K). Numbers are in connections
per second, with the performance ratio relative to pti=off in
parentheses:
                                     TEST(*)
                    reject       reject+acl       forward
 ---------------+-------------+---------------+----------------
  pti=off         444k (100%)    252k (100%)      83k (100%)
  pti=on          382k (86%)     195k (77%)       71k (85%)
  pti=on+prctl    439k (99%)     249k (99%)       83k (100%)

*: tests: 
   "reject"     : reject rule, accept(), setsockopt() and close()
   "reject+acl" : acl-based rule, does extra syscalls (getsockname(),
                  getsockopt(), 2 setsockopt(), recv(), shutdown())
   "forward"    : connection forwarded to remote server, much heavier

It's interesting to note that the rule employing a few more syscalls
without adding much userspace work is clearly more impacted by PTI.
We have a total of 8 syscalls per connection on the middle one and
the difference is significant.
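
As a rough back-of-the-envelope check, the connection rates in the table
can be turned into an extra cost per connection. The helper below is
only an illustration (extra_us_per_conn() is a made-up name), and it
assumes that connections are processed one after another on a single
core and that the whole difference is due to PTI, neither of which was
measured here.

```c
/* Extra time per connection, in microseconds, implied by two
 * connection rates given in connections per second. */
static double extra_us_per_conn(double cps_pti_off, double cps_pti_on)
{
    return 1e6 / cps_pti_on - 1e6 / cps_pti_off;
}
```

With the reject+acl figures (252k and 195k), this gives about 1.16 us
per connection, or roughly 145 ns for each of the 8 syscalls, if the
assumptions hold.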

Willy

Cc: Andy Lutomirski <luto@...nel.org>
Cc: Borislav Petkov <bp@...en8.de>
Cc: Brian Gerst <brgerst@...il.com>
Cc: Dave Hansen <dave.hansen@...ux.intel.com>
Cc: Ingo Molnar <mingo@...nel.org>
Cc: Linus Torvalds <torvalds@...ux-foundation.org>
Cc: Peter Zijlstra <peterz@...radead.org>
Cc: Thomas Gleixner <tglx@...utronix.de>
Cc: Josh Poimboeuf <jpoimboe@...hat.com>
Cc: "H. Peter Anvin" <hpa@...or.com>
Cc: Kees Cook <keescook@...omium.org>


Willy Tarreau (6):
  x86/mm: add a pti_disable entry in mm_context_t
  x86/arch_prctl: add ARCH_GET_NOPTI and ARCH_SET_NOPTI to
    enable/disable PTI
  x86/pti: add a per-cpu variable pti_disable
  x86/pti: don't mark the user PGD with _PAGE_NX.
  x86/entry/pti: avoid setting CR3 when it's already correct
  x86/entry/pti: don't switch PGD on when pti_disable is set

 arch/x86/entry/calling.h          | 25 +++++++++++++++++++++++++
 arch/x86/include/asm/mmu.h        |  4 ++++
 arch/x86/include/uapi/asm/prctl.h |  3 +++
 arch/x86/kernel/process_64.c      | 24 ++++++++++++++++++++++++
 arch/x86/mm/pti.c                 |  2 ++
 5 files changed, 58 insertions(+)

-- 
1.7.12.1
