Message-Id: <20201116144757.1920077-1-alexandre.chartre@oracle.com>
Date:   Mon, 16 Nov 2020 15:47:36 +0100
From:   Alexandre Chartre <alexandre.chartre@...cle.com>
To:     tglx@...utronix.de, mingo@...hat.com, bp@...en8.de, hpa@...or.com,
        x86@...nel.org, dave.hansen@...ux.intel.com, luto@...nel.org,
        peterz@...radead.org, linux-kernel@...r.kernel.org,
        thomas.lendacky@....com, jroedel@...e.de
Cc:     konrad.wilk@...cle.com, jan.setjeeilers@...cle.com,
        junaids@...gle.com, oweisse@...gle.com, rppt@...ux.vnet.ibm.com,
        graf@...zon.de, mgross@...ux.intel.com, kuzuno@...il.com,
        alexandre.chartre@...cle.com
Subject: [RFC][PATCH v2 00/21] x86/pti: Defer CR3 switch to C code

Version 2 addressing comments from Andy:

- paranoid_entry/exit is back to assembly code. This avoids having
  a C version of SWAPGS and the need to disable stack-protector.
  (remove patches 8, 9, 21 from v1).

- SAVE_AND_SWITCH_TO_KERNEL_CR3 and RESTORE_CR3 are removed from
  paranoid_entry/exit and moved to C (patch 19).

- __per_cpu_offset is mapped into the user page-table (patch 11)
  so that paranoid_entry can update GS before CR3 is switched.

- Use a different stack canary with the user and kernel page-tables.
  This is a new patch in v2, so that the kernel stack canary is not
  leaked through the user page-table (patch 21).

Patches are now based on v5.10-rc4.

----

With Page Table Isolation (PTI), syscalls as well as interrupts and
exceptions occurring in userspace enter the kernel with a user
page-table. The kernel entry code will then switch the page-table
from the user page-table to the kernel page-table by updating the
CR3 control register. This CR3 switch is currently done early in
the kernel entry sequence using assembly code.
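
For readers less familiar with PTI, here is a minimal userspace sketch of
what a CR3 switch amounts to. It is an illustration only, not code from
this series: it assumes the usual x86-64 PTI layout, where the user PGD
lives in the page following the kernel PGD (bit 12 of CR3) and the user
PCID is marked by bit 11. The sim_* names and the simulated register are
hypothetical, and details such as the CR3 no-flush bit are ignored.

```c
#include <stdint.h>

/*
 * Assumed layout, modelled on x86-64 PTI: the user and kernel CR3 values
 * differ only in the page-table bit (bit 12) and the user PCID bit (bit 11).
 */
#define PAGE_SHIFT           12
#define PTI_USER_PGTABLE_BIT PAGE_SHIFT
#define PTI_USER_PCID_BIT    11
#define PTI_USER_MASK \
	((UINT64_C(1) << PTI_USER_PGTABLE_BIT) | \
	 (UINT64_C(1) << PTI_USER_PCID_BIT))

/* Simulated CR3 register; real entry code reads and writes %cr3. */
static uint64_t sim_cr3;

/* Derive the kernel CR3 value by clearing the user bits. */
static inline uint64_t kernel_cr3_from(uint64_t cr3)
{
	return cr3 & ~PTI_USER_MASK;
}

/* Derive the user CR3 value by setting the user bits. */
static inline uint64_t user_cr3_from(uint64_t cr3)
{
	return cr3 | PTI_USER_MASK;
}

/* Hypothetical helpers standing in for the entry/exit CR3 switch. */
static inline void sim_switch_to_kernel_cr3(void)
{
	sim_cr3 = kernel_cr3_from(sim_cr3);
}

static inline void sim_switch_to_user_cr3(void)
{
	sim_cr3 = user_cr3_from(sim_cr3);
}
```

Because the operation is a plain read-modify-write of one register, nothing
in it inherently requires assembly, which is what makes moving it to C
plausible.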

This RFC proposes to defer the PTI CR3 switch until we reach C code.
The benefit is that this simplifies the assembly entry code and makes
the PTI CR3 switch code easier to understand. This also paves the way
for possible further projects, such as an easier integration of Address
Space Isolation (ASI), or the possibility of executing selected
syscall or interrupt handlers without switching to the kernel page-table
(and thus avoiding the PTI page-table switch overhead).

Deferring CR3 switch to C code means that we need to run more of the
kernel entry code with the user page-table. To do so, we need to:

 - map more syscall, interrupt and exception entry code into the user
   page-table (map all noinstr code);

 - map additional data used in the entry code (such as stack canary);

 - run more entry code on the trampoline stack (which is mapped both
   in the kernel and in the user page-table) until we switch to the
   kernel page-table and then switch to the kernel stack;

 - have a per-task trampoline stack instead of a per-cpu trampoline
   stack, so the task can be scheduled out while it hasn't switched
   to the kernel stack.

Note that, for now, the CR3 switch can only be deferred for as long as
interrupts remain disabled in the entry code. This is because the decision
to switch CR3 is based on the privilege level in the CS register saved in
the interrupt frame. I plan to fix this, but it adds some complication (we
need to track whether the user page-table is in use or not).
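
To make the limitation concrete, here is a small userspace sketch, not
code from the series. It assumes the usual x86 convention that the low two
bits of the saved CS selector hold the privilege level (3 for userspace).
The sim_frame and sim_task types, and the user_pgtable_active flag, are
hypothetical names used only to contrast the two ways of deciding.

```c
#include <stdbool.h>
#include <stdint.h>

#define SEGMENT_RPL_MASK 0x3 /* low two bits of a selector: privilege level */
#define USER_RPL         0x3 /* ring 3 */

/* Simplified stand-in for the hardware interrupt frame. */
struct sim_frame {
	uint64_t ip, cs, flags, sp, ss;
};

/* Hypothetical per-task state recording which page-table is live. */
struct sim_task {
	bool user_pgtable_active;
};

/* True if the interrupted context was running at user privilege. */
static inline bool frame_from_user(const struct sim_frame *frame)
{
	return (frame->cs & SEGMENT_RPL_MASK) == USER_RPL;
}

/*
 * Decision based on the saved CS: only valid if no interrupt can arrive
 * between kernel entry and the CR3 switch.
 */
static inline bool need_switch_by_cs(const struct sim_frame *frame)
{
	return frame_from_user(frame);
}

/*
 * Decision based on explicit tracking: still valid for an interrupt that
 * nests in kernel mode before CR3 has been switched.
 */
static inline bool need_switch_by_tracking(const struct sim_task *task)
{
	return task->user_pgtable_active;
}
```

If interrupts were enabled before the switch, a nested interrupt would save
a kernel CS while CR3 still pointed at the user page-table. The CS-based
check would then skip the switch, which is the case explicit tracking is
meant to cover.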

The proposed patchset is in RFC state to get early feedback about this
proposal.

The code survives running a kernel build and LTP. Note that the changes
are only for 64-bit at the moment. I haven't looked at 32-bit yet, but I
will definitely check it.

Patches are based on v5.10-rc4.

Thanks,

alex.

-----

Alexandre Chartre (21):
  x86/syscall: Add wrapper for invoking syscall function
  x86/entry: Update asm_call_on_stack to support more function arguments
  x86/entry: Consolidate IST entry from userspace
  x86/sev-es: Define a setup stack function for the VC idtentry
  x86/entry: Implement ret_from_fork body with C code
  x86/pti: Provide C variants of PTI switch CR3 macros
  x86/entry: Fill ESPFIX stack using C code
  x86/pti: Introduce per-task PTI trampoline stack
  x86/pti: Function to clone page-table entries from a specified mm
  x86/pti: Function to map per-cpu page-table entry
  x86/pti: Extend PTI user mappings
  x86/pti: Use PTI stack instead of trampoline stack
  x86/pti: Execute syscall functions on the kernel stack
  x86/pti: Execute IDT handlers on the kernel stack
  x86/pti: Execute IDT handlers with error code on the kernel stack
  x86/pti: Execute system vector handlers on the kernel stack
  x86/pti: Execute page fault handler on the kernel stack
  x86/pti: Execute NMI handler on the kernel stack
  x86/pti: Defer CR3 switch to C code for IST entries
  x86/pti: Defer CR3 switch to C code for non-IST and syscall entries
  x86/pti: Use a different stack canary with the user and kernel
    page-table

 arch/x86/entry/common.c               |  58 ++++-
 arch/x86/entry/entry_64.S             | 346 +++++++++++---------------
 arch/x86/entry/entry_64_compat.S      |  22 --
 arch/x86/include/asm/entry-common.h   | 194 +++++++++++++++
 arch/x86/include/asm/idtentry.h       | 130 +++++++++-
 arch/x86/include/asm/irq_stack.h      |  11 +
 arch/x86/include/asm/page_64_types.h  |  36 ++-
 arch/x86/include/asm/processor.h      |   3 +
 arch/x86/include/asm/pti.h            |  18 ++
 arch/x86/include/asm/stackprotector.h |  35 ++-
 arch/x86/include/asm/switch_to.h      |   7 +-
 arch/x86/include/asm/traps.h          |   2 +-
 arch/x86/kernel/cpu/mce/core.c        |   7 +-
 arch/x86/kernel/espfix_64.c           |  41 +++
 arch/x86/kernel/nmi.c                 |  34 ++-
 arch/x86/kernel/sev-es.c              |  63 +++++
 arch/x86/kernel/traps.c               |  61 +++--
 arch/x86/mm/fault.c                   |  11 +-
 arch/x86/mm/pti.c                     |  76 ++++--
 include/linux/sched.h                 |   8 +
 kernel/fork.c                         |  25 ++
 21 files changed, 874 insertions(+), 314 deletions(-)

-- 
2.18.4
