Message-ID: <20151009130654.GA10456@gmail.com>
Date:	Fri, 9 Oct 2015 15:06:54 +0200
From:	Ingo Molnar <mingo@...nel.org>
To:	Andy Lutomirski <luto@...nel.org>
Cc:	x86@...nel.org, linux-kernel@...r.kernel.org,
	Brian Gerst <brgerst@...il.com>,
	Denys Vlasenko <dvlasenk@...hat.com>,
	Linus Torvalds <torvalds@...ux-foundation.org>,
	Borislav Petkov <bp@...en8.de>
Subject: Re: [PATCH v2 00/36] x86: Rewrite all syscall entries except native
 64-bit


* Andy Lutomirski <luto@...nel.org> wrote:

> The first two patches are optimizations that I'm surprised we didn't
> already have.  I noticed them when I was looking at the generated
> asm.
> 
> The next two patches are tests and some old stuff.  There's a test
> that validates the vDSO AT_SYSINFO annotations.  There's also a test
> that exercises some assumptions that signal handling and ptracers
> make about syscalls that currently do *not* hold on 64-bit AMD using
> 32-bit AT_SYSINFO.
> 
> The next three patches are NT cleanups and a lockdep cleanup.
> 
> It may pay to apply the beginning of the series (at most through
> "x86/entry/64/compat: After SYSENTER, move STI after the NT fixup")
> without waiting for everyone to wrap their heads around the rest.
> 
> The rest is basically a rewrite of syscalls for all cases except
> 64-bit native.  With these patches applied, there is a single 32-bit
> vDSO and it uses SYSCALL, SYSENTER, and INT80 almost interchangeably
> via alternatives.  The semantics of SYSENTER and SYSCALL are defined
> as:
> 
>  1. If SYSCALL, ESP = ECX
>  2. ECX = *ESP
>  3. IP = INT80 landing pad
>  4. Opportunistic SYSRET/SYSEXIT is enabled on return
> 
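The four rules above can be modeled literally as a register fixup applied on entry. This is a minimal sketch, assuming a hypothetical cut-down register struct (`struct regs32`), a simulated user stack (`struct user_mem`), and a helper name (`fast_entry_fixup`) that do not exist in the kernel. It models only the stated semantics, not how the vDSO or the entry asm actually achieves them.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, cut-down register state: only the fields rules 1-4 touch.
 * This is not the kernel's struct pt_regs. */
struct regs32 {
	uint32_t ip, esp, ecx;
	bool opportunistic_exit;   /* rule 4: try SYSRET/SYSEXIT on return */
};

enum entry_kind { ENTRY_SYSENTER, ENTRY_SYSCALL };

/* Simulated user memory: 'len' bytes visible at user address 'base'. */
struct user_mem {
	const uint8_t *bytes;
	uint32_t base, len;
};

/* Read a little-endian 32-bit word; return false instead of faulting. */
static bool read_user_u32(const struct user_mem *um, uint32_t addr, uint32_t *out)
{
	if (um->len < 4 || addr < um->base || addr - um->base > um->len - 4)
		return false;
	const uint8_t *p = um->bytes + (addr - um->base);
	*out = (uint32_t)p[0] | (uint32_t)p[1] << 8 |
	       (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
	return true;
}

/* Apply rules 1-4 to the state seen on a SYSENTER or SYSCALL entry. */
static bool fast_entry_fixup(struct regs32 *r, enum entry_kind kind,
			     const struct user_mem *um, uint32_t landing_pad)
{
	if (kind == ENTRY_SYSCALL)
		r->esp = r->ecx;                     /* 1. If SYSCALL, ESP = ECX */
	if (!read_user_u32(um, r->esp, &r->ecx)) /* 2. ECX = *ESP */
		return false;                        /* bad user pointer: caller must handle */
	r->ip = landing_pad;                     /* 3. IP = INT80 landing pad */
	r->opportunistic_exit = true;            /* 4. fast return allowed */
	return true;
}
```

After the fixup, both entry flavors look like an int $0x80 that is about to return to the landing pad, which is what lets one C path serve all three entry instructions.
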
> The vDSO is rearranged so that these semantics work.  Anything that
> backs IP up by 2 ends up pointing at a bona fide int $0x80
> instruction with the expected regs.
> 
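"Backing IP up by 2" works because int $0x80 encodes as exactly two bytes, 0xCD (INT imm8) followed by the vector 0x80; the x86 signal code, for example, subtracts 2 from IP when it restarts an interrupted syscall. A minimal sketch of the invariant, assuming a hypothetical in-memory copy of the vDSO text (`text`, mapped at user address `base`) and a hypothetical helper name:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Encoding of "int $0x80": opcode 0xCD (INT imm8) followed by the vector. */
static const uint8_t INT80_INSN[2] = { 0xCD, 0x80 };

/* Hypothetical check: 'text' is a copy of the vDSO text mapped at user
 * address 'base'. Returns true if backing 'ip' up by 2 lands on int $0x80. */
static bool ip_minus_2_is_int80(const uint8_t *text, size_t len,
				uint32_t base, uint32_t ip)
{
	if (ip < base + 2 || ip - base > len)
		return false;                  /* ip-2 or ip is outside the text */
	size_t off = (size_t)(ip - base - 2);
	return text[off] == INT80_INSN[0] && text[off + 1] == INT80_INSN[1];
}
```

Here the landing pad is the address right after the int $0x80, so a restart that rewinds IP re-executes a genuine int $0x80 no matter which instruction was used to enter.
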
> In the process, the vDSO CFI annotations (which are actually used)
> get rewritten using normal CFI directives.
> 
> Opportunistic SYSRET/SYSEXIT only happens on return when CS and SS
> are as expected, IP points to the INT80 landing pad, and flags are
> in good shape.  (There is no longer any assumption that full
> fast-path 32-bit syscalls don't muck with the registers that matter
> for fast exits -- I played with maintaining an optimization like
> that with poor results.  I may try again if it saves a few cycles.)
> 
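The exit-time condition can be sketched as a single predicate. This assumes the x86_64 GDT selector values for the 32-bit user code and data segments (0x23 and 0x2b), and it interprets "flags are in good shape" as "TF and RF are clear"; both are assumptions of this sketch, and the struct and function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed selector values for the x86_64 GDT (32-bit user CS, user DS). */
#define USER32_CS 0x23u
#define USER_DS   0x2bu

/* EFLAGS bits that, in this sketch, force a slow IRET-style return. */
#define EFLAGS_TF 0x00000100u   /* trap flag (single-stepping) */
#define EFLAGS_RF 0x00010000u   /* resume flag */

struct exit_regs {
	uint32_t cs, ss, ip, flags;
};

/* True only if every condition for an opportunistic SYSRET/SYSEXIT holds. */
static bool can_use_fast_exit(const struct exit_regs *r, uint32_t landing_pad)
{
	return r->cs == USER32_CS && r->ss == USER_DS &&
	       r->ip == landing_pad &&
	       (r->flags & (EFLAGS_TF | EFLAGS_RF)) == 0;
}
```

If a ptracer or signal handler changed any of these fields, the predicate fails and the return falls back to the fully general path.
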
> Other than that, the system call entries are simplified to the bare
> minimum prologue and a call to a C function.  Amusingly, SYSENTER
> and SYSCALL32 use the same C function.
> 
> To make that work, I had to remove all the 32-bit syscall stubs
> except the clone argument hack.  This is because, for C code to call
> through the system call table, the system call table entries need to
> be real function pointers with C-compatible ABIs.
> 
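Why the table entries must be real C functions becomes clear once a C dispatcher calls through them. The sketch below is not the series' do_syscall_32; the `demo_*` names and `struct sc_regs` are hypothetical. It uses only the i386 convention of the syscall number in EAX, arguments in EBX, ECX, EDX, ESI, EDI, EBP, and the return value in EAX.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical saved user registers (i386 syscall convention). */
struct sc_regs {
	uint32_t ax, bx, cx, dx, si, di, bp;   /* ax: number in, return value out */
};

/* Every table entry is an ordinary C function with one fixed signature. */
typedef long (*sys_call_fn)(unsigned long, unsigned long, unsigned long,
			    unsigned long, unsigned long, unsigned long);

static long demo_get_magic(unsigned long a, unsigned long b, unsigned long c,
			   unsigned long d, unsigned long e, unsigned long f)
{
	(void)a; (void)b; (void)c; (void)d; (void)e; (void)f;
	return 7;
}

static long demo_add(unsigned long a, unsigned long b, unsigned long c,
		     unsigned long d, unsigned long e, unsigned long f)
{
	(void)c; (void)d; (void)e; (void)f;
	return (long)(a + b);
}

static const sys_call_fn demo_table[] = { demo_get_magic, demo_add };
#define DEMO_NR_SYSCALLS (sizeof demo_table / sizeof demo_table[0])

/* Look up the handler by number, pass the six argument registers, store the result. */
static void demo_do_syscall_32(struct sc_regs *regs)
{
	uint32_t nr = regs->ax;
	long ret = -ENOSYS;

	if (nr < DEMO_NR_SYSCALLS)
		ret = demo_table[nr](regs->bx, regs->cx, regs->dx,
				     regs->si, regs->di, regs->bp);
	regs->ax = (uint32_t)ret;
}
```

An asm stub that expected its arguments somewhere other than the C argument slots could not sit in such a table, which is why the old 32-bit stubs had to go.
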
> There is nothing at all anymore that requires that x86_32 syscalls
> be asmlinkage.  That could be removed in a subsequent patch.
> 
> The upshot appears to be a ~16 cycle performance hit on 32-bit fast
> path syscalls.  (On my system, my little prctl test takes 172 cycles
> before and 188 cycles with these patches applied.)
> 
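Worked out from the prctl numbers quoted above, the fast-path regression is:

```latex
\[
\Delta = 188 - 172 = 16\ \text{cycles}, \qquad
\frac{\Delta}{172} = \frac{16}{172} \approx 0.093 \approx 9.3\%
\]
```
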
> The slow path is probably faster under most circumstances and, if
> the exit slow path gets hit, it'll be much faster because (as we
> already do in the 64-bit native case) we can still use
> SYSEXIT/SYSRET.
> 
> The patchset is structured as a removal of the old fast syscall
> code, then the change that makes syscalls into real functions, then
> a clean re-implementation of fast syscalls.
> 
> If we want some of those ~16 cycles back, we could consider
> open-coding a new C fast path.
> 
> Changes from v1:
>  - The unwind_vdso_32 test now warns on broken Debian installations
>    instead of failing.  The problem is now fully understood and will
>    be fixed by Debian and possibly also by upstream glibc.
>  - execve was rather broken in v1.
>  - It's quite a bit faster now (the optimizations at the end are mostly new).
>  - int80 on 64-bit no longer clobbers extra regs (thanks Denys!).
>  - The uaccess stuff is new.
>  - Lots of other things that I forgot, I'm sure.
> 
> Andy Lutomirski (36):
>   x86/uaccess: Tell the compiler that uaccess is unlikely to fault
>   x86/uaccess: __chk_range_not_ok is unlikely to return true
>   selftests/x86: Add a test for vDSO unwinding
>   selftests/x86: Add a test for syscall restart and arg modification
>   x86/entry/64/compat: Fix SYSENTER's NT flag before user memory access
>   x86/entry: Move lockdep_sys_exit to prepare_exit_to_usermode
>   x86/entry/64/compat: After SYSENTER, move STI after the NT fixup
>   x86/vdso: Remove runtime 32-bit vDSO selection
>   x86/asm: Re-add manual CFI infrastructure
>   x86/vdso: Define BUILD_VDSO while building and emit .eh_frame in asm
>   x86/vdso: Replace hex int80 CFI annotations with gas directives
>   x86/elf/64: Clear more registers in elf_common_init
>   x86/vdso/32: Save extra registers in the INT80 vsyscall path
>   x86/entry/64/compat: Disable SYSENTER and SYSCALL32 entries
>   x86/entry/64/compat: Remove audit optimizations
>   x86/entry/64/compat: Remove most of the fast system call machinery
>   x86/entry/64/compat: Set up full pt_regs for all compat syscalls
>   x86/entry/syscalls: Move syscall table declarations into
>     asm/syscalls.h
>   x86/syscalls: Give sys_call_ptr_t a useful type
>   x86/entry: Add do_syscall_32, a C function to do 32-bit syscalls
>   x86/entry/64/compat: Migrate the body of the syscall entry to C
>   x86/entry: Add C code for fast system call entries
>   x86/vdso/compat: Wire up SYSENTER and SYSCALL for compat userspace
>   x86/entry/compat: Implement opportunistic SYSRETL for compat syscalls
>   x86/entry/32: Open-code return tracking from fork and kthreads
>   x86/entry/32: Switch INT80 to the new C syscall path
>   x86/entry/32: Re-implement SYSENTER using the new C path
>   x86/asm: Remove thread_info.sysenter_return
>   x86/entry: Remove unnecessary IRQ twiddling in fast 32-bit syscalls
>   x86/entry: Make irqs_disabled checks in exit code depend on lockdep
>   x86/entry: Force inlining of 32-bit syscall code
>   x86/entry: Micro-optimize compat fast syscall arg fetch
>   x86/entry: Hide two syscall entry assertions behind CONFIG_DEBUG_ENTRY
>   x86/entry: Use pt_regs_to_thread_info() in syscall entry tracing
>   x86/entry: Split and inline prepare_exit_to_usermode
>   x86/entry: Split and inline syscall_return_slowpath
> 
>  arch/x86/Makefile                                  |  10 +-
>  arch/x86/entry/common.c                            | 255 ++++++++--
>  arch/x86/entry/entry_32.S                          | 184 +++----
>  arch/x86/entry/entry_64.S                          |   9 +-
>  arch/x86/entry/entry_64_compat.S                   | 541 +++++----------------
>  arch/x86/entry/syscall_32.c                        |   9 +-
>  arch/x86/entry/syscall_64.c                        |   4 +-
>  arch/x86/entry/syscalls/syscall_32.tbl             |  12 +-
>  arch/x86/entry/vdso/Makefile                       |  39 +-
>  arch/x86/entry/vdso/vdso2c.c                       |   2 +-
>  arch/x86/entry/vdso/vdso32-setup.c                 |  28 +-
>  arch/x86/entry/vdso/vdso32/int80.S                 |  56 ---
>  arch/x86/entry/vdso/vdso32/syscall.S               |  75 ---
>  arch/x86/entry/vdso/vdso32/sysenter.S              | 116 -----
>  arch/x86/entry/vdso/vdso32/system_call.S           |  57 +++
>  arch/x86/entry/vdso/vma.c                          |  13 +-
>  arch/x86/ia32/ia32_signal.c                        |   4 +-
>  arch/x86/include/asm/dwarf2.h                      | 177 +++++++
>  arch/x86/include/asm/elf.h                         |  10 +-
>  arch/x86/include/asm/syscall.h                     |  14 +-
>  arch/x86/include/asm/thread_info.h                 |   1 -
>  arch/x86/include/asm/uaccess.h                     |  14 +-
>  arch/x86/include/asm/vdso.h                        |  10 +-
>  arch/x86/kernel/asm-offsets.c                      |   3 -
>  arch/x86/kernel/signal.c                           |   4 +-
>  arch/x86/um/sys_call_table_32.c                    |   7 +-
>  arch/x86/um/sys_call_table_64.c                    |   7 +-
>  arch/x86/xen/setup.c                               |  13 +-
>  tools/testing/selftests/x86/Makefile               |   5 +-
>  tools/testing/selftests/x86/ptrace_syscall.c       | 294 +++++++++++
>  .../testing/selftests/x86/raw_syscall_helper_32.S  |  46 ++
>  tools/testing/selftests/x86/unwind_vdso.c          | 209 ++++++++
>  32 files changed, 1258 insertions(+), 970 deletions(-)
>  delete mode 100644 arch/x86/entry/vdso/vdso32/int80.S
>  delete mode 100644 arch/x86/entry/vdso/vdso32/syscall.S
>  delete mode 100644 arch/x86/entry/vdso/vdso32/sysenter.S
>  create mode 100644 arch/x86/entry/vdso/vdso32/system_call.S
>  create mode 100644 arch/x86/include/asm/dwarf2.h
>  create mode 100644 tools/testing/selftests/x86/ptrace_syscall.c
>  create mode 100644 tools/testing/selftests/x86/raw_syscall_helper_32.S
>  create mode 100644 tools/testing/selftests/x86/unwind_vdso.c

Ok, so I applied all of them to tip:x86/asm, in two phases, with small (stylistic) 
edits - it all seems to work fine for me so far, so I pushed it all out to -tip 
and linux-next.

Thanks,

	Ingo