linux-kernel - Re: [RFC 0/7] Prep code for better stack switching

Message-ID: <CALCETrV0aeKKEXyP4x6yGSA15-4ojaR9tX0N1baxQ2kovLp4KA@mail.gmail.com>
Date:   Sat, 11 Nov 2017 20:25:28 -0800
From:   Andy Lutomirski <luto@...nel.org>
To:     Andy Lutomirski <luto@...nel.org>
Cc:     Borislav Petkov <bp@...e.de>, X86 ML <x86@...nel.org>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
        Brian Gerst <brgerst@...il.com>,
        Dave Hansen <dave.hansen@...el.com>,
        Linus Torvalds <torvalds@...ux-foundation.org>
Subject: Re: [RFC 0/7] Prep code for better stack switching

On Sat, Nov 11, 2017 at 6:59 PM, Andy Lutomirski <luto@...nel.org> wrote:
> On Sat, Nov 11, 2017 at 2:58 AM, Borislav Petkov <bp@...e.de> wrote:
>> On Fri, Nov 10, 2017 at 08:05:19PM -0800, Andy Lutomirski wrote:
>>> This isn't quite done (the TSS remap patch is busted on 32-bit, but
>>> that's a straightforward fix), but it should be ready for at least a
>>> conceptual review.
>>>
>>> The idea here is to prepare us to have all kernel data needed for
>>> user mode execution and early entry located in the fixmap.  To do
>>> this, I hijack the GDT remap mechanism and make it more general.  I
>>> add a struct cpu_entry_area.  This struct is never instantiated
>>> directly.  Instead, it represents the layout of a per-cpu portion of
>>> the fixmap.  That portion contains the GDT, the TSS (including IO
>>> bitmap), and the entry stack (for now just a part of the TSS
>>> region).  It should also end up containing the PEBS and BTS buffers.
>>>
>>> If this works, then the idea would be to add a magic *executable* page
>>> to cpu_entry_area.  That page would contain a stub like this:
>>>
>>> ENTRY(entry_SYSCALL_64_trampoline)
>>>       UNWIND_HINT_EMPTY
>>>       movq    %rsp, 0x1000+entry_SYSCALL_64_trampoline-1f(%rip)
>>> 1:
>>>       movq    0x1008+entry_SYSCALL_64_trampoline-1f(%rip), %rsp
>>> 1:
>>>       pushq   %rdi
>>>       pushq   %rsi
>>
>>>       movq    0x1000+entry_SYSCALL_64_trampoline-1f(%rip), %rsi
>>> 1:
>>>       movq    $entry_SYSCALL_64, %rdi
>>>       jmp     *%rdi
>>
>> So I'm wondering: r12-r15 are callee-preserved so why can't you
>> scratch into those on entry and leave rsi and rdi pristine so that
>> entry_SYSCALL_64 can get to work directly?
>
> I'm not sure I understand your suggestion.  SYSCALL has always
> preserved all regs except rcx, r11, flags, rax, and, depending on what
> signals are involved, the argument registers.  r12-r15 are definitely
> preserved, and existing userspace relies on that.
>
> Anyway, I'm halfway through actually implementing this, and it looks a
> wee bit different, but not much different.


Here it is:

https://git.kernel.org/pub/scm/linux/kernel/git/luto/linux.git/commit/?h=x86/entry_stack.wip&id=96a6ab74088a86f6b9b6df8284c6466e4fa50d08

Seems to work for me.

Dave, want to see if you can get this working cleanly without mapping
any percpu variables at all?  You'll probably have to move PEBS, etc.
into cpu_entry_area.  For now, it should be safe to just ignore the
LDT.  I'm somewhat tempted to just adjust your code so that the fixmap
ends up being mapped separately for LDT-using tasks rather than
mucking with putting the LDT in the user address range.  The latter
involves a little more mm magic than I really want to deal with if I
can avoid it.
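
For readers following the thread, the layout idea from the quoted cover
letter (a struct that is never instantiated and only describes a per-cpu
slice of the fixmap) can be sketched in user-space C.  This is a minimal
illustration, not the code from the linked commit: the struct name
cpu_entry_area_sketch, the member sizes, the helper names, and the
ascending address order are all assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096u  /* assumption: 4 KiB pages, as on x86 */

/*
 * Hypothetical layout descriptor.  No object of this type is ever
 * created; only sizeof() and offsetof() are used, so the struct acts
 * purely as a map of one CPU's region.  Member sizes are illustrative.
 */
struct cpu_entry_area_sketch {
	/* One page reserved for the GDT. */
	_Alignas(4096) uint8_t gdt[SKETCH_PAGE_SIZE];
	/* TSS region: TSS, IO bitmap, and (for now) the entry stack. */
	_Alignas(4096) uint8_t tss[3 * SKETCH_PAGE_SIZE];
};

/* Number of pages one CPU's region occupies. */
static inline size_t cea_sketch_pages(void)
{
	return sizeof(struct cpu_entry_area_sketch) / SKETCH_PAGE_SIZE;
}

/*
 * Address of a member inside the region belonging to `cpu`.
 * `base` is a hypothetical start address of CPU 0's region; regions
 * are assumed to be laid out back to back at ascending addresses,
 * which is a simplification of how a real fixmap is indexed.
 */
static inline uintptr_t cea_sketch_addr(uintptr_t base, unsigned int cpu,
					size_t member_offset)
{
	return base
	     + (uintptr_t)cpu * sizeof(struct cpu_entry_area_sketch)
	     + member_offset;
}
```

A caller would combine the helper with offsetof(), for example
cea_sketch_addr(base, cpu, offsetof(struct cpu_entry_area_sketch, tss)),
so that adding a member (say, a PEBS buffer) only changes the struct
and every derived address follows automatically.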