linux-kernel - KAISER memory layout (Re: [PATCH 06/23] x86, kaiser: introduce user-mapped percpu areas)

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [thread-next>] [day] [month] [year] [list]

Message-ID: <CALCETrXLJfmTg1MsQHKCL=WL-he_5wrOqeX2OatQCCqVE003VQ@mail.gmail.com>
Date:   Thu, 2 Nov 2017 02:41:47 -0700
From:   Andy Lutomirski <luto@...nel.org>
To:     Dave Hansen <dave.hansen@...ux.intel.com>
Cc:     "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
        "linux-mm@...ck.org" <linux-mm@...ck.org>,
        moritz.lipp@...k.tugraz.at,
        Daniel Gruss <daniel.gruss@...k.tugraz.at>,
        michael.schwarz@...k.tugraz.at,
        Andrew Lutomirski <luto@...nel.org>,
        Linus Torvalds <torvalds@...ux-foundation.org>,
        Kees Cook <keescook@...gle.com>,
        Hugh Dickins <hughd@...gle.com>, X86 ML <x86@...nel.org>,
        Borislav Petkov <bp@...en8.de>,
        Josh Poimboeuf <jpoimboe@...hat.com>
Subject: KAISER memory layout (Re: [PATCH 06/23] x86, kaiser: introduce
 user-mapped percpu areas)

On Tue, Oct 31, 2017 at 3:31 PM, Dave Hansen
<dave.hansen@...ux.intel.com> wrote:
>
> These patches are based on work from a team at Graz University of
> Technology posted here: https://github.com/IAIK/KAISER
>

I think we're far enough along here that it may be time to nail down
the memory layout for real.  I propose the following:

The user tables will contain the following:

 - The GDT array.
 - The IDT.
 - The vsyscall page.  We can make this be _PAGE_USER.
 - The TSS.
 - The per-cpu entry stack.  Let's make it one page with guard pages
on either side.  This can replace rsp_scratch.
 - cpu_current_top_of_stack.  This could be in the same page as the TSS.
 - The entry text.
 - The percpu IST (aka "EXCEPTION") stacks.

That's it.

We can either try to move all of the above into the fixmap or we can
have the user tables be sparse a la Dave's current approach.  If we do
it the latter way, I think we'll want to add a mechanism to have holes
in the percpu space to give the entry stack a guard page.

I would *much* prefer moving everything into the fixmap, but that's a
wee bit awkward because we can't address per-cpu data in the fixmap
using %gs, which makes the SYSCALL code awkward.  But we could alias
the SYSCALL entry text itself per-cpu into the fixmap, which lets us
use %rip-relative addressing, which is quite nice.

So I guess my preference is to actually try the fixmap approach.  We
give the TSS the same aliasing treatment we gave the GDT, and I can
try to make the entry trampoline work through the fixmap and thus not
need %gs-based addressing until CR3 gets updated.  (This actually
saves several cycles of latency.)

What do you all think?

I'll deal with the LDT separately.  It will either live in the
fixmap-like region or it will live at the top of the user address
space.

[1] https://git.kernel.org/pub/scm/linux/kernel/git/luto/linux.git/log/?h=x86/entry_consolidation