linux-kernel - Re: [RFC PATCH v2 6/6] x86/entry/pti: don't switch PGD on when pti


Message-ID: <CALCETrVcQg_1opnvOP4ksOAC07K4O_LTSxy2czwtObwR3YL+-w@mail.gmail.com>
Date:   Thu, 11 Jan 2018 09:09:14 -0800
From:   Andy Lutomirski <luto@...nel.org>
To:     Willy Tarreau <w@....eu>
Cc:     Dave Hansen <dave.hansen@...ux.intel.com>,
        Linus Torvalds <torvalds@...ux-foundation.org>,
        Andy Lutomirski <luto@...nel.org>,
        Peter Zijlstra <peterz@...radead.org>,
        LKML <linux-kernel@...r.kernel.org>, X86 ML <x86@...nel.org>,
        Borislav Petkov <bp@...en8.de>,
        Brian Gerst <brgerst@...il.com>,
        Ingo Molnar <mingo@...nel.org>,
        Thomas Gleixner <tglx@...utronix.de>,
        Josh Poimboeuf <jpoimboe@...hat.com>,
        "H. Peter Anvin" <hpa@...or.com>,
        Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
        Kees Cook <keescook@...omium.org>
Subject: Re: [RFC PATCH v2 6/6] x86/entry/pti: don't switch PGD on when
 pti_disable is set

On Thu, Jan 11, 2018 at 7:44 AM, Willy Tarreau <w@....eu> wrote:
> Hi Dave,
>
> On Thu, Jan 11, 2018 at 07:29:30AM -0800, Dave Hansen wrote:
>> I don't think we need a "NOW" and "NEXT" mode, at least initially.  The
>> "NEXT" semantics are going to be tricky and I think "NOW" is good enough
>
> In fact I thought the NEXT one would bring us a nice benefit which is that
> we start the new process knowing the flag's value so we can decide whether
> or not to apply _PAGE_NX on the pgd from the start, and never touch it
> anymore.
>
>> Whatever we do, we'll need this PTI-disable flag to be able to cross
>> execve() so that a wrapper a la nice(1) works.
>
> Absolutely!
>
>> Initially, I think the
>> default should be that it survives fork().  There are just too many
>> things out there that "start up" by doing a shell script that calls a
>> python script, that calls a...
>
> Not only that, but also plain daemons, which is what most services are!
>
>> Without the wrapper support, we're _basically_ stuck using this only in
>> newly-compiled binaries.  That's going to make it much less likely to
>> get used.
>
> I know, that's why I kept considering that option despite not really
> needing it for my own use case.
>
>> The inheritance also gives an app a way to re-enable protections for
>> children, just from a _second_ wrapper.  That's nice because it means we
>> don't initially need a "NEXT" ABI.
>>
>> So, I'd do this:
>> 1. Do the arch_prctl() (but ask the ARM guys what they want too)
>> 2. Enabled for an entire process (not thread)
>> 3. Inherited across fork/exec
>> 4. Cleared on setuid() and friends
>
> This one causes me a problem : some daemons already take care of dropping
> privileges after the initial fork() for the sake of security. Haproxy
> typically does this at boot :
>
>    - parse config
>    - chroot to /var/empty
>    - setuid(dedicated_uid)
>    - fork()
>
> This ensures the process is properly isolated and hard enough to break out
> of. So I'd really like this setuid() not to annihilate all we've done.
> We probably want to drop it on suid binaries however, though I'm
> having doubts about the benefits, because if the binary already allows
> an intruder to inject its own meltdown code, you're quite screwed anyway.
>
>> 5. I'm sure the security folks have/want a way to force it on forever
>
> Sure! That's what I implemented using the sysctl.
>

All of these proposals have serious issues.  For example, suppose I
have a setuid program called nopti that works like this:

$ nopti some_program

nopti verifies that some_program is trustworthy and runs it (as the
real uid of nopti's user) with PTI off.  Now we have all the usual
problems: you can easily break out using ptrace(), for example.  And
LD_PRELOAD gets this wrong.  Etc.

So I think that no-pti mode is a privilege as opposed to a mode per
se.  If you can turn off PTI, then you have the ability to read all of
kernel memory.  So maybe we should treat it as such.  Add a capability
CAP_DISABLE_PTI.  If you have that capability (globally), then you can
use the arch_prctl() or regular prctl() or whatever to turn PTI off.
If you lose the cap, you lose no-pti mode as well.  If an LSM wants to
block it, it can use existing mechanisms.
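
To make the capability rule concrete, here is a minimal userspace model of
it.  This is only a sketch: CAP_DISABLE_PTI is a proposal and does not
exist, and struct task_model, model_set_nopti() and model_drop_cap() are
hypothetical names invented for illustration, not kernel code.  It shows
the two rules stated above: turning PTI off requires the capability, and
losing the capability ends no-pti mode.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a task; this is not a kernel structure. */
struct task_model {
	bool cap_disable_pti;	/* stands in for the proposed CAP_DISABLE_PTI */
	bool nopti;		/* true while PTI is off for this task */
};

/*
 * Model of the prctl()/arch_prctl() request.  Turning PTI off needs the
 * capability; turning it back on is always allowed.
 */
static int model_set_nopti(struct task_model *t, bool want_nopti)
{
	assert(t != NULL);
	if (want_nopti && !t->cap_disable_pti)
		return -EPERM;
	t->nopti = want_nopti;
	return 0;
}

/* Losing the capability also ends no-pti mode. */
static void model_drop_cap(struct task_model *t)
{
	assert(t != NULL);
	t->cap_disable_pti = false;
	t->nopti = false;
}
```

In this model a task without the capability gets -EPERM, and a task that
had PTI off is back to PTI on as soon as model_drop_cap() runs.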

As for per-mm vs per-thread, let's make it only switchable in
single-threaded processes for now and inherited when threads are
created.  We can change that if and when demand for the ability to
change it shows up.

(Another reason for per-thread instead of per-mm: as a per-mm thing,
you can't set it up for your descendants using vfork(); prctl();
exec(), and the latter is how your average language runtime that
spawns subprocesses would want to do it.)
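
The vfork(); prctl(); exec() sequence can be sketched as follows.  No
PTI prctl exists, so this sketch substitutes PR_SET_NO_NEW_PRIVS, an
existing per-thread setting that is inherited across execve(), purely as
a stand-in.  The function name spawn_with_thread_setting() is made up
for illustration.  Note that POSIX only permits _exit() and the exec
family in a vfork() child; calling prctl() there relies on Linux
behaviour.

```c
#define _DEFAULT_SOURCE
#include <sys/prctl.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Spawn argv[0] with a per-thread setting applied in the child only.
 * Returns the child's exit status, or -1 on failure or abnormal exit.
 */
static int spawn_with_thread_setting(char *const argv[])
{
	int status;
	pid_t pid = vfork();

	if (pid < 0)
		return -1;

	if (pid == 0) {
		/* Child: still shares the parent's mm until exec. */
		if (prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0) != 0)
			_exit(126);
		execvp(argv[0], argv);
		_exit(127);	/* only reached if exec failed */
	}

	if (waitpid(pid, &status, 0) < 0)
		return -1;
	return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Because the stand-in setting is per-thread, the prctl() call in the
child leaves the parent untouched even though both share one mm at that
point.  A per-mm flag set at the same spot would apply to the parent as
well, which is the problem described above.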