linux-kernel - Re: RFC: userspace exception fixups

Message-ID: <20181107153452.GB22972@linux.intel.com>
Date:   Wed, 7 Nov 2018 07:34:52 -0800
From:   Sean Christopherson <sean.j.christopherson@...el.com>
To:     Andy Lutomirski <luto@...nel.org>
Cc:     Dave Hansen <dave.hansen@...el.com>, Jann Horn <jannh@...gle.com>,
        Linus Torvalds <torvalds@...ux-foundation.org>,
        Rich Felker <dalias@...c.org>,
        Dave Hansen <dave.hansen@...ux.intel.com>,
        Jethro Beekman <jethro@...tanix.com>,
        Jarkko Sakkinen <jarkko.sakkinen@...ux.intel.com>,
        Florian Weimer <fweimer@...hat.com>,
        Linux API <linux-api@...r.kernel.org>, X86 ML <x86@...nel.org>,
        linux-arch <linux-arch@...r.kernel.org>,
        LKML <linux-kernel@...r.kernel.org>,
        Peter Zijlstra <peterz@...radead.org>, nhorman@...hat.com,
        npmccallum@...hat.com, "Ayoun, Serge" <serge.ayoun@...el.com>,
        shay.katz-zamir@...el.com, linux-sgx@...r.kernel.org,
        Andy Shevchenko <andriy.shevchenko@...ux.intel.com>,
        Thomas Gleixner <tglx@...utronix.de>,
        Ingo Molnar <mingo@...hat.com>, Borislav Petkov <bp@...en8.de>,
        Carlos O'Donell <carlos@...hat.com>,
        adhemerval.zanella@...aro.org
Subject: Re: RFC: userspace exception fixups

On Tue, Nov 06, 2018 at 05:17:14PM -0800, Andy Lutomirski wrote:
> On Tue, Nov 6, 2018 at 4:02 PM Sean Christopherson
> <sean.j.christopherson@...el.com> wrote:
> >
> > On Tue, Nov 06, 2018 at 03:39:48PM -0800, Andy Lutomirski wrote:
> > > On Tue, Nov 6, 2018 at 3:35 PM Sean Christopherson
> > > <sean.j.christopherson@...el.com> wrote:
> > > >
> > > > On Tue, Nov 06, 2018 at 03:00:56PM -0800, Andy Lutomirski wrote:
> > > > >
> > > > >
> > > > > >> On Nov 6, 2018, at 1:59 PM, Sean Christopherson <sean.j.christopherson@...el.com> wrote:
> > > > > >>
> > > > > >>> On Tue, 2018-11-06 at 13:41 -0800, Andy Lutomirski wrote:
> > > > > >> Sean, how does the current SDK AEX handler decide whether to do
> > > > > >> EENTER, ERESUME, or just bail and consider the enclave dead?  It seems
> > > > > >> like the *CPU* could give a big hint, but I don't see where there is
> > > > > >> any architectural indication of why the AEX code got called or any
> > > > > >> obvious way for the user code to know whether the exit was fixed up by
> > > > > >> the kernel?
> > > > > >
> > > > > > The SDK "unconditionally" does ERESUME at the AEP location, but that's
> > > > > > > a bit misleading because its signal handler may muck with the context's
> > > > > > RIP, e.g. to abort the enclave on a fatal fault.
> > > > > >
> > > > > > On an event/exception from within an enclave, the event is immediately
> > > > > > delivered after loading synthetic state and changing RIP to the AEP.
> > > > > > In other words, jamming CPU state is essentially a bunch of vectoring
> > > > > > ucode preamble, but from software's perspective it's a normal event
> > > > > > that happens to point at the AEP instead of somewhere in the enclave.
> > > > > > And because the signals the SDK cares about are all synchronous, the
> > > > > > SDK can simply hardcode ERESUME at the AEP since all of the fault logic
> > > > > > resides in its signal handler.  IRQs and whatnot simply trampoline back
> > > > > > into the enclave.
> > > > > >
> > > > > > Userspace can do something funky instead of ERESUME, but only *after*
> > > > > > IRET/RSM/VMRESUME has returned to the AEP location, and in Linux's
> > > > > > case, after the trap handler has run.
> > > > > >
> > > > > > Jumping back a bit, how much do we care about preventing userspace
> > > > > > from doing stupid things?
> > > > >
> > > > > My general feeling is that userspace should be allowed to do apparently
> > > > > stupid things. For example, as far as the kernel is concerned, Wine and
> > > > > DOSEMU are just user programs that do stupid things. Linux generally tries
> > > > > to provide a reasonably complete view of architectural behavior. This is
> > > > > > in contrast to, say, Windows, where IIUC doing an unapproved WRFSBASE may
> > > > > cause very odd behavior indeed. So magic fixups that do non-architectural
> > > > > things are not so great.
> > > >
> > > > Sorry if I'm beating a dead horse, but what if we only did fixup on ENCLU
> > > > with a specific (ignored) prefix pattern?  I.e. effectively make the magic
> > > > fixup opt-in, falling back to signals.  Jamming RIP to skip ENCLU isn't
> > > > that far off the architecture, e.g. EENTER stuffs RCX with the next RIP so
> > > > that the enclave can EEXIT to immediately after the EENTER location.
> > > >
> > >
> > > How does that even work, though?  On an AEX, RIP points to the ERESUME
> > > instruction, not the EENTER instruction, so if we skip it we just end
> > > up in lala land.
> >
> > Userspace would obviously need to be aware of the fixup behavior, but
> > it actually works out fairly nicely to have a separate path for ERESUME
> > fixup since a fault on EENTER is generally fatal, whereas a fault on
> > ERESUME might be recoverable.
> >
> 
> Hmm.
> 
> >
> > do_eenter:
> >     mov     tcs, %rbx
> >     lea     async_exit, %rcx
> >     mov     $EENTER, %rax
> >     ENCLU
> 
> Or SOME_SILLY_PREFIX ENCLU?

Yeah, forgot to include that.

> >
> > /*
> >  * EEXIT or EENTER faulted.  In the latter case, %RAX already holds some
> >  * fault indicator, e.g. -EFAULT.
> >  */
> > eexit_or_eenter_fault:
> >     ret
> 
> But userspace wants to know whether it was a fault or not.  So I think
> we either need two landing pads or we need to hijack a flag bit (are
> there any known-zeroed flag bits after EEXIT?) to say whether it was a
> fault.  And, if it was a fault, we should give the vector, the
> sanitized error code, and possibly CR2.

As Jethro mentioned, RAX will always be 4 on a successful EEXIT, so we
can use RAX to indicate a fault.  That's what I was trying to imply with
EFAULT.  Here's the reg stuffing I use for the POC:

	regs->ax = EFAULT;
	regs->di = trapnr;
	regs->si = error_code;
	regs->dx = address;
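
To make the convention concrete from the userspace side, here is a minimal
sketch of how a caller might decode those four registers after ENCLU
returns.  The struct and function names (enclu_result, enclu_fault,
decode_enclu_result) are hypothetical and not part of the POC; the only
things taken from this thread are that RAX is 4 after a successful EEXIT
and that the fixup stuffs trapnr, error_code and address into RDI, RSI and
RDX.

```c
#include <stdbool.h>
#include <stdint.h>

/* ENCLU leaf number that RAX holds after a successful EEXIT. */
enum { ENCLU_LEAF_EEXIT = 4 };

/* Hypothetical snapshot of the registers the fixup would stuff. */
struct enclu_result {
	uint64_t rax;	/* 4 on EEXIT, otherwise a fault indicator */
	uint64_t rdi;	/* trap number (vector) */
	uint64_t rsi;	/* error code */
	uint64_t rdx;	/* faulting address */
};

/* Hypothetical decoded view handed to the rest of the runtime. */
struct enclu_fault {
	bool     faulted;
	uint64_t trapnr;
	uint64_t error_code;
	uint64_t address;
};

/*
 * Treat any RAX other than the EEXIT leaf as "the kernel fixed up a
 * fault", and only then trust RDI/RSI/RDX.
 */
static struct enclu_fault decode_enclu_result(const struct enclu_result *r)
{
	struct enclu_fault f = { 0 };

	if (r->rax == ENCLU_LEAF_EEXIT)
		return f;		/* normal exit, no fault info */

	f.faulted    = true;
	f.trapnr     = r->rdi;
	f.error_code = r->rsi;
	f.address    = r->rdx;
	return f;
}
```

The caller would fill struct enclu_result from the registers right after
the ENCLU landing pad and branch on .faulted, so a single landing pad is
enough.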


Well-known RAX values also mean the kernel fault handlers only need to
look for SOME_SILLY_PREFIX ENCLU if RAX==2 || RAX==3, i.e. the fault
occurred on EENTER or in an enclave (RAX is set to ERESUME's leaf as
part of the asynchronous enclave exit flow).
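
A rough sketch of that filter, written as standalone C rather than
kernel code.  The function name fixup_candidate is hypothetical, and the
prefix byte is passed in as a parameter because no specific prefix has
been chosen yet; the ENCLU encoding (0F 01 D7) and the leaf numbers are
architectural.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* ENCLU leaf numbers that can be live in RAX when a fault is taken. */
enum { LEAF_EENTER = 2, LEAF_ERESUME = 3 };

/* ENCLU is encoded as 0F 01 D7. */
static const uint8_t enclu_opcode[3] = { 0x0f, 0x01, 0xd7 };

/*
 * Return true if a fault at this instruction should be considered for
 * fixup: RAX must hold the EENTER or ERESUME leaf, and the bytes at the
 * faulting RIP must be <prefix> followed by ENCLU.  'prefix' stands in
 * for SOME_SILLY_PREFIX and is an assumption of this sketch.
 */
static bool fixup_candidate(uint64_t rax, const uint8_t *insn, size_t len,
			    uint8_t prefix)
{
	/* Cheap check first: skip the instruction fetch for other RAX. */
	if (rax != LEAF_EENTER && rax != LEAF_ERESUME)
		return false;

	if (len < 1 + sizeof(enclu_opcode))
		return false;

	if (insn[0] != prefix)
		return false;	/* unprefixed ENCLU falls back to signals */

	return memcmp(insn + 1, enclu_opcode, sizeof(enclu_opcode)) == 0;
}
```

Checking RAX before touching the instruction bytes keeps the common
fault path cheap, since the bytes at RIP only need to be read when RAX
already looks like an EENTER or ERESUME leaf.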

> >
> > async_exit:
> >     ENCLU
> 
> Same prefix here, right?
> 
> >
> > fixup_handler:
> >     <do fault stuff>
> 
> This whole thing is a bit odd, but not necessarily a terrible idea.