linux-kernel - Re: RFC: userspace exception fixups


Message-ID: <1541541565.8854.13.camel@intel.com>
Date:   Tue, 06 Nov 2018 13:59:25 -0800
From:   Sean Christopherson <sean.j.christopherson@...el.com>
To:     Andy Lutomirski <luto@...nel.org>,
        Dave Hansen <dave.hansen@...el.com>
Cc:     Jann Horn <jannh@...gle.com>,
        Linus Torvalds <torvalds@...ux-foundation.org>,
        Rich Felker <dalias@...c.org>,
        Dave Hansen <dave.hansen@...ux.intel.com>,
        Jethro Beekman <jethro@...tanix.com>,
        Jarkko Sakkinen <jarkko.sakkinen@...ux.intel.com>,
        Florian Weimer <fweimer@...hat.com>,
        Linux API <linux-api@...r.kernel.org>, X86 ML <x86@...nel.org>,
        linux-arch <linux-arch@...r.kernel.org>,
        LKML <linux-kernel@...r.kernel.org>,
        Peter Zijlstra <peterz@...radead.org>, nhorman@...hat.com,
        npmccallum@...hat.com, "Ayoun, Serge" <serge.ayoun@...el.com>,
        shay.katz-zamir@...el.com, linux-sgx@...r.kernel.org,
        Andy Shevchenko <andriy.shevchenko@...ux.intel.com>,
        Thomas Gleixner <tglx@...utronix.de>,
        Ingo Molnar <mingo@...hat.com>, Borislav Petkov <bp@...en8.de>,
        Carlos O'Donell <carlos@...hat.com>,
        adhemerval.zanella@...aro.org
Subject: Re: RFC: userspace exception fixups

On Tue, 2018-11-06 at 13:41 -0800, Andy Lutomirski wrote:
> On Tue, Nov 6, 2018 at 1:07 PM Andy Lutomirski <luto@...capital.net> wrote:
> > 
> > > 
> > > On Nov 6, 2018, at 1:00 PM, Dave Hansen <dave.hansen@...el.com> wrote:
> > > 
> > > > 
> > > > On 11/6/18 12:12 PM, Andy Lutomirski wrote:
> > > > True, but what if we have a nasty enclave that writes to memory just
> > > > below SP *before* decrementing SP?
> > > Yeah, that would be unfortunate.  If an enclave did this (roughly):
> > > 
> > >    1. EENTER
> > >    2. Hardware sets eenter_hwframe->sp = %sp
> > >    3. Enclave runs... wants to do out-call
> > >    4. Enclave sets up parameters:
> > >        memcpy(&eenter_hwframe->sp[-offset], arg1, size);
> > >        ...
> > >    5. Enclave sets eenter_hwframe->sp -= offset
> > > 
> > > If we got a signal between 4 and 5, we'd clobber the copy of 'arg1' that
> > > was on the stack.  The enclave could easily fix this by moving ->sp first.
> > > 
> > > But, this is one of those "fun" parts of the ABI that I think we need to
> > > talk about.  If we do this, we also basically require that the code
> > > which handles asynchronous exits must *not* write to the stack.  That's
> > > not hard because it's typically just a single ERESUME instruction, but
> > > it *is* a requirement.
> > > 
> > > I was assuming that the async exit stuff was completely hidden by the
> > > API. The AEP code would decide whether the exit got fixed up by the
> > > kernel (which may or may not be easy to tell; can the code even tell
> > > without kernel help whether it was, say, an IRQ vs #UD?) and then
> > > either do ERESUME or cause sgx_enter_enclave() to return with an
> > > appropriate return value.
> > 
> > 
> Sean, how does the current SDK AEX handler decide whether to do
> EENTER, ERESUME, or just bail and consider the enclave dead?  It seems
> like the *CPU* could give a big hint, but I don't see where there is
> any architectural indication of why the AEX code got called or any
> obvious way for the user code to know whether the exit was fixed up by
> the kernel?

The SDK "unconditionally" does ERESUME at the AEP location, but that's
a bit misleading because its signal handler may muck with the context's
RIP, e.g. to abort the enclave on a fatal fault.

On an event/exception from within an enclave, the event is immediately
delivered after loading synthetic state and changing RIP to the AEP.
In other words, jamming CPU state is essentially a bunch of vectoring
ucode preamble, but from software's perspective it's a normal event
that happens to point at the AEP instead of somewhere in the enclave.
And because the signals the SDK cares about are all synchronous, the
SDK can simply hardcode ERESUME at the AEP since all of the fault logic
resides in its signal handler.  IRQs and whatnot simply trampoline back
into the enclave.
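
To make the split concrete, here is a rough model of that signal handler
decision.  This is only a sketch, not SDK code: struct saved_ctx,
AEP_ADDR, ABORT_ADDR and handle_enclave_signal() are hypothetical names,
and the struct stands in for the saved user context a real handler would
get from the kernel.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the saved user context seen by the handler. */
struct saved_ctx {
    uint64_t rip;
};

/* Hypothetical addresses, for illustration only. */
#define AEP_ADDR   0x1000u  /* location of the hardcoded ERESUME */
#define ABORT_ADDR 0x2000u  /* host code that tears the enclave down */

/* Returns true if the handler redirected execution away from the AEP. */
static bool handle_enclave_signal(struct saved_ctx *ctx, bool fatal)
{
    if (ctx->rip != AEP_ADDR)
        return false;       /* not an enclave exit, not ours to handle */
    if (!fatal)
        return false;       /* leave RIP alone: return lands on ERESUME */
    ctx->rip = ABORT_ADDR;  /* skip ERESUME and abort the enclave */
    return true;
}
```

The AEP itself stays a bare ERESUME in this model; all of the policy
lives in the handler, which is the arrangement described above.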

Userspace can do something funky instead of ERESUME, but only *after*
IRET/RSM/VMRESUME has returned to the AEP location, and in Linux's
case, after the trap handler has run.

Jumping back a bit, how much do we care about preventing userspace
from doing stupid things?  I did a quick POC on the idea of hardcoding
fixup for the ENCLU opcode, and the basic idea checks out.  The code
is fairly minimal and doesn't impact the core functionality of the SDK.
They'd need to redo their trap handling to move it from the signal
handler to inline, but their stack shenanigans won't be any more broken
than they already are.
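
For reference, the shape of the opcode check is roughly the following.
This is a userspace model, not the POC itself: struct trap_regs and
fixup_enclu() are hypothetical names, and the fixup policy shown (skip
the ENCLU and report the trap number in RAX) is an assumption made for
illustration.  The only hard fact used is the ENCLU encoding, 0F 01 D7.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* ENCLU is encoded as 0F 01 D7; the leaf (EENTER, ERESUME, ...) is in EAX. */
static const uint8_t ENCLU_OPCODE[3] = { 0x0f, 0x01, 0xd7 };

/* Hypothetical model of the register state at the time of the trap. */
struct trap_regs {
    uint64_t rip;
    uint64_t rax;
};

/* True if the bytes at the trapping RIP are an ENCLU instruction. */
static bool is_enclu(const uint8_t *insn, size_t len)
{
    return len >= sizeof(ENCLU_OPCODE) &&
           memcmp(insn, ENCLU_OPCODE, sizeof(ENCLU_OPCODE)) == 0;
}

/*
 * Assumed fixup policy (not taken from the POC): skip the ENCLU and
 * report the trap number in RAX so inline code can see what happened.
 */
static bool fixup_enclu(struct trap_regs *regs, const uint8_t *insn,
                        size_t len, unsigned int trapnr)
{
    if (!is_enclu(insn, len))
        return false;   /* fall back to normal signal delivery */
    regs->rip += sizeof(ENCLU_OPCODE);
    regs->rax = trapnr;
    return true;
}
```

With something like this, the code following the ENCLU can test RAX
inline instead of relying on a signal handler.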