Message-ID: <CALCETrXNvbhMnitCiydoSweEr92RaA2fKrFQjCdeg+--u-TeuA@mail.gmail.com>
Date: Wed, 5 Dec 2018 15:40:48 -0800
From: Andy Lutomirski <luto@...nel.org>
To: "Christopherson, Sean J" <sean.j.christopherson@...el.com>
Cc: Andrew Lutomirski <luto@...nel.org>,
Thomas Gleixner <tglx@...utronix.de>,
Ingo Molnar <mingo@...hat.com>, Borislav Petkov <bp@...en8.de>,
X86 ML <x86@...nel.org>,
Dave Hansen <dave.hansen@...ux.intel.com>,
Peter Zijlstra <peterz@...radead.org>,
"H. Peter Anvin" <hpa@...or.com>,
LKML <linux-kernel@...r.kernel.org>,
Jarkko Sakkinen <jarkko.sakkinen@...ux.intel.com>,
Josh Triplett <josh@...htriplett.org>
Subject: Re: [RFC PATCH 4/4] x86/vdso: Add __vdso_sgx_eenter() to wrap SGX
enclave transitions
On Wed, Dec 5, 2018 at 3:20 PM Sean Christopherson
<sean.j.christopherson@...el.com> wrote:
>
> Intel Software Guard Extensions (SGX) introduces a new CPL3-only
> enclave mode that runs as a sort of black box shared object that is
> hosted by an untrusted normal CPL3 process.
>
> Enclave transitions have semantics that are a lovely blend of SYSCALL,
> SYSRET and VM-Exit. In a non-faulting scenario, entering and exiting
> an enclave can only be done through SGX-specific instructions, EENTER
> and EEXIT respectively. EENTER+EEXIT is analogous to SYSCALL+SYSRET,
> e.g. EENTER/SYSCALL load RCX with the next RIP and EEXIT/SYSRET load
> RIP from R{B,C}X.
>
> But in a faulting/interrupting scenario, enclave transitions act more
> like VM-Exit and VMRESUME. Maintaining the black box nature of the
> enclave means that hardware must automatically switch CPU context when
> an Asynchronous Exiting Event (AEE) occurs, an AEE being any interrupt
> or exception (exceptions are AEEs because asynchronous in this context
> is relative to the enclave and not CPU execution, e.g. the enclave
> doesn't get an opportunity to save/fuzz CPU state).
>
> Like VM-Exits, all AEEs jump to a common location, referred to as the
> Asynchronous Exiting Point (AEP). The AEP is specified at enclave entry
> via a register passed to EENTER/ERESUME, similar to how the hypervisor
> specifies the VM-Exit point (via VMCS.HOST_RIP at VMLAUNCH/VMRESUME).
> Resuming the enclave/VM after the exiting event is handled is done via
> ERESUME/VMRESUME respectively. In SGX, for AEEs that are handled by the
> kernel, e.g. INTR, NMI and most page faults, IRET will journey back to
> the AEP, which then ERESUMEs the enclave.
>
> Enclaves also behave a bit like VMs in the sense that they can generate
> exceptions as part of their normal operation that for all intents and
> purposes need to be handled in the enclave/VM. However, unlike VMX, SGX
> doesn't allow the host to modify its guest's, a.k.a. enclave's, state,
> as doing so would circumvent the enclave's security. So to handle an
> exception, the enclave must first be re-entered through the normal
> EENTER flow (SYSCALL/SYSRET behavior), and then resumed via ERESUME
> (VMRESUME behavior) after the source of the exception is resolved.
>
> All of the above is just the tip of the iceberg when it comes to running
> an enclave. But, SGX was designed in such a way that the host process
> can utilize a library to build, launch and run an enclave. This is
> roughly analogous to how e.g. libc implementations are used by most
> applications so that the application can focus on its business logic.
>
> The big gotcha is that because enclaves can generate *and* handle
> exceptions, any SGX library must be prepared to handle nearly any
> exception at any time (well, any time a thread is executing in an
> enclave). In Linux, this means the SGX library must register a
> signal handler in order to intercept relevant exceptions and forward
> them to the enclave (or in some cases, take action on behalf of the
> enclave). Unfortunately, Linux's signal mechanism doesn't mesh well
> with libraries, e.g. signal handlers are process wide, are difficult
> to chain, etc... This becomes particularly nasty when using multiple
> levels of libraries that register signal handlers, e.g. running an
> enclave via cgo inside of the Go runtime.
>
> In comes vDSO to save the day. Now that vDSO can fixup exceptions,
> add a function to wrap enclave transitions and intercept any exceptions
> that occur in the enclave or on EENTER/ERESUME. The actual code is
> blissfully short (especially compared to this changelog).
>
> In addition to the obvious trapnr, error_code and address, propagate
> the leaf number, i.e. RAX, back to userspace so that the caller can know
> whether the fault occurred in the enclave or if it occurred on EENTER.
> A fault on EENTER generally means the enclave has died and needs to be
> restarted.
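For illustration only, here is a minimal userspace-side sketch of how a caller might use the returned leaf. It assumes the struct layout from this RFC, and it assumes that an asynchronous exit leaves the ERESUME leaf number in RAX, so a fault taken inside the enclave reports ERESUME while a fault on the EENTER instruction itself still reports EENTER. The names classify_fault and enum fault_origin are hypothetical and are not part of the patch.

```c
#include <assert.h>
#include <stdint.h>

/* ENCLU leaf numbers, as listed in the patch. */
enum sgx_enclu_leaf {
	SGX_EENTER  = 2,
	SGX_ERESUME = 3,
	SGX_EEXIT   = 4,
};

/* Userspace mirror of the RFC's struct, using fixed-width stdint types. */
struct sgx_eenter_fault_info {
	uint32_t leaf;
	uint16_t trapnr;
	uint16_t error_code;
	uint64_t address;
};

/* The fields pack into 16 bytes with no padding on x86-64. */
static_assert(sizeof(struct sgx_eenter_fault_info) == 16,
	      "unexpected fault_info layout");

/* Hypothetical helper type, not part of the patch. */
enum fault_origin {
	FAULT_ON_EENTER,	/* the EENTER instruction itself faulted */
	FAULT_IN_ENCLAVE,	/* fault taken while the enclave was running */
	FAULT_UNKNOWN,		/* leaf value this sketch does not expect */
};

/* Hypothetical helper: map the reported leaf to where the fault happened. */
enum fault_origin classify_fault(const struct sgx_eenter_fault_info *info)
{
	switch (info->leaf) {
	case SGX_EENTER:
		return FAULT_ON_EENTER;
	case SGX_ERESUME:
		return FAULT_IN_ENCLAVE;
	default:
		return FAULT_UNKNOWN;
	}
}
```

Under these assumptions, FAULT_ON_EENTER would be the "enclave has died, restart it" case described above, and FAULT_IN_ENCLAVE the case a runtime might want to forward to the enclave.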
>
> Suggested-by: Andy Lutomirski <luto@...capital.net>
> Cc: Andy Lutomirski <luto@...capital.net>
> Cc: Jarkko Sakkinen <jarkko.sakkinen@...ux.intel.com>
> Cc: Dave Hansen <dave.hansen@...ux.intel.com>
> Cc: Josh Triplett <josh@...htriplett.org>
> Signed-off-by: Sean Christopherson <sean.j.christopherson@...el.com>
> ---
> arch/x86/entry/vdso/Makefile | 1 +
> arch/x86/entry/vdso/vdso.lds.S | 1 +
> arch/x86/entry/vdso/vsgx_eenter.c | 108 ++++++++++++++++++++++++++++++
> 3 files changed, 110 insertions(+)
> create mode 100644 arch/x86/entry/vdso/vsgx_eenter.c
>
> diff --git a/arch/x86/entry/vdso/Makefile b/arch/x86/entry/vdso/Makefile
> index eb543ee1bcec..ba46673076bd 100644
> --- a/arch/x86/entry/vdso/Makefile
> +++ b/arch/x86/entry/vdso/Makefile
> @@ -18,6 +18,7 @@ VDSO32-$(CONFIG_IA32_EMULATION) := y
>
> # files to link into the vdso
> vobjs-y := vdso-note.o vclock_gettime.o vgetcpu.o
> +vobjs-$(VDSO64-y) += vsgx_eenter.o
>
> # files to link into kernel
> obj-y += vma.o extable.o
> diff --git a/arch/x86/entry/vdso/vdso.lds.S b/arch/x86/entry/vdso/vdso.lds.S
> index d3a2dce4cfa9..e422c4454f34 100644
> --- a/arch/x86/entry/vdso/vdso.lds.S
> +++ b/arch/x86/entry/vdso/vdso.lds.S
> @@ -25,6 +25,7 @@ VERSION {
> __vdso_getcpu;
> time;
> __vdso_time;
> + __vdso_sgx_eenter;
> local: *;
> };
> }
> diff --git a/arch/x86/entry/vdso/vsgx_eenter.c b/arch/x86/entry/vdso/vsgx_eenter.c
> new file mode 100644
> index 000000000000..3df4a95a34cc
> --- /dev/null
> +++ b/arch/x86/entry/vdso/vsgx_eenter.c
> @@ -0,0 +1,108 @@
> +// SPDX-License-Identifier: (GPL-2.0 OR BSD-3-Clause)
> +// Copyright(c) 2018 Intel Corporation.
> +
> +#include <uapi/linux/errno.h>
> +#include <uapi/linux/types.h>
> +
> +#include "extable.h"
> +
> +/*
> + * This struct will be defined elsewhere in the actual implementation,
> + * e.g. arch/x86/include/uapi/asm/sgx.h.
> + */
> +struct sgx_eenter_fault_info {
> + __u32 leaf;
> + __u16 trapnr;
> + __u16 error_code;
> + __u64 address;
> +};
> +
> +/*
> + * ENCLU (ENCLave User) is an umbrella instruction for a variety of CPL3
> + * SGX functions. The ENCLU function that is executed is specified in EAX,
> + * with each function potentially having more leaf-specific operands beyond
> + * EAX. In the vDSO we're only concerned with the leafs that are used to
> + * transition to/from the enclave.
> + */
> +enum sgx_enclu_leaves {
> + SGX_EENTER = 2,
> + SGX_ERESUME = 3,
> + SGX_EEXIT = 4,
> +};
> +
> +notrace long __vdso_sgx_eenter(void *tcs, void *priv,
> + struct sgx_eenter_fault_info *fault_info)
> +{
> + u32 trapnr, error_code;
> + long leaf;
> + u64 addr;
> +
> + /*
> + * %eax = EENTER
> + * %rbx = tcs
> + * %rcx = do_eresume
> + * %rdi = priv
> + * do_eenter:
> + * enclu
> + * jmp out
> + *
> + * do_eresume:
> + * enclu
> + * ud2
Is the only reason for do_eresume to be different from do_eenter so
that you can do the ud2?
> + *
> + * out:
> + * <return to C code>
> + *
> + * fault_fixup:
> + * <extable loads RDI, RSI and RDX with fault info>
> + * jmp out
> + */
This has the IMO excellent property that it's extremely awkward to use
it for a model where the enclave is reentrant. I think it's excellent
because reentrancy on the same enclave thread is just asking for
severe bugs. Of course, I fully expect the SDK to emulate reentrancy,
but then it's 100% their problem :) On the flip side, it means that
you can't really recover from a reported fault, even if you want to,
because there's no way to ask for ERESUME. So maybe the API should
allow that after all.
I think it might be polite to at least give some out regs, maybe RSI and RDI?
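To make the two suggestions concrete, here is a rough sketch of what such an interface could look like. This is not the patch's API: the leaf parameter, struct sgx_out_regs, the enclu_fn callback and the -EFAULT return value are all assumptions made for illustration. The ENCLU instruction and the extable fixup are replaced by a plain C callback so that the control flow can be compiled and exercised without SGX hardware.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

enum sgx_enclu_leaf {
	SGX_EENTER  = 2,
	SGX_ERESUME = 3,
	SGX_EEXIT   = 4,
};

/* Same fields as the RFC's struct, with stdint types. */
struct sgx_eenter_fault_info {
	uint32_t leaf;
	uint16_t trapnr;
	uint16_t error_code;
	uint64_t address;
};

/* Hypothetical: values the enclave left in RDI/RSI at EEXIT. */
struct sgx_out_regs {
	uint64_t rdi;
	uint64_t rsi;
};

/*
 * Hypothetical stand-in for ENCLU plus the extable fixup.
 * Returns 0 on a clean EEXIT, nonzero if a fault was caught.
 */
typedef int (*enclu_fn)(uint32_t leaf, void *tcs,
			struct sgx_out_regs *out,
			struct sgx_eenter_fault_info *fault);

/* Sketch of an entry point that accepts either EENTER or ERESUME. */
long sgx_enter_sketch(uint32_t leaf, void *tcs, struct sgx_out_regs *out,
		      struct sgx_eenter_fault_info *fault, enclu_fn enclu)
{
	/* Only the two entry leaves make sense here. */
	if (leaf != SGX_EENTER && leaf != SGX_ERESUME)
		return -EINVAL;
	if (!out || !fault || !enclu)
		return -EINVAL;

	if (enclu(leaf, tcs, out, fault))
		return -EFAULT;	/* assumed error code; *fault is filled in */

	return 0;		/* clean exit; *out holds RDI/RSI */
}

/* Test double: behaves like an enclave that exits cleanly. */
int fake_enclu_ok(uint32_t leaf, void *tcs, struct sgx_out_regs *out,
		  struct sgx_eenter_fault_info *fault)
{
	(void)leaf; (void)tcs; (void)fault;
	assert(out != NULL);
	out->rdi = 1;
	out->rsi = 2;
	return 0;
}

/* Test double: models a page fault on the ENCLU instruction itself. */
int fake_enclu_fault(uint32_t leaf, void *tcs, struct sgx_out_regs *out,
		     struct sgx_eenter_fault_info *fault)
{
	(void)tcs; (void)out;
	assert(fault != NULL);
	fault->leaf = leaf;
	fault->trapnr = 14;	/* #PF */
	fault->error_code = 0;
	fault->address = 0;
	return 1;
}
```

With something along these lines, a caller that gets a fault back could fix up the cause and call again with SGX_ERESUME, which an EENTER-only signature has no way to express.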