Message-ID: <1538404143.30715.27.camel@intel.com>
Date: Mon, 01 Oct 2018 07:29:03 -0700
From: Sean Christopherson <sean.j.christopherson@...el.com>
To: Andy Lutomirski <luto@...capital.net>,
Dave Hansen <dave.hansen@...el.com>
Cc: Andrew Lutomirski <luto@...nel.org>,
Jarkko Sakkinen <jarkko.sakkinen@...ux.intel.com>,
X86 ML <x86@...nel.org>,
Platform Driver <platform-driver-x86@...r.kernel.org>,
nhorman@...hat.com, npmccallum@...hat.com,
"Ayoun, Serge" <serge.ayoun@...el.com>, shay.katz-zamir@...el.com,
linux-sgx@...r.kernel.org,
Andy Shevchenko <andriy.shevchenko@...ux.intel.com>,
Dave Hansen <dave.hansen@...ux.intel.com>,
Peter Zijlstra <peterz@...radead.org>,
Thomas Gleixner <tglx@...utronix.de>,
Ingo Molnar <mingo@...hat.com>, Borislav Petkov <bp@...en8.de>,
"H. Peter Anvin" <hpa@...or.com>,
LKML <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH v14 09/19] x86/mm: x86/sgx: Signal SEGV_SGXERR for #PFs w/ PF_SGX
On Wed, 2018-09-26 at 14:15 -0700, Andy Lutomirski wrote:
> On Wed, Sep 26, 2018 at 1:55 PM Dave Hansen <dave.hansen@...el.com> wrote:
> >
> >
> > On 09/26/2018 01:44 PM, Sean Christopherson wrote:
> > >
> > > On Wed, Sep 26, 2018 at 01:16:59PM -0700, Dave Hansen wrote:
> > > >
> > > > > We also need to clarify how this can happen. Is it through something
> > > > > that an app does, or is it solely when the hardware does something under
> > > > > the covers, like suspend/resume?
> > > Are you looking for something in the changelog, the comment, or just
> > > a response? If it's the latter...
> > Comments, please.
> >
> > >
> > > On bare metal with a bug-free kernel, the only scenario I'm aware of
> > > where we'll encounter these faults is when hardware pulls the rug out
> > > from under us. In a virtualized environment all bets are off because
> > > the architecture allows VMMs to silently "destroy" the EPC at will,
> > > e.g. KVM, and I believe Hyper-V, will take advantage of this behavior
> > > to support live migration. Post migration, the destination system
> > > > will generate PF_SGX because the EPC{M} can't be migrated between
> > > > systems, i.e. the destination EPCM sees all EPC pages as invalid.
> > OK, cool.
> >
> > That's good background fodder for the changelog.
> >
> > But, for the comment, I'm happy with something like this:
> >
> > /*
> > * The fault resulted from violation of SGX-specific access-
> > * controls. This is expected to be the result of some lower
> > * layer action (CPU suspend/resume, VM migration) and is
> > * not related to anything the OS did. Treat it as an access
> > * error to ensure it is passed up to the app via a signal where
> > * it can be handled.
> > */
> >
> > I really don't think we need to delve too deeply into the relationship
> > between EPCM and PTEs or anything. Let's just say, "it's not the
> > kernel's fault, it's not the app's fault, so throw up our hands".
> There is a non-nitpicky consideration here. Logically, user code is
> going to do this (totally made-up pseudocode):
>
> enclave_t enclave = load_and_init_enclave(...);
> int ret = sgx_run(enclave, some pointers to non-enclave-memory buffers, ...);
>
> and, with the code in this patch, a correct implementation of
> sgx_run() requires installing a signal handler. This is nasty, since
> signal handlers, especially for something like SIGSEGV or SIGBUS, are
> not fantastic to say the least in libraries.
>
> Could we perhaps have a little vDSO entry (or syscall, I suppose) that
> runs an enclave and returns an error code, and rig up the #PF handler
> to check if the error happened in the vDSO entry and fix it up rather
> than sending a signal?
If we want to avoid having to install a signal handler, then I'm pretty
sure we'd need to fix up all #GPs and "bad access" #PFs that occur on
EENTER or in the enclave, not just PF_SGX faults. SGX1 hardware takes
a #GP instead of a #PF on EPCM faults, and SGX2 hardware allows enclaves
to allocate/free/adjust EPC pages at runtime, e.g. an enclave runtime
might want to intercept #PFs from within the enclave so that the enclave
can dynamically grow its stack.
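For illustration only, here is a minimal userspace sketch of what a library would have to do today to turn such faults into an error code, i.e. the signal-handler approach discussed above. Every name in it (run_with_fault_fixup, touch_entry, faulting_entry) is hypothetical, and nothing here is an SGX or kernel API. The "enclave entry" is just a callback, and raise(SIGSEGV) stands in for a #GP or #PF taken on EENTER or inside the enclave.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <setjmp.h>
#include <signal.h>
#include <string.h>

/*
 * Process-wide state: this is exactly why signal handlers are awkward in
 * libraries. The sketch is single-threaded and not reentrant.
 */
static sigjmp_buf fixup_env;
static volatile sig_atomic_t fixup_armed;

static void fault_handler(int sig, siginfo_t *info, void *ctx)
{
    (void)info;
    (void)ctx;
    if (fixup_armed)
        siglongjmp(fixup_env, sig);   /* unwind back into the wrapper */
    /* Not our fault window: fall back to the default action. */
    signal(sig, SIG_DFL);
    raise(sig);
}

/*
 * Hypothetical wrapper: run fn(arg) and report SIGSEGV/SIGBUS as -EFAULT
 * instead of letting the signal kill the process.
 */
int run_with_fault_fixup(void (*fn)(void *), void *arg)
{
    struct sigaction sa, old_segv, old_bus;
    int ret = 0;

    assert(fn != NULL);

    memset(&sa, 0, sizeof(sa));
    sa.sa_sigaction = fault_handler;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);

    if (sigaction(SIGSEGV, &sa, &old_segv) != 0)
        return -errno;
    if (sigaction(SIGBUS, &sa, &old_bus) != 0) {
        int err = errno;
        sigaction(SIGSEGV, &old_segv, NULL);
        return -err;
    }

    /* savemask = 1 so the signal mask is restored after siglongjmp(). */
    if (sigsetjmp(fixup_env, 1) == 0) {
        fixup_armed = 1;
        fn(arg);                      /* stand-in for entering the enclave */
    } else {
        ret = -EFAULT;                /* we got here via the handler */
    }
    fixup_armed = 0;

    /* Put back whatever handlers the application had installed. */
    sigaction(SIGSEGV, &old_segv, NULL);
    sigaction(SIGBUS, &old_bus, NULL);
    return ret;
}

/* Stand-in for an enclave call that completes normally. */
void touch_entry(void *p)
{
    *(volatile char *)p = 1;
}

/* Stand-in for an enclave call that faults. */
void faulting_entry(void *unused)
{
    (void)unused;
    raise(SIGSEGV);
}
```

Note that the wrapper has to catch every SIGSEGV/SIGBUS raised while the callback runs, not only one specific fault code, and it has to save and restore the application's own handlers around each call. That is the cost a vDSO or syscall based fix-up would be trying to avoid.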
> On Windows, this is much less of a concern, because Windows has real
> scoped fault handling. But Linux doesn't, at least not yet.
>
>
> --
> Andy Lutomirski
> AMA Capital Management, LLC