linux-kernel - Re: [PATCH v14 09/19] x86/mm: x86/sgx: Signal SEGV_SGXERR for #PFs w/ PF_SGX

Message-ID: <CALCETrXByb2UVuZ6AXUeOd8y90NAikbZuvdN3wf_TjHZ+CxNhA@mail.gmail.com>
Date:   Wed, 26 Sep 2018 14:15:31 -0700
From:   Andy Lutomirski <luto@...capital.net>
To:     Dave Hansen <dave.hansen@...el.com>
Cc:     "Christopherson, Sean J" <sean.j.christopherson@...el.com>,
        Andrew Lutomirski <luto@...nel.org>,
        Jarkko Sakkinen <jarkko.sakkinen@...ux.intel.com>,
        X86 ML <x86@...nel.org>,
        Platform Driver <platform-driver-x86@...r.kernel.org>,
        nhorman@...hat.com, npmccallum@...hat.com,
        "Ayoun, Serge" <serge.ayoun@...el.com>, shay.katz-zamir@...el.com,
        linux-sgx@...r.kernel.org,
        Andy Shevchenko <andriy.shevchenko@...ux.intel.com>,
        Dave Hansen <dave.hansen@...ux.intel.com>,
        Peter Zijlstra <peterz@...radead.org>,
        Thomas Gleixner <tglx@...utronix.de>,
        Ingo Molnar <mingo@...hat.com>, Borislav Petkov <bp@...en8.de>,
        "H. Peter Anvin" <hpa@...or.com>,
        LKML <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH v14 09/19] x86/mm: x86/sgx: Signal SEGV_SGXERR for #PFs w/ PF_SGX

On Wed, Sep 26, 2018 at 1:55 PM Dave Hansen <dave.hansen@...el.com> wrote:
>
> On 09/26/2018 01:44 PM, Sean Christopherson wrote:
> > On Wed, Sep 26, 2018 at 01:16:59PM -0700, Dave Hansen wrote:
> >> We also need to clarify how this can happen.  Is it through something
> >> that an app does, or is it solely when the hardware does something under
> >> the covers, like suspend/resume?
> >
> > Are you looking for something in the changelog, the comment, or just
> > a response?  If it's the latter...
>
> Comments, please.
>
> > On bare metal with a bug-free kernel, the only scenario I'm aware of
> > where we'll encounter these faults is when hardware pulls the rug out
> > from under us.  In a virtualized environment all bets are off because
> > the architecture allows VMMs to silently "destroy" the EPC at will,
> > e.g. KVM, and I believe Hyper-V, will take advantage of this behavior
> > to support live migration.  Post migration, the destination system
> > will generate PF_SGX because the EPC{M} can't be migrated between
> > systems, i.e. the destination EPCM sees all EPC pages as invalid.
>
> OK, cool.
>
> That's good background fodder for the changelog.
>
> But, for the comment, I'm happy with something like this:
>
>         /*
>          * The fault resulted from violation of SGX-specific access-
>          * controls.  This is expected to be the result of some lower
>          * layer action (CPU suspend/resume, VM migration) and is
>          * not related to anything the OS did.  Treat it as an access
>          * error to ensure it is passed up to the app via a signal where
>          * it can be handled.
>          */
>
> I really don't think we need to delve too deeply into the relationship
> between EPCM and PTEs or anything.  Let's just say, "it's not the
> kernel's fault, it's not the app's fault, so throw up our hands".
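
The comment above would sit on top of a check in the #PF access-error path. The following is a minimal userspace model of that decision, not the kernel's actual access_error(). The error-code bits follow the Intel SDM page-fault error code (bit 15 is SGX); VMA_READ, VMA_WRITE, pf_is_access_error() and pf_examples() are hypothetical names chosen for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Page-fault error-code bits (Intel SDM, "Page-Fault Exceptions"). */
#define X86_PF_PROT  (1u << 0)  /* 0: page not present, 1: protection fault */
#define X86_PF_WRITE (1u << 1)  /* the access was a write */
#define X86_PF_USER  (1u << 2)  /* the access came from user mode */
#define X86_PF_SGX   (1u << 15) /* SGX-specific (EPCM) access-control violation */

/* Hypothetical, simplified stand-ins for a VMA's permissions. */
#define VMA_READ  (1u << 0)
#define VMA_WRITE (1u << 1)

/* Returns true if the fault should be reported to the task (SIGSEGV). */
bool pf_is_access_error(unsigned int error_code, unsigned int vma_flags)
{
	/*
	 * SGX access-controls were violated (e.g. the EPC was lost across
	 * suspend/resume or VM migration).  The kernel cannot fix this, so
	 * report it as an access error and let the app handle the signal.
	 */
	if (error_code & X86_PF_SGX)
		return true;

	if (error_code & X86_PF_WRITE)
		return !(vma_flags & VMA_WRITE);

	/* Read fault on a present page: a protection violation. */
	if (error_code & X86_PF_PROT)
		return true;

	/* Read fault on a not-present page: fine if the VMA is readable. */
	return !(vma_flags & VMA_READ);
}

/* Usage: an SGX fault is an access error even with full VMA permissions. */
void pf_examples(void)
{
	assert(pf_is_access_error(X86_PF_USER | X86_PF_PROT | X86_PF_SGX,
				  VMA_READ | VMA_WRITE));
	assert(!pf_is_access_error(X86_PF_USER | X86_PF_WRITE,
				   VMA_READ | VMA_WRITE));
}
```

Checking the SGX bit first is what makes the "throw up our hands" policy explicit: the VMA permissions are irrelevant once the EPCM has rejected the access.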

There is a non-nitpicky consideration here.  Logically, user code is
going to do this (totally made-up pseudocode):

enclave_t enclave = load_and_init_enclave(...);
int ret = sgx_run(enclave, some pointers to non-enclave-memory buffers, ...);

and, with the code in this patch, a correct implementation of
sgx_run() requires installing a signal handler.  This is nasty, since
signal handlers, especially for something like SIGSEGV or SIGBUS, are,
to say the least, not fantastic in libraries.
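
To make the cost concrete, here is a minimal sketch of what a library-level sgx_run() would have to do if enclave faults only arrive as SIGSEGV. fake_enclave_enter() and sgx_run_with_signal() are hypothetical; the enclave fault is simulated with raise(SIGSEGV) rather than a real enclave exit.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <setjmp.h>
#include <signal.h>
#include <stddef.h>

/* Hypothetical stand-in for entering an enclave; simulates a fault. */
static void fake_enclave_enter(int should_fault)
{
	if (should_fault)
		raise(SIGSEGV);
}

/* Process-global jump target: not safe if two threads run enclaves. */
static sigjmp_buf fault_env;

static void segv_handler(int sig)
{
	(void)sig;
	siglongjmp(fault_env, 1);
}

/* Returns 0 on success, -1 if the "enclave" faulted. */
int sgx_run_with_signal(int should_fault)
{
	struct sigaction sa, old;
	int ret = 0;

	sa.sa_handler = segv_handler;
	sa.sa_flags = 0;
	sigemptyset(&sa.sa_mask);

	/* Replaces the process-wide SIGSEGV disposition, including any
	 * handler the application itself installed. */
	if (sigaction(SIGSEGV, &sa, &old) != 0)
		return -1;
	assert(old.sa_handler != segv_handler); /* nested use would break */

	if (sigsetjmp(fault_env, 1) == 0)
		fake_enclave_enter(should_fault);
	else
		ret = -1; /* reached via siglongjmp from the handler */

	/* Restore whatever was installed before. */
	sigaction(SIGSEGV, &old, NULL);
	return ret;
}
```

Even this toy version shows the problem: while the enclave runs, a genuine SIGSEGV from any other thread lands in the library's handler, and the application's own handler is silently disabled.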

Could we perhaps have a little vDSO entry (or syscall, I suppose) that
runs an enclave and returns an error code, and rig up the #PF handler
to check if the error happened in the vDSO entry and fix it up rather
than sending a signal?
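
A rough model of that fixup decision, assuming the kernel can tell from the user-mode IP whether the fault happened inside the vDSO entry: if it did, record the fault details, redirect execution to a fixup label and hand back an error code; otherwise fall back to SIGSEGV. Every struct, field and function here (enclave_exception, vdso_fixup, handle_user_fault) is hypothetical and illustrates the idea, not an existing kernel or vDSO API.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define X86_TRAP_PF 14 /* page-fault vector */

/* Hypothetical fault report returned to the caller instead of a signal. */
struct enclave_exception {
	int trapnr;
	uint16_t error_code; /* hardware #PF error code */
	uint64_t address;    /* faulting address */
};

/* Minimal register state for the model. */
struct user_regs {
	uint64_t ip;
	uint64_t ax; /* return value seen by the vDSO caller */
};

/* Hypothetical vDSO entry range [start, end) and its fixup target. */
struct vdso_fixup {
	uint64_t start, end, fixup_ip;
};

enum fault_action { FAULT_FIXED_UP, FAULT_SEND_SIGNAL };

enum fault_action handle_user_fault(struct user_regs *regs,
				    const struct vdso_fixup *fx,
				    struct enclave_exception *info,
				    uint16_t error_code, uint64_t addr)
{
	assert(fx->start < fx->end);

	/* Fault outside the vDSO entry: keep the normal signal path. */
	if (regs->ip < fx->start || regs->ip >= fx->end)
		return FAULT_SEND_SIGNAL;

	/* Fault inside the entry: report it and resume at the fixup. */
	info->trapnr = X86_TRAP_PF;
	info->error_code = error_code;
	info->address = addr;
	regs->ip = fx->fixup_ip;
	regs->ax = (uint64_t)-EFAULT;
	return FAULT_FIXED_UP;
}
```

The range check keeps the change narrow: only faults attributable to the enclave-entry code are converted into return values, so ordinary segfaults elsewhere in the process still produce signals.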

On Windows, this is much less of a concern, because Windows has real
scoped fault handling. But Linux doesn't, at least not yet.


--
Andy Lutomirski
AMA Capital Management, LLC