linux-kernel - Re: [PATCH v14 09/19] x86/mm: x86/sgx: Signal SEGV_SGXERR for #PFs w/ PF


Message-ID: <20180926204400.GA11446@linux.intel.com>
Date:   Wed, 26 Sep 2018 13:44:00 -0700
From:   Sean Christopherson <sean.j.christopherson@...el.com>
To:     Dave Hansen <dave.hansen@...el.com>
Cc:     Andy Lutomirski <luto@...capital.net>,
        Andy Lutomirski <luto@...nel.org>,
        Jarkko Sakkinen <jarkko.sakkinen@...ux.intel.com>,
        X86 ML <x86@...nel.org>,
        Platform Driver <platform-driver-x86@...r.kernel.org>,
        nhorman@...hat.com, npmccallum@...hat.com,
        "Ayoun, Serge" <serge.ayoun@...el.com>, shay.katz-zamir@...el.com,
        linux-sgx@...r.kernel.org,
        Andy Shevchenko <andriy.shevchenko@...ux.intel.com>,
        Dave Hansen <dave.hansen@...ux.intel.com>,
        Peter Zijlstra <peterz@...radead.org>,
        Thomas Gleixner <tglx@...utronix.de>,
        Ingo Molnar <mingo@...hat.com>, Borislav Petkov <bp@...en8.de>,
        "H. Peter Anvin" <hpa@...or.com>,
        LKML <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH v14 09/19] x86/mm: x86/sgx: Signal SEGV_SGXERR for #PFs
 w/ PF_SGX

On Wed, Sep 26, 2018 at 01:16:59PM -0700, Dave Hansen wrote:
> On 09/26/2018 11:12 AM, Andy Lutomirski wrote:
> >> e omniscient.
> >>
> >> How about this?  With formatting changes since it's long-winded...
> >>
> >>        /*
> >>         * Access is blocked by the Enclave Page Cache Map (EPCM), i.e. the
> >>         * access is allowed by the PTE but not the EPCM.  This usually happens
> >>         * when the EPCM is yanked out from under us, e.g. by hardware after a
> >>         * suspend/resume cycle.  In any case, software, i.e. the kernel, can't
> >>         * fix the source of the fault as the EPCM can't be directly modified
> >>         * by software.  Handle the fault as an access error in order to signal
> >>         * userspace, e.g. so that userspace can rebuild their enclave(s), even
> >>         * though userspace may not have actually violated access permissions.
> >>         */
> >>
> > Looks good to me.
> 
> Including the actual architectural definition of the bit might add some
> clarity.  The SDM explicitly says (Vol 3a section 4.7):
> 
> 	The fault resulted from violation of SGX-specific access-control
> 	requirements.
> 
> Which totally squares with returning true from access_error().
> 
> There's also a tidbit that says:
> 
> 	This flag is 1 if the exception is unrelated to paging and
> 	resulted from violation of SGX-specific access-control
> 	requirements. ... such a violation can occur only if there
> 	is no ordinary page fault...
> 
> This is pretty important.  It means that *none* of the other
> paging-related stuff that we're doing applies.
>
> We also need to clarify how this can happen.  Is it through something
> that an app does, or is it solely when the hardware does something under
> the covers, like suspend/resume?

Are you looking for something in the changelog, the comment, or just
a response?  If it's the latter...

On bare metal with a bug-free kernel, the only scenario I'm aware of
where we'll encounter these faults is when hardware pulls the rug out
from under us.  In a virtualized environment all bets are off because
the architecture allows VMMs to silently "destroy" the EPC at will,
e.g. KVM, and I believe Hyper-V, will take advantage of this behavior
to support live migration.  Post migration, the destination system
will generate PF_SGX because the EPC{M} can't be migrated between
systems, i.e. the destination EPCM sees all EPC pages as invalid.
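
To make the access_error() point concrete, here is a rough userspace
sketch of the check being discussed.  It is not the patch itself: the
function name, the macro names and the vma_writable parameter are
made up for illustration, and the only hard fact it relies on is that
the SGX flag is bit 15 of the #PF error code.  The real access_error()
looks at more state (protection keys, VM_READ/VM_EXEC, etc.).

```c
#include <stdbool.h>

/*
 * #PF error code bits used by this sketch.  The macro names are
 * illustrative; bit positions follow the SDM (SGX is bit 15).
 */
#define SKETCH_PF_PROT	(1UL << 0)	/* protection fault, page was present */
#define SKETCH_PF_WRITE	(1UL << 1)	/* faulting access was a write */
#define SKETCH_PF_SGX	(1UL << 15)	/* SGX access-control violation */

/*
 * Hypothetical stand-in for access_error().  vma_writable models the
 * VM_WRITE check on the faulting VMA; it is an assumption of this
 * sketch, not the kernel's actual signature.
 */
bool access_error_sketch(unsigned long error_code, bool vma_writable)
{
	/*
	 * The access was blocked by the EPCM, not by the page tables.
	 * Software can't fix the EPCM, so report an access error
	 * regardless of what the VMA permits.
	 */
	if (error_code & SKETCH_PF_SGX)
		return true;

	/* Ordinary case: a write to a VMA that doesn't allow writes. */
	if ((error_code & SKETCH_PF_WRITE) && !vma_writable)
		return true;

	return false;
}
```

With this shape, a fault carrying the SGX bit is treated as an access
error even when the VMA would allow the access, which is what lets
userspace get a signal and rebuild its enclave(s) after a
suspend/resume cycle or a migration.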