linux-kernel - Re: [PATCH v38 10/24] mm: Add vm

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite for Android: free password hash cracker in your pocket
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <c29a662e-90ff-4862-8c82-06b43b81fb4d@intel.com>
Date:   Mon, 28 Sep 2020 09:48:10 -0700
From:   Dave Hansen <dave.hansen@...el.com>
To:     Jarkko Sakkinen <jarkko.sakkinen@...ux.intel.com>
Cc:     Sean Christopherson <sean.j.christopherson@...el.com>,
        Haitao Huang <haitao.huang@...ux.intel.com>,
        Andy Lutomirski <luto@...nel.org>, X86 ML <x86@...nel.org>,
        linux-sgx@...r.kernel.org, LKML <linux-kernel@...r.kernel.org>,
        Linux-MM <linux-mm@...ck.org>,
        Andrew Morton <akpm@...ux-foundation.org>,
        Matthew Wilcox <willy@...radead.org>,
        Jethro Beekman <jethro@...tanix.com>,
        Darren Kenny <darren.kenny@...cle.com>,
        Andy Shevchenko <andriy.shevchenko@...ux.intel.com>,
        asapek@...gle.com, Borislav Petkov <bp@...en8.de>,
        "Xing, Cedric" <cedric.xing@...el.com>, chenalexchen@...gle.com,
        Conrad Parker <conradparker@...gle.com>, cyhanish@...gle.com,
        "Huang, Haitao" <haitao.huang@...el.com>,
        Josh Triplett <josh@...htriplett.org>,
        "Huang, Kai" <kai.huang@...el.com>,
        "Svahn, Kai" <kai.svahn@...el.com>, Keith Moyer <kmoy@...gle.com>,
        Christian Ludloff <ludloff@...gle.com>,
        Neil Horman <nhorman@...hat.com>,
        Nathaniel McCallum <npmccallum@...hat.com>,
        Patrick Uiterwijk <puiterwijk@...hat.com>,
        David Rientjes <rientjes@...gle.com>,
        Thomas Gleixner <tglx@...utronix.de>, yaozhangx@...gle.com
Subject: Re: [PATCH v38 10/24] mm: Add vm_ops->mprotect()

On 9/28/20 9:19 AM, Jarkko Sakkinen wrote:
> On Mon, Sep 28, 2020 at 07:04:38AM -0700, Dave Hansen wrote:
>> EMODPE is virtually irrelevant for this whole thing.  The x86 PTE
>> permissions still specify the most restrictive permissions, which is
>> what matters the most.
>>
>> We care about the _worst_ the enclave can do, not what it imposes on
>> itself on top of that.
> 
> AFAIK it is not, or what we are protecting against with this anyway
> then?
> 
> Let say an LSM makes decision for the permissions based on origin. If we
> do not have this you can:
> 
> 1. EMODPE
> 2. mprotect

The thing that matters is that the enclave needs relaxed permissions
from the kernel.  What it *ALSO* needs to do to *ITSELF* to get those
permissions is entirely irrelevant to the kernel.

> I.e. whatever LSM decides, won't matter.
> 
> The other case, noexec, is now unconditionally denied.



>>> I think other important thing to underline is that an LSM or any other
>>> security measure can only do a sane decision when the enclave is loaded.
>>> At that point we know the source (vm_file).
>>
>> Right, you know the source, but it can be anonymous or a file.
> 
> They are both origin, the point being that you know what you're dealing
> with when you build the enclave, not when you map it.
> 
> This is my current rewrite of the commit message in my master branch:
> 
> "
>     mm: Add 'mprotect' callback to vm_ops
>     
>     Intel Sofware Guard eXtensions (SGX) allows creation of blobs called
>     enclaves, for which page permissions are defined when the enclave is first
>     loaded. Once an enclave is loaded and initialized, it can be mapped to the
>     process address space.
>     
>     There is no standard file format for enclaves. They are dynamically built
>     and the ways how enclaves are deployed differ greatly. For an app you might
>     want to have a simple static binary, but on the other hand for a container
>     you might want to dynamically create the whole thing at run-time. Also, the
>     existing ecosystem for SGX is already large, which would make the task very
>     hard.

I'm sorry I ever mentioned the file format.  Please remove any mention
of it.  It's irrelevant.  This entire paragraph is irrelevant.

>     Finally, even if there was a standard format, one would still want a
>     dynamic way to add pages to the enclave. One big reason for this is that
>     enclaves have load time defined pages that represent entry points to the
>     enclave. Each entry point can service one hardware thread at a time and
>     you might want to run-time parametrize this depending on your environment.

I also don't know what this paragraph has to do with the mprotect()
hook.  Please remove it.

>     The consequence is that enclaves are best created with an ioctl API and the
>     access control can be based only to the origin of the source file for the
>     enclave data, i.e. on VMA file pointer and page permissions. For example,

It's not strictly page permissions, though.  It's actually VMA
permissions.  The thing you copy from might be the zero page, and even
though it has Write=0 page permissions, apps are completely OK to write
to the address.  This is the WRITE vs. MAY_WRITE semantic in the VMA flags.

It's also not just about *files*.  Anonymous memory might or might not
be a valid source for enclave data based on LSM hooks.

>     this could be done with LSM hooks that are triggered in the appropriate
>     ioctl's and they could make the access control decision based on this
>     information.

This "appropriate ioctl's" is not good changelog material.  Please use
those bytes to convey actual information.

	... this could be done with LSM hooks which restrict the source
	of enclave page data

I don't care that it's an ioctl(), really.  What matters is what the
ioctl() does: copy data into enclave pages.

>     Unfortunately, there is ENCLS[EMODPE] that a running enclave can use to
>     upgrade its permissions. If we do not limit mmap() and mprotect(), enclave
>     could upgrade its permissions by using EMODPE followed by an appropriate
>     mprotect() call. This would be completely hidden from the kernel.

There's too much irrelevant info.

I'll say it again: all that matters is that enclaves can legitimately,
safely, and securely have a need for the kernel to change page
permissions.  That's *IT*.  EMODPE just happens to be part of the
mechanism that makes these permission changes safe for enclaves.  It's a
side show.

>     Add 'mprotect' hook to vm_ops, so that a callback can be implemeted for SGX
>     that will ensure that {mmap, mprotect}() permissions do not surpass any of
>     the original page permissions. This feature allows to maintain and refine
>     sane access control for enclaves.

Instead of "original", I'd stick to the "source" page nomenclature.
There are also "original" permissions with mprotect().

Also, it's literally OK for the enclave page permissions to surpass the
original (source) page permissions.  That sentence is incorrect, or at
least misleadingly imprecise.

> I'm mostly happy with this but am open for change suggestions.

I wrote a pretty nice description of this.  It was about 90% correct,
shorter, and conveyed more information.  I'd suggest starting with that.