linux-kernel - Re: [PATCH v38 10/24] mm: Add vm

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <20200924162750.GC22819@linux.intel.com>
Date:   Thu, 24 Sep 2020 09:27:51 -0700
From:   Sean Christopherson <sean.j.christopherson@...el.com>
To:     Dave Hansen <dave.hansen@...el.com>
Cc:     Jarkko Sakkinen <jarkko.sakkinen@...ux.intel.com>,
        Andy Lutomirski <luto@...capital.net>,
        Andy Lutomirski <luto@...nel.org>, X86 ML <x86@...nel.org>,
        linux-sgx@...r.kernel.org, LKML <linux-kernel@...r.kernel.org>,
        Linux-MM <linux-mm@...ck.org>,
        Andrew Morton <akpm@...ux-foundation.org>,
        Matthew Wilcox <willy@...radead.org>,
        Jethro Beekman <jethro@...tanix.com>,
        Darren Kenny <darren.kenny@...cle.com>,
        Andy Shevchenko <andriy.shevchenko@...ux.intel.com>,
        asapek@...gle.com, Borislav Petkov <bp@...en8.de>,
        "Xing, Cedric" <cedric.xing@...el.com>, chenalexchen@...gle.com,
        Conrad Parker <conradparker@...gle.com>, cyhanish@...gle.com,
        "Huang, Haitao" <haitao.huang@...el.com>,
        Josh Triplett <josh@...htriplett.org>,
        "Huang, Kai" <kai.huang@...el.com>,
        "Svahn, Kai" <kai.svahn@...el.com>, Keith Moyer <kmoy@...gle.com>,
        Christian Ludloff <ludloff@...gle.com>,
        Neil Horman <nhorman@...hat.com>,
        Nathaniel McCallum <npmccallum@...hat.com>,
        Patrick Uiterwijk <puiterwijk@...hat.com>,
        David Rientjes <rientjes@...gle.com>,
        Thomas Gleixner <tglx@...utronix.de>, yaozhangx@...gle.com
Subject: Re: [PATCH v38 10/24] mm: Add vm_ops->mprotect()

On Thu, Sep 24, 2020 at 07:50:15AM -0700, Dave Hansen wrote:
> On 9/23/20 7:33 AM, Jarkko Sakkinen wrote:
> > The consequence is that enclaves are best created with an ioctl API and the
> > access control can be based only to the origin of the source file for the
> > enclave data, i.e. on VMA file pointer and page permissions. For example,
> > this could be done with LSM hooks that are triggered in the appropriate
> > ioctl's and they could make the access control decision based on this
> > information.
> > 
> > Unfortunately, there is ENCLS[EMODPE] that a running enclave can use to
> > upgrade its permissions. If we do not limit mmap() and mprotect(), enclave
> > could upgrade its permissions by using EMODPE followed by an appropriate
> > mprotect() call. This would be completely hidden from the kernel.
> > 
> > Add 'mprotect' hook to vm_ops, so that a callback can be implemeted for SGX
> > that will ensure that {mmap, mprotect}() permissions do not surpass any of
> > the original page permissions. This feature allows to maintain and refine
> > sane access control for enclaves.
> 
> Maybe I'm just being dense, but I still don't have a clear idea what
> function this hook serves.
> 
> I understand that SGX has an orthogonal set of page permissions to the
> normal x86 page tables.  It needs these so that the OS can't play nasty
> tricks on the enclave, like removing read-only protections that provide
> hardening.
> 
> But, I still don't get the connection to mprotect() and the x86 paging
> permissions.  If the enclave's permissions are orthogonal, then why
> bother with this hook?  Why does the OS view of the enclave's memory matter?

For the purpose of this discussion, ignore the enclave permissions.  The only
reason they're mentioned is to explain the background (well, try to) of how we
ended up at an ->mprotect() hook.  There was a great deal of discussion in the
past about whether or not we could use enclave permissions to enforce OS
permissions.  The TL:DR version is that because of ENCLU[EMODPE], the answer
is "no".

What we're preventing via ->mprotect() is using SGX to bypass existing
restrictions on code execution, e.g. noexec and LSM policies.

Because code must first be loaded into an enclave before it can be executed,
all enclaves are kind of a variant on (in SELinux terminology) either EXECMOD
or EXECMEM.  I.e. it's simply not possible to execute an enclave by mapping
the source file as executable.  This effectively allows userspace to bypass a
noexec FS by loading code into an enclave without EXEC perms on the source
file, only on /dev/sgx/enclave, and denying EXEC on /dev/sgx/enclave would
prevent running _any_ enclave.

The ->mprotect() hook is used by SGX to require userspace to declare what
permissions are allowed on any given enclave page, e.g. SGX's mmap()/mprotect()
requires all underlying enclave pages to be declared as executable if the
mmap()/mprotect() is specifying VM_EXEC.  By requiring userspace to declare
their intent up front, SGX can then enforce noexec by requiring pages that are
declared as executable to have VM_MAYEXEC set in the source VMA when loading
code into the enclave.

As Jarkko pointed out, an alternative to adding ->mprotect() would be to
simply require VM_MAYEXEC on _all_ source VMAs when loading code into the
enclave.  That would work, albeit with the potentially undesirable side effect
of preventing loading any part of an enclave from e.g. a noexec, readonly FS.

But, unconditionally requiring VM_MAYEXEC doesn't address the Linux Security
Module hooks for mmap() and mprotect(), which could also be bypassed by abusing
SGX.  E.g. a process could gain arbitrary code execution by loading code from
anonymous memory into an enclave, as the LSM checks hooks at mmap()/mprotect()
will see always vm_file=/dev/sgx/enclave.  An LSM could deny a process access
to /dev/sgx/enclave, but again that is very coarse granularity.

By requiring userspace to declare permissions up front (when loading code/data
into an enclave), SGX can make explicit upcalls to LSMs hooks at load time so
that an LSM can enforce a meaningful policy, e.g. require all enclave code to
originate from an executable file system.  This series doesn't actually
implement the LSM integration, but it does ensure that _if_ we want to add LSM
support in the future, we can do so without breaking userspace.