linux-kernel - Re: [PATCH v38 10/24] mm: Add vm

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-Id: <1B23E216-0229-4BDD-8B09-807256A54AF5@amacapital.net>
Date:   Fri, 18 Sep 2020 17:15:32 -0700
From:   Andy Lutomirski <luto@...capital.net>
To:     Sean Christopherson <sean.j.christopherson@...el.com>
Cc:     Andy Lutomirski <luto@...nel.org>,
        Jarkko Sakkinen <jarkko.sakkinen@...ux.intel.com>,
        X86 ML <x86@...nel.org>, linux-sgx@...r.kernel.org,
        LKML <linux-kernel@...r.kernel.org>,
        Linux-MM <linux-mm@...ck.org>,
        Andrew Morton <akpm@...ux-foundation.org>,
        Matthew Wilcox <willy@...radead.org>,
        Jethro Beekman <jethro@...tanix.com>,
        Darren Kenny <darren.kenny@...cle.com>,
        Andy Shevchenko <andriy.shevchenko@...ux.intel.com>,
        asapek@...gle.com, Borislav Petkov <bp@...en8.de>,
        "Xing, Cedric" <cedric.xing@...el.com>, chenalexchen@...gle.com,
        Conrad Parker <conradparker@...gle.com>, cyhanish@...gle.com,
        Dave Hansen <dave.hansen@...el.com>,
        "Huang, Haitao" <haitao.huang@...el.com>,
        Josh Triplett <josh@...htriplett.org>,
        "Huang, Kai" <kai.huang@...el.com>,
        "Svahn, Kai" <kai.svahn@...el.com>, Keith Moyer <kmoy@...gle.com>,
        Christian Ludloff <ludloff@...gle.com>,
        Neil Horman <nhorman@...hat.com>,
        Nathaniel McCallum <npmccallum@...hat.com>,
        Patrick Uiterwijk <puiterwijk@...hat.com>,
        David Rientjes <rientjes@...gle.com>,
        Thomas Gleixner <tglx@...utronix.de>, yaozhangx@...gle.com
Subject: Re: [PATCH v38 10/24] mm: Add vm_ops->mprotect()


> On Sep 18, 2020, at 4:53 PM, Sean Christopherson <sean.j.christopherson@...el.com> wrote:
> 
> On Fri, Sep 18, 2020 at 08:09:04AM -0700, Andy Lutomirski wrote:
>>> On Tue, Sep 15, 2020 at 4:28 AM Jarkko Sakkinen
>>> <jarkko.sakkinen@...ux.intel.com> wrote:
>>> 
>>> From: Sean Christopherson <sean.j.christopherson@...el.com>
>>> 
>>> Add vm_ops()->mprotect() for additional constraints for a VMA.
>>> 
>>> Intel Software Guard eXtensions (SGX) will use this callback to add two
>>> constraints:
>>> 
>>> 1. Verify that the address range does not have holes: each page address
>>>   must be filled with an enclave page.
>>> 2. Verify that VMA permissions won't surpass the permissions of any enclave
>>>   page within the address range. Enclave cryptographically sealed
>>>   permissions for each page address that set the upper limit for possible
>>>   VMA permissions. Not respecting this can cause #GP's to be emitted.
> 
> Side note, #GP is wrong.  EPCM violations are #PFs.  Skylake CPUs #GP, but
> that's technically an errata.  But this isn't the real motivation, e.g.
> userspace can already trigger #GP/#PF by reading/writing a bad address, SGX
> simply adds another flavor.
> 
>> It's been awhile since I looked at this.  Can you remind us: is this
>> just preventing userspace from shooting itself in the foot or is this
>> something more important?
> 
> Something more important, it's used to prevent userspace from circumventing
> a noexec filesystem by loading code into an enclave, and to give the kernel the
> option of adding enclave specific LSM policies in the future.
> 
> The source file (if one exists) for the enclave is long gone when the enclave
> is actually mmap()'d and mprotect()'d.  To enforce noexec, the requested
> permissions for a given page are snapshotted when the page is added to the
> enclave, i.e. when the enclave is built.  Enclave pages that will be executable
> must originate from an a MAYEXEC VMA, e.g. the source page can't come from a
> noexec file system.
> 
> The ->mprotect() hook allows SGX to reject mprotect() if userspace is declaring
> permissions beyond what are allowed, e.g. trying to map an enclave page with
> EXEC permissions when the page was added to the enclave without EXEC.
> 
> Future LSM policies have a similar need due to vm_file always pointing at
> /dev/sgx/enclave, e.g. policies couldn't be attached to a specific enclave.
> ->mprotect() again allows enforcing permissions at map time that were checked
> at enclave build time, e.g. via an LSM hook.
> 
> Deferring ->mprotect() until LSM support is added (if it ever is) would be
> problematic due to SGX2.  With SGX2, userspace can extend permissions of an
> enclave page (for the CPU's EPC Map entry, not the kernel's page tables)
> without bouncing through the kernel.  Without ->mprotect () enforcement.
> userspace could do EADD(RW) -> mprotect(RWX) -> EMODPE(X) to gain W+X.  We
> want to disallow such a flow now, i.e. force userspace to do EADD(RW,X), so
> that the hypothetical LSM hook would have all information at EADD(), i.e.
> would be aware of the EXEC permission, without creating divergent behavior
> based on whether or not an LSM is active.

That’s what I thought. Can we get this in the changelog?