linux-kernel - Re: [RFC PATCH 0/9] security: x86/sgx: SGX vs. LSM

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20190603171549.GE13384@linux.intel.com>
Date:   Mon, 3 Jun 2019 10:15:49 -0700
From:   Sean Christopherson <sean.j.christopherson@...el.com>
To:     "Xing, Cedric" <cedric.xing@...el.com>
Cc:     Jarkko Sakkinen <jarkko.sakkinen@...ux.intel.com>,
        Andy Lutomirski <luto@...nel.org>,
        Stephen Smalley <sds@...ho.nsa.gov>,
        James Morris <jmorris@...ei.org>,
        "Serge E . Hallyn" <serge@...lyn.com>,
        LSM List <linux-security-module@...r.kernel.org>,
        Paul Moore <paul@...l-moore.com>,
        Eric Paris <eparis@...isplace.org>,
        "selinux@...r.kernel.org" <selinux@...r.kernel.org>,
        Jethro Beekman <jethro@...tanix.com>,
        "Hansen, Dave" <dave.hansen@...el.com>,
        Thomas Gleixner <tglx@...utronix.de>,
        Linus Torvalds <torvalds@...ux-foundation.org>,
        LKML <linux-kernel@...r.kernel.org>, X86 ML <x86@...nel.org>,
        "linux-sgx@...r.kernel.org" <linux-sgx@...r.kernel.org>,
        Andrew Morton <akpm@...ux-foundation.org>,
        "nhorman@...hat.com" <nhorman@...hat.com>,
        "npmccallum@...hat.com" <npmccallum@...hat.com>,
        "Ayoun, Serge" <serge.ayoun@...el.com>,
        "Katz-zamir, Shay" <shay.katz-zamir@...el.com>,
        "Huang, Haitao" <haitao.huang@...el.com>,
        Andy Shevchenko <andriy.shevchenko@...ux.intel.com>,
        "Svahn, Kai" <kai.svahn@...el.com>, Borislav Petkov <bp@...en8.de>,
        Josh Triplett <josh@...htriplett.org>,
        "Huang, Kai" <kai.huang@...el.com>,
        David Rientjes <rientjes@...gle.com>,
        "Roberts, William C" <william.c.roberts@...el.com>,
        "Tricca, Philip B" <philip.b.tricca@...el.com>
Subject: Re: [RFC PATCH 0/9] security: x86/sgx: SGX vs. LSM

On Sun, Jun 02, 2019 at 12:29:35AM -0700, Xing, Cedric wrote:
> Hi Sean,
> 
> > From: Christopherson, Sean J
> > Sent: Friday, May 31, 2019 4:32 PM
> > 
> > This series is the result of a rather absurd amount of discussion over how to get SGX to play
> > nice with LSM policies, without having to resort to evil shenanigans or put undue burden on
> > userspace.  The discussion definitely wandered into completely insane territory at times, but
> > I think/hope we ended up with something reasonable.
> > 
> > The basic gist of the approach is to require userspace to declare what protections are
> > maximally allowed for any given page, e.g. add a flags field for loading enclave pages that
> > takes ALLOW_{READ,WRITE,EXEC}.  LSMs can then adjust the allowed protections, e.g. clear
> > ALLOW_EXEC to prevent ever mapping the page with PROT_EXEC.  SGX enforces the allowed perms
> > via a new mprotect() vm_ops hook, e.g. like regular mprotect() uses MAY_{READ,WRITE,EXEC}.
> > 
> > ALLOW_EXEC is used to deny hings like loading an enclave from a noexec file system or from a
> > file without EXECUTE permissions, e.g. without the ALLOW_EXEC concept, on SGX2 hardware
> > (regardless of kernel support) userspace could EADD from a noexec file using read-only
> > permissions, and later use mprotect() and ENCLU[EMODPE] to gain execute permissions.
> > 
> > ALLOW_WRITE is used in conjuction with ALLOW_EXEC to enforce SELinux's EXECMOD (or EXECMEM).
> > 
> > This is very much an RFC series.  It's only compile tested, likely has obvious bugs, the
> > SELinux patch could be completely harebrained, etc...
> > My goal at this point is to get feedback at a macro level, e.g. is the core concept
> > viable/acceptable, are there objection to hooking mprotect(), etc...
> > 
> > Andy and Cedric, hopefully this aligns with your general expectations based on our last
> > discussion.
> 
> I couldn't understand the real intentions of ALLOW_* flags until I saw them
> in code. I have to say C is more expressive than English in that regard :)
> 
> Generally I agree with your direction but think ALLOW_* flags are completely
> internal to LSM because they can be both produced and consumed inside an LSM
> module. So spilling them into SGX driver and also user mode code makes the
> solution ugly and in some cases impractical because not every enclave host
> process has a priori knowledge on whether or not an enclave page would be
> EMODPE'd at runtime.

In this case, the host process should tag *all* pages it *might* convert
to executable as ALLOW_EXEC.  LSMs can (and should/will) be written in
such a way that denying ALLOW_EXEC is fatal to the enclave if and only if
the enclave actually attempts mprotect(PROT_EXEC).

Take the SELinux path for example.  The only scenario in which PROT_WRITE
is cleared from @allowed_prot is if the page *starts* with PROT_EXEC.
If PROT_EXEC is denied on a page that starts RW, e.g. an EAUG'd page,
then PROT_EXEC will be cleared from @allowed_prot.

As Stephen pointed out, auditing the denials on @allowed_prot means the
log will contain false positives of a sort.  But this is more of a noise
issue than true false positives.  E.g. there are three possible outcomes
for the enclave.

  - The enclave does not do EMODPE[PROT_EXEC] in any scenario, ever.
    Requesting ALLOW_EXEC is either a straightforward a userspace bug or
    a poorly written generic enclave loader.

  - The enclave conditionally performs EMODPE[PROT_EXEC].  In this case
    the denial is a true false positive.
  
  - The enclave does EMODPE[PROT_EXEC] and its host userspace then fails
    on mprotect(PROT_EXEC), i.e. the LSM denial is working as intended.
    The audit log will be noisy, but viewed as a whole the denials aren't
    false positives.

The potential for noisy audit logs and/or false positives is unfortunate,
but it's (by far) the lesser of many evils.

> Theoretically speaking, what you really need is a per page flag (let's name
> it WRITTEN?) indicating whether a page has ever been written to (or more
> precisely, granted PROT_WRITE), which will be used to decide whether to grant
> PROT_EXEC when requested in future. Given the fact that all mprotect() goes
> through LSM and mmap() is limited to PROT_NONE, it's easy for LSM to capture
> that flag by itself instead of asking user mode code to provide it.
>
> That said, here is the summary of what I think is a better approach.
> * In hook security_file_alloc(), if @file is an enclave, allocate some data
>   structure to store for every page, the WRITTEN flag as described above.
>   WRITTEN is cleared initially for all pages.

This would effectively require *every* LSM to duplicate the SGX driver's
functionality, e.g. track per-page metadata, implement locking to prevent
races between multiple mm structs, etc...

>   Open: Given a file of type struct file *, how to tell if it is an enclave (i.e. /dev/sgx/enclave)?
> * In hook security_mmap_file(), if @file is an enclave, make sure @prot can
>   only be PROT_NONE. This is to force all protection changes to go through
>   security_file_mprotect().
> * In the newly introduced hook security_enclave_load(), set WRITTEN for pages
>   that are requested PROT_WRITE.

How would an LSM associate a page with a specific enclave?  vma->vm_file
will point always point at /dev/sgx/enclave.  vma->vm_mm is useless
because we're allowing multiple processes to map a single enclave, not to
mention that by mm would require holding a reference to the mm.

> * In hook security_file_mprotect(), if @vma->vm_file is an enclave, look up
>   and use WRITTEN flags for all pages within @vma, along with other global
>   flags (e.g. PROCESS__EXECMEM/FILE__EXECMOD in the case of SELinux) to decide
>   on allowing/rejecting @prot.

vma->vm_file will always be /dev/sgx/enclave at this point, which means
LSMs don't have the necessary anchor back to the source file, e.g. to
enforce FILE__EXECUTE.  The noexec file system case is also unaddressed.

> * In hook security_file_free(), if @file is an  enclave, free storage
>   allocated for WRITTEN flags.