linux-kernel - Re: [PATCH 05/25] x86/sgx: Introduce runtime protection bits

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <c091ee8b-04a8-9165-00c1-835bb520e240@intel.com>
Date:   Mon, 13 Dec 2021 14:34:05 -0800
From:   Reinette Chatre <reinette.chatre@...el.com>
To:     Andy Lutomirski <luto@...nel.org>, <dave.hansen@...ux.intel.com>,
        <jarkko@...nel.org>, <tglx@...utronix.de>, <bp@...en8.de>,
        <mingo@...hat.com>, <linux-sgx@...r.kernel.org>, <x86@...nel.org>
CC:     <seanjc@...gle.com>, <kai.huang@...el.com>,
        <cathy.zhang@...el.com>, <cedric.xing@...el.com>,
        <haitao.huang@...el.com>, <mark.shanahan@...el.com>,
        <hpa@...or.com>, <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH 05/25] x86/sgx: Introduce runtime protection bits

Hi Andy,

I would like to check in if you had some time to digest my responses 
with a few high level questions below ...

On 12/4/2021 3:55 PM, Reinette Chatre wrote:
> On 12/4/2021 9:56 AM, Andy Lutomirski wrote:
>> On 12/3/21 17:14, Reinette Chatre wrote:
>>> On 12/3/2021 4:38 PM, Andy Lutomirski wrote:
>>>> On 12/3/21 14:12, Reinette Chatre wrote:
>>>>> On 12/3/2021 11:28 AM, Andy Lutomirski wrote:
>>>>>> On 12/1/21 11:23, Reinette Chatre wrote:
>>>>>>> Enclave creators declare their paging permission intent at the time
>>>>>>> the pages are added to the enclave. These paging permissions are
>>>>>>> vetted when pages are added to the enclave and stashed off
>>>>>>> (in sgx_encl_page->vm_max_prot_bits) for later comparison with
>>>>>>> enclave PTEs.
>>>>>>>
>>>>>>
>>>>>> I'm a bit confused here. ENCLU[EMODPE] allows the enclave to 
>>>>>> change the EPCM permission bits however it likes with no oversight 
>>>>>> from the kernel.   So we end up with a whole bunch of permission 
>>>>>> masks:
>>>>>
>>>>> Before jumping to the permission masks I would like to step back 
>>>>> and just confirm the context. We need to consider the following 
>>>>> three permissions:
>>>>>
>>>>> EPCM permissions: the enclave page permissions maintained in the 
>>>>> SGX hardware. The OS is constrained here in that it cannot query 
>>>>> the current EPCM permissions. Even so, the OS needs to ensure PTEs 
>>>>> are installed appropriately (we do not want a RW PTE for a 
>>>>> read-only enclave page)
>>>>
>>>> Why not?  What's wrong with an RW PTE for a read-only enclave page?
>>>>
>>>> If you convince me that this is actually important, then I'll read 
>>>> all the stuff below.
>>>
>>> Perhaps it is my misunderstanding/misinterpretation of the current 
>>> implementation? From what I understand the current requirement, as 
>>> enforced in the current mmap(), mprotect() as well as fault() hooks, 
>>> is that mappings are required to have identical or weaker permission 
>>> than the enclave permission.
>>
>> The current implementation does require that, but for a perhaps 
>> counterintuitive reason.  If a SELinux-restricted (or similarly 
>> restricted) process that is *not* permitted to do JIT-like things 
>> loads an enclave, it's entirely okay for it to initialize RW enclave 
>> pages however it likes and it's entirely okay for it to initialize RX 
>> (or XO if that ever becomes a thing) enclave pages from appropriately 
>> files on disk.  But it's not okay for it to create RWX enclave pages 
>> or to initialize RX enclave pages from untrusted application memory. [0]
>>
>> So we have a half-baked implementation right now: the permission to 
>> execute a page is decided based on secinfo (max permissions) when the 
>> enclave is set up, and it's enforced at the PTE level.  The PTE 
>> enforcement is because, on SGX2 hardware, the enclave can do EMODPE 
>> and bypass any supposed restrictions in the EPCM.
>>
>> The only coupling between EPCM and PTE here is that the max_perm is 
>> initialized together with EPCM, but it didn't have to be that way.
>>
>> An SGX2 implementation needs to be more fully baked, because in a 
>> dynamic environment enclaves need to be able to use EMODPE and 
>> actually end up with permissions that exceed the initial secinfo 
>> permissions.  So 
> 
> Could you please elaborate why this is a requirement? In this 
> implementation the secinfo of a page added before enclave initialization 
> (via SGX_IOC_ENCLAVE_ADD_PAGES) would indicate the maximum permissions 
> it may have during its lifetime. Pages needing to be writable and 
> executable during their lifetime can be created with RWX secinfo and 
> during the enclave runtime the pages could obtain all combinations of 
> permissions: RWX, R, RW, RX. A page added with RW secinfo may have R or 
> RW permissions during its lifetime but never RX or RWX.
> 
> So far our inquiries on whether this is acceptable has been positive and 
> is also what Dave attempted to put a spotlight on in:
> https://lore.kernel.org/lkml/94d8d631-5345-66c4-52a3-941e52500f84@intel.com/ 
> 
> 
> This above is specific to pages added before enclave initialization. In 
> this implementation pages added after enclave initialization, those 
> needing the ENCLS[EAUG] SGX2 instruction, are added with max permissions 
> of RW so could only have R or RW permissions during their lifetime. This 
> is an understood limitation and it is understood that integration with 
> user policy is required to support these pages obtaining executable 
> permission. The plan is to handle user policy integration in a series 
> that follows this core SGX2 enabling.

Are you ok with the strategy to support modification of enclave page 
permissions?

> 
>> it needs to be possible to make a page that starts out R (or RW or 
>> whatever) but nonetheless has max_perm=RWX so that the enclave can use 
>> a combination of EMODPE and (ioctl-based) EMODPR to do JIT.  So I 
>> think you should make it possible to set up pages like this, but I see 
>> no reason to couple the PTE and the EPCM permissions.
>>
>>>
>>> Could you please elaborate how you envision PTEs should be managed in 
>>> this implementation?
>>
>> As above: PTE permissions may not exceed max_perm, and EPCM is 
>> entirely separate except to the extent needed for ABI compatibility 
>> with SGX1 runtimes.
> 
> ok, so if I understand correctly you, since PTE permissions may not 
> exceed max_perm and EPCM are separate, this seems to get back to your 
> previous question of "What's wrong with an RW PTE for a read-only 
> enclave page?"
> 
> This is indeed something that we could allow but not doing so (that is 
> PTEs not exceeding EPCM permissions) would better support the SGX 
> runtime. That is why I separated out the addition of the pfn_mkwrite() 
> callback in the previous patch (04/25). Like in your example, there is a 
> RW mapping of a read-only enclave page that first results in a RW PTE 
> for the read-only enclave page. That would result in a #PF with the SGX 
> flag set (0x8007). If the PTE matches the enclave permissions the page 
> fault would have familiar 0x7 error code.
> 
> In either case user space would encounter a #PF so technically there is 
> nothing "wrong" with allowing this - even so, as motivated in the 
> previous patch: accurate exception information supports the SGX runtime, 
> which is virtually always implemented inside a shared library, by 
> providing accurate information in support of its management of the SGX 
> enclave.

Are you ok with managing PTEs in this way? It matches your requirement 
that PTE permissions may not exceed max_perm and ABI is compatible with 
SGX1. Additionally, PTEs are not allowed to exceed EPCM permissions, 
which is not an ABI change since it was not a consideration during SGX1 
where EPCM permissions could not change.


>> [0] I'm not sure anyone actually has a system set up like this or that 
>> the necessary LSM support is in the kernel.  But it's supposed to be 
>> possible without changing the ABI.
>>

Thank you very much

Reinette