Message-ID: <690d3d8e-6214-dcdd-daaa-48a380114ad7@intel.com>
Date: Thu, 17 Mar 2022 15:08:04 -0700
From: Reinette Chatre <reinette.chatre@...el.com>
To: Jarkko Sakkinen <jarkko@...nel.org>
CC: Haitao Huang <haitao.huang@...ux.intel.com>,
"Dhanraj, Vijay" <vijay.dhanraj@...el.com>,
"dave.hansen@...ux.intel.com" <dave.hansen@...ux.intel.com>,
"tglx@...utronix.de" <tglx@...utronix.de>,
"bp@...en8.de" <bp@...en8.de>,
"Lutomirski, Andy" <luto@...nel.org>,
"mingo@...hat.com" <mingo@...hat.com>,
"linux-sgx@...r.kernel.org" <linux-sgx@...r.kernel.org>,
"x86@...nel.org" <x86@...nel.org>,
"Christopherson,, Sean" <seanjc@...gle.com>,
"Huang, Kai" <kai.huang@...el.com>,
"Zhang, Cathy" <cathy.zhang@...el.com>,
"Xing, Cedric" <cedric.xing@...el.com>,
"Huang, Haitao" <haitao.huang@...el.com>,
"Shanahan, Mark" <mark.shanahan@...el.com>,
"hpa@...or.com" <hpa@...or.com>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH V2 16/32] x86/sgx: Support restricting of enclave page
permissions
Hi Jarkko,
On 3/16/2022 9:30 PM, Jarkko Sakkinen wrote:
> On Mon, Mar 14, 2022 at 08:32:28AM -0700, Reinette Chatre wrote:
>> Hi Jarkko,
>>
>> On 3/13/2022 8:42 PM, Jarkko Sakkinen wrote:
>>> On Fri, Mar 11, 2022 at 11:28:27AM -0800, Reinette Chatre wrote:
>>>> Supporting permission restriction in an ioctl() enables the runtime to manage
>>>> the enclave memory without needing to map it.
>>>
>>> Which is the opposite of what you do in EAUG. You can also augment pages
>>> without needing to map them. Sure, you get that capability, but it is quite
>>> useless in practice.
>>>
>>>> I have considered the idea of supporting the permission restriction with
>>>> mprotect() but as you can see in this response I did not find it to be
>>>> practical.
>>>
>>> Where is it practical? What is your application? How is it practical to
>>> delegate the concurrency management of a split mprotect() to user space?
>>> How do we get rid of a useless up-call to the host?
>>>
>>
>> The email you responded to contained many obstacles against using mprotect()
>> but you chose to ignore them and snipped them all from your response. Could
>> you please address the issues instead of dismissing them?
>
> I did read the whole email, but did not see anything that would make a case
> for a fully exposed EMODPR, or for making it asymmetrical to how EAUG works.

I believe that each obstacle I shared with you is, on its own, significant
enough to rule out that approach. You simply respond that I am not making a
case, without acknowledging any of the obstacles or giving a reason why they
are not valid.

To help me understand your view, could you please respond to each of the
obstacles listed below and explain why it is not an issue?

1) ABI change:
mprotect() is currently supported for modifying VMA permissions
independently of EPCM permissions. Supporting EPCM permission
changes with mprotect() would change this behavior.
For example, it is currently possible to have RW enclave memory
that is accessed by multiple tasks. Two tasks can map the memory
RW, and later one of them can run mprotect() to reduce its VMA
permissions to read-only without impacting the access of the
other task. Because EPCM permissions belong to the enclave page
itself rather than to a task's mapping, moving EPCM permission
changes into mprotect() would also remove write access for the
other task: this usage would no longer be supported, and current
behavior would change.

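The existing VMA-only behavior described in 1) can be illustrated on ordinary shared memory, since mprotect() only changes the calling task's VMA. This is a minimal sketch, assuming Linux with MAP_ANONYMOUS; it uses plain anonymous shared memory rather than enclave memory, and the helper name demo_vma_only_restrict() is hypothetical. A child task drops its own VMA to read-only while the parent keeps writing through its RW mapping.

```c
#define _DEFAULT_SOURCE
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical demo on plain shared memory (not an enclave): returns 1 if
 * the parent could still write to the page after the child restricted its
 * own VMA to read-only, and the child then saw that write. Returns 0 on error.
 */
int demo_vma_only_restrict(void)
{
	long page = sysconf(_SC_PAGESIZE);
	int to_parent[2], to_child[2];
	char token = 0;
	int status, ok;
	pid_t pid;

	void *map = mmap(NULL, (size_t)page, PROT_READ | PROT_WRITE,
			 MAP_SHARED | MAP_ANONYMOUS, -1, 0);
	if (map == MAP_FAILED)
		return 0;
	volatile char *mem = map;
	if (pipe(to_parent) || pipe(to_child))
		return 0;
	mem[0] = 'A';

	pid = fork();
	if (pid < 0)
		return 0;
	if (pid == 0) {
		/* Child: reduce only its own VMA permissions to read-only. */
		if (mprotect(map, (size_t)page, PROT_READ) != 0)
			_exit(2);
		if (write(to_parent[1], &token, 1) != 1)   /* "restricted" */
			_exit(3);
		if (read(to_child[0], &token, 1) != 1)     /* wait for parent */
			_exit(3);
		/* Reads are still allowed through the read-only VMA. */
		_exit(mem[0] == 'B' ? 0 : 1);
	}

	/* Parent: once the child has restricted its own VMA ... */
	if (read(to_parent[0], &token, 1) != 1)
		return 0;
	mem[0] = 'B';   /* ... write through the parent's still-RW VMA. */
	if (write(to_child[1], &token, 1) != 1)
		return 0;

	ok = waitpid(pid, &status, 0) == pid &&
	     WIFEXITED(status) && WEXITSTATUS(status) == 0;
	close(to_parent[0]);
	close(to_parent[1]);
	close(to_child[0]);
	close(to_child[1]);
	munmap(map, (size_t)page);
	return ok;
}
```

On Linux, demo_vma_only_restrict() should return 1, because the child's mprotect() changed only its own VMA. If mprotect() also restricted EPCM permissions, the equivalent write by the other task's enclave thread would instead fault.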
2) Only half of EPCM permission management:
Using mprotect() to set EPCM permissions does not provide a clear
interface for EPCM permission management, because the kernel can
only restrict permissions; extending them requires EMODPE, which
only the enclave can run. Moreover, the kernel has no insight into
the current EPCM permissions and thus cannot tell whether they
actually need to be restricted. Every mprotect() call, except one
requesting RWX, would therefore need to be treated as a permission
restriction, with all the implementation obstacles that accompany
it (more below).
There are two possible ways to implement permission restriction
triggered by mprotect(): (a) during the mprotect() call, or
(b) during a subsequent #PF (as you suggested). Each has its own
obstacles, covered in 3) and 4) respectively.

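The reasoning in 2) can be expressed as a small bitmask model. This is a hypothetical sketch: the PERM_* flags and helper names are illustrative, not the kernel's SGX definitions. EMODPR can only clear permission bits, and because the kernel cannot see whether user space extended the EPCM permissions with EMODPE, it has to assume the worst case (RWX), so any request other than RWX must be treated as a restriction.

```c
#include <stdbool.h>

/* Illustrative permission bits, not the kernel's SGX SECINFO values. */
enum {
	PERM_R = 1 << 0,
	PERM_W = 1 << 1,
	PERM_X = 1 << 2,
	PERM_RWX = PERM_R | PERM_W | PERM_X,
};

/*
 * EMODPR can only remove permissions: the resulting EPCM permissions are
 * the intersection of the current permissions and the requested ones.
 */
unsigned int restrict_perms(unsigned int current, unsigned int requested)
{
	return current & requested;
}

/*
 * The kernel does not know the current EPCM permissions, since user space
 * may have extended them with EMODPE at any time. Assuming the worst case
 * (RWX), a restriction is needed unless RWX itself is requested.
 */
bool must_treat_as_restriction(unsigned int requested)
{
	return (PERM_RWX & ~requested) != 0;
}
```

For example, must_treat_as_restriction(PERM_R | PERM_W) is true, while must_treat_as_restriction(PERM_RWX) is false.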
3) mprotect() implementation:
When the user calls mprotect(), the expectation is that the call
will either succeed or fail, and that if it fails the system is left
unchanged. This is not possible if permission restriction is done
as part of mprotect().
(a) mprotect() may span multiple VMAs and involves VMA splits
that (from what I understand) cannot be undone, since SGX memory
does not support VMA merges. If any SGX function (EMODPR or
ETRACK on any page) executed after a VMA split fails, the user
is left with fragmented memory.
(b) The EMODPR/ETRACK pair can fail on any of the pages covered
by the mprotect() call. If a failure occurs, the kernel cannot undo
the EMODPR already executed on earlier pages, because the kernel
cannot run EMODPE. The EPCM permissions are thus left in an
inconsistent state: some of the pages have changed EPCM
permissions, and mprotect() has no mechanism to communicate
partial success.
Communicating partial success is needed to tell user space
(i) which pages need EACCEPT and (ii) which pages need to be
included in a new request (although user space does not have the
information to help the new request succeed; see below).
(c) The user space runtime controls the management of EPC memory,
and accurate failure information would help it do so.
Knowing the error code of an EMODPR failure would help user space
take appropriate action. For example, EMODPR can return
"SGX_PAGE_NOT_MODIFIABLE", which tells the runtime that it needs
to run EACCEPT on that page before EMODPR can succeed.
Alternatively, if the return code is "SGX_EPC_PAGE_CONFLICT", the
runtime can determine that some other part of the runtime
attempted an ENCLU function on that page.
It is not possible to provide such detailed errors to user space
through mprotect().

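Points (b) and (c) contrast with what a dedicated ioctl() can report. The sketch below is hypothetical: EMODPR is simulated by a stub (fake_emodpr()), and struct restrict_req, the RT_* codes, restrict_range() and next_action() are illustrative names, not the actual SGX uAPI. It restricts pages in order, stops at the first failure, and reports both how many pages were changed (count) and the failure code (result). A single int return value from mprotect() cannot convey this. The mapping from code to runtime action is likewise an assumption.

```c
#include <stddef.h>

/* Hypothetical stand-ins for the EMODPR outcomes named above. */
enum rt_code {
	RT_SUCCESS = 0,
	RT_EPC_PAGE_CONFLICT,    /* models "SGX_EPC_PAGE_CONFLICT" */
	RT_PAGE_NOT_MODIFIABLE,  /* models "SGX_PAGE_NOT_MODIFIABLE" */
};

/* Hypothetical runtime reactions to a failed restriction. */
enum rt_action {
	ACT_DONE,
	ACT_EACCEPT_THEN_RETRY,  /* page is pending/modified: EACCEPT first */
	ACT_RETRY_LATER,         /* another ENCLU on the page conflicted */
};

/* Hypothetical request layout: the "kernel" fills in result and count. */
struct restrict_req {
	size_t first_page;    /* in: index of the first page to restrict */
	size_t npages;        /* in: number of pages */
	enum rt_code result;  /* out: code of the first failure, if any */
	size_t count;         /* out: pages successfully restricted */
};

/* Stub for ENCLS[EMODPR]: the outcome per page is supplied by the caller. */
static enum rt_code fake_emodpr(const enum rt_code *page_state, size_t idx)
{
	return page_state[idx];
}

/*
 * Restrict pages in order and stop at the first failure. Pages before the
 * failure stay restricted (EMODPR cannot be undone by the kernel), so the
 * caller must learn exactly how far the operation got. ETRACK is omitted.
 */
int restrict_range(struct restrict_req *req, const enum rt_code *page_state)
{
	req->result = RT_SUCCESS;
	req->count = 0;
	for (size_t i = 0; i < req->npages; i++) {
		enum rt_code rc = fake_emodpr(page_state, req->first_page + i);
		if (rc != RT_SUCCESS) {
			req->result = rc;
			return -1;
		}
		req->count++;
	}
	return 0;
}

/* Map the reported failure code to the runtime's next step. */
enum rt_action next_action(const struct restrict_req *req)
{
	switch (req->result) {
	case RT_PAGE_NOT_MODIFIABLE:
		return ACT_EACCEPT_THEN_RETRY;
	case RT_EPC_PAGE_CONFLICT:
		return ACT_RETRY_LATER;
	default:
		return ACT_DONE;
	}
}
```

With page outcomes {RT_SUCCESS, RT_SUCCESS, RT_PAGE_NOT_MODIFIABLE, RT_SUCCESS} and npages = 4, restrict_range() returns -1 with count == 2, and next_action() returns ACT_EACCEPT_THEN_RETRY: the runtime knows which two pages need EACCEPT for the restriction and which page to fix before retrying.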
4) #PF implementation:
(a) There is more to restricting permissions than just running
ENCLS[EMODPR]. After running ENCLS[EMODPR], the kernel should
also initiate the ETRACK flow, which ensures that every thread
within the enclave is interrupted by sending an IPI to its CPU;
this includes the thread that just triggered the #PF.
(b) A second consideration is that the EMODPR/ETRACK flow has a
large "blast radius": every thread in the enclave needs to be
interrupted. Since #PFs may arrive at any time, setting up a page
range in which a fault on any page triggers enclave exits for all
threads has a significant and unpredictable impact. I believe it
would be better to update all pages in the range at the same time,
and in this way contain the impact of this costly
EMODPR/ETRACK/IPI flow.
(c) How will the page fault handler know when EMODPR/ETRACK should
be run? The page fault handler can be called significantly later
than the mprotect() call, and user space can run EMODPE at any
time to extend permissions without the kernel being aware of it.
This implies that EMODPR/ETRACK/IPIs would need to run during
*every* page fault, irrespective of mprotect().
(d) If a page is in the pending or modified state, EMODPR will
always fail. This needs to be resolved by the user space runtime
(by running EACCEPT on the page), but the page fault handler has
no way to communicate this to it.

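Point (c) can be illustrated with a small model of a fault handler that caches the permissions it last set. The sketch is hypothetical: struct page_model, the model_* helpers, fault_handler_with_cache() and the P_* bits are illustrative, not kernel code. Because EMODPE runs inside the enclave without kernel involvement, the cached view goes stale, and a handler that skips EMODPR based on it leaves the EPCM permissions broader than what mprotect() requested. The only safe handler is one that runs EMODPR/ETRACK on every fault.

```c
#include <stdbool.h>

/* Illustrative permission bits (not the kernel's definitions). */
enum { P_R = 1, P_W = 2, P_X = 4 };

/* Hypothetical model of one enclave page. */
struct page_model {
	unsigned int epcm;         /* actual EPCM permissions (hardware state) */
	unsigned int kernel_view;  /* what the kernel last set via EMODPR */
};

/* EMODPR, run by the kernel: restricts and updates the kernel's view. */
void model_emodpr(struct page_model *p, unsigned int perms)
{
	p->epcm &= perms;
	p->kernel_view = p->epcm;
}

/* EMODPE, run by the enclave: extends EPCM permissions; kernel not told. */
void model_emodpe(struct page_model *p, unsigned int perms)
{
	p->epcm |= perms;
}

/*
 * A #PF handler that skips EMODPR when its cached view already fits the
 * VMA permissions. Returns true if it skipped EMODPR.
 */
bool fault_handler_with_cache(struct page_model *p, unsigned int vma_perms)
{
	if ((p->kernel_view & ~vma_perms) == 0)
		return true;  /* cache says nothing to restrict */
	model_emodpr(p, vma_perms);
	return false;
}
```

For example, starting from an RW page, model_emodpr(&pg, P_R) followed by model_emodpe(&pg, P_W) leaves kernel_view == P_R but epcm == (P_R | P_W); fault_handler_with_cache(&pg, P_R) then skips EMODPR and the page keeps its write permission.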
Considering the above, could you please provide clear guidance on
how you envision permission restriction being supported by mprotect()?

Reinette