Message-Id: <D4EN7IQUAUKB.YJJOI927HJY7@kernel.org>
Date: Tue, 24 Sep 2024 19:08:17 +0300
From: "Jarkko Sakkinen" <jarkko@...nel.org>
To: "Dmitrii Kuvaiskii" <dmitrii.kuvaiskii@...el.com>,
<dave.hansen@...ux.intel.com>, <kai.huang@...el.com>,
<haitao.huang@...ux.intel.com>, <reinette.chatre@...el.com>,
<linux-sgx@...r.kernel.org>, <linux-kernel@...r.kernel.org>
Cc: <mona.vij@...el.com>, <kailun.qin@...el.com>, <stable@...r.kernel.org>
Subject: Re: [PATCH v6 3/3] x86/sgx: Resolve EREMOVE page vs EAUG page data
race
On Tue Sep 24, 2024 at 12:49 PM EEST, Dmitrii Kuvaiskii wrote:
> Two enclave threads may try to add and remove the same enclave page
> simultaneously (e.g., if the SGX runtime supports both lazy allocation
> and MADV_DONTNEED semantics). Consider some enclave page added to the
> enclave. User space decides to temporarily remove this page (e.g.,
> emulating the MADV_DONTNEED semantics) on CPU1. At the same time, user
> space performs a memory access on the same page on CPU2, which results
> in a #PF and ultimately in sgx_vma_fault(). The scenario proceeds as
> follows:
>
> /*
>  * CPU1: User space performs
>  * ioctl(SGX_IOC_ENCLAVE_REMOVE_PAGES)
>  * on enclave page X
>  */
> sgx_encl_remove_pages() {
>
>   mutex_lock(&encl->lock);
>
>   entry = sgx_encl_load_page(encl);
>   /*
>    * verify that page is
>    * trimmed and accepted
>    */
>
>   mutex_unlock(&encl->lock);
>
>   /*
>    * remove PTE entry; cannot
>    * be performed under lock
>    */
>   sgx_zap_enclave_ptes(encl);
>                         /*
>                          * Fault on CPU2 on same page X
>                          */
>                         sgx_vma_fault() {
>                           /*
>                            * PTE entry was removed, but the
>                            * page is still in enclave's xarray
>                            */
>                           xa_load(&encl->page_array) != NULL ->
>                           /*
>                            * SGX driver thinks that this page
>                            * was swapped out and loads it
>                            */
>                           mutex_lock(&encl->lock);
>                           /*
>                            * this is effectively a no-op
>                            */
>                           entry = sgx_encl_load_page_in_vma();
>                           /*
>                            * add PTE entry
>                            *
>                            * *BUG*: a PTE is installed for a
>                            * page in process of being removed
>                            */
>                           vmf_insert_pfn(...);
>
>                           mutex_unlock(&encl->lock);
>                           return VM_FAULT_NOPAGE;
>                         }
>   /*
>    * continue with page removal
>    */
>   mutex_lock(&encl->lock);
>
>   sgx_encl_free_epc_page(epc_page) {
>     /*
>      * remove page via EREMOVE
>      */
>     /*
>      * free EPC page
>      */
>     sgx_free_epc_page(epc_page);
>   }
>
>   xa_erase(&encl->page_array);
>
>   mutex_unlock(&encl->lock);
> }
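
For reference, a rough user-space sketch of the two-thread pattern that can
open this window (not part of the patch, and heavily simplified). It assumes
an already built and initialized SGX2 enclave: encl_fd, encl_base and
PAGE_OFF are placeholders, the page at PAGE_OFF has already been trimmed via
SGX_IOC_ENCLAVE_MODIFY_TYPES and EACCEPTed inside the enclave, and all setup
and error handling are omitted. The plain read below merely stands in for
the enclave-side access described above.

#include <pthread.h>
#include <stdint.h>
#include <sys/ioctl.h>
#include <asm/sgx.h>		/* SGX_IOC_ENCLAVE_REMOVE_PAGES */

#define PAGE_OFF 0x3000UL	/* hypothetical page offset inside the enclave */

static int encl_fd;		/* /dev/sgx_enclave fd, enclave already built */
static uint8_t *encl_base;	/* base of the enclave mapping */

/* CPU1: remove the trimmed + EACCEPTed page (sgx_encl_remove_pages() path) */
static void *remover(void *arg)
{
	struct sgx_enclave_remove_pages params = {
		.offset = PAGE_OFF,
		.length = 4096,
	};

	(void)arg;
	ioctl(encl_fd, SGX_IOC_ENCLAVE_REMOVE_PAGES, &params);
	return NULL;
}

/* CPU2: touch the same page; the #PF lands in sgx_vma_fault() */
static void *toucher(void *arg)
{
	volatile uint8_t *p = encl_base + PAGE_OFF;

	(void)arg;
	(void)*p;		/* may race with the removal on CPU1 */
	return NULL;
}

int main(void)
{
	pthread_t t1, t2;

	/* ... open /dev/sgx_enclave, build the enclave, trim the page ... */

	pthread_create(&t1, NULL, remover, NULL);
	pthread_create(&t2, NULL, toucher, NULL);
	pthread_join(t1, NULL);
	pthread_join(t2, NULL);
	return 0;
}
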
>
> Here, CPU1 removed the page. However, CPU2 installed the PTE entry on the
> same page. This enclave page becomes perpetually inaccessible (until
> another SGX_IOC_ENCLAVE_REMOVE_PAGES ioctl). This is because the page is
> marked accessible in the PTE entry but is not EAUGed, and any subsequent
> access to this page raises a fault: with the kernel believing there to
> be a valid VMA, the unlikely error code X86_PF_SGX encountered by code
> path do_user_addr_fault() -> access_error() causes the SGX driver's
> sgx_vma_fault() to be skipped and user space receives a SIGSEGV instead.
> The userspace SIGSEGV handler cannot perform EACCEPT because the page
> was not EAUGed. Thus, user space is stuck with the inaccessible
> page.
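
(For the record, the reason sgx_vma_fault() never runs for that second
fault, paraphrasing access_error() in arch/x86/mm/fault.c; the exact code
differs across kernel versions:)

	/*
	 * An SGX-reported fault cannot be fixed up by the kernel, so it is
	 * treated as an access error and do_user_addr_fault() delivers
	 * SIGSEGV without ever reaching vma->vm_ops->fault, i.e.
	 * sgx_vma_fault().
	 */
	if (unlikely(error_code & X86_PF_SGX))
		return 1;
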
>
> Fix this race by forcing the fault handler on CPU2 to back off if the
> page is currently being removed (on CPU1). This is achieved by setting
> the SGX_ENCL_PAGE_BUSY flag right before the first mutex_unlock() in
> sgx_encl_remove_pages(). Upon loading the page, CPU2 checks whether this
> page is busy, and if so, CPU2 backs off and waits until the page is
> completely removed. After that, any memory access to this page results
> in the normal "allocate and EAUG a page on #PF" flow.
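
(And roughly how the back-off looks from the fault handler's side; a sketch
in kernel style, not the exact code, with the busy check itself living in
the page-loading path touched earlier in this series:)

static vm_fault_t sgx_vma_fault(struct vm_fault *vmf)
{
	/* ... resolve encl and addr from vmf ... */

	mutex_lock(&encl->lock);

	entry = sgx_encl_load_page_in_vma(encl, addr, vma->vm_flags);
	if (IS_ERR(entry)) {
		mutex_unlock(&encl->lock);

		/*
		 * The page is busy (being removed, type-changed or
		 * reclaimed): install no PTE and let the access retry.
		 */
		if (PTR_ERR(entry) == -EBUSY)
			return VM_FAULT_NOPAGE;

		return VM_FAULT_SIGBUS;
	}

	/* ... vmf_insert_pfn() and the rest of the fault path ... */
}
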
>
> Additionally, fix a similar race: user space converts a normal enclave
> page to a TCS page (via SGX_IOC_ENCLAVE_MODIFY_TYPES) on CPU1, and at
> the same time, user space performs a memory access on the same page on
> CPU2. This fix is not strictly necessary (this particular race would
> indicate a bug in a user space application), but it gives a consistent
> rule: if the kernel operates on an enclave page and removes its mapping,
> then other threads trying to access that page are temporarily blocked
> and should retry.
>
> Fixes: 9849bb27152c1 ("x86/sgx: Support complete page removal")
> Fixes: 45d546b8c109d ("x86/sgx: Support modifying SGX page type")
> Cc: stable@...r.kernel.org
> Reviewed-by: Kai Huang <kai.huang@...el.com>
> Reviewed-by: Jarkko Sakkinen <jarkko@...nel.org>
> Signed-off-by: Dmitrii Kuvaiskii <dmitrii.kuvaiskii@...el.com>
> ---
>  arch/x86/kernel/cpu/sgx/encl.h  |  3 ++-
>  arch/x86/kernel/cpu/sgx/ioctl.c | 17 +++++++++++++++++
>  2 files changed, 19 insertions(+), 1 deletion(-)
>
> diff --git a/arch/x86/kernel/cpu/sgx/encl.h b/arch/x86/kernel/cpu/sgx/encl.h
> index b566b8ad5f33..96b11e8fb770 100644
> --- a/arch/x86/kernel/cpu/sgx/encl.h
> +++ b/arch/x86/kernel/cpu/sgx/encl.h
> @@ -22,7 +22,8 @@
>  /* 'desc' bits holding the offset in the VA (version array) page. */
>  #define SGX_ENCL_PAGE_VA_OFFSET_MASK	GENMASK_ULL(11, 3)
> 
> -/* 'desc' bit indicating that the page is busy (being reclaimed). */
> +/* 'desc' bit indicating that the page is busy (being reclaimed, removed or
> + * converted to a TCS page). */
>  #define SGX_ENCL_PAGE_BUSY		BIT(2)
> 
>  /*
> diff --git a/arch/x86/kernel/cpu/sgx/ioctl.c b/arch/x86/kernel/cpu/sgx/ioctl.c
> index b65ab214bdf5..c6f936440438 100644
> --- a/arch/x86/kernel/cpu/sgx/ioctl.c
> +++ b/arch/x86/kernel/cpu/sgx/ioctl.c
> @@ -969,12 +969,22 @@ static long sgx_enclave_modify_types(struct sgx_encl *encl,
>  			/*
>  			 * Do not keep encl->lock because of dependency on
>  			 * mmap_lock acquired in sgx_zap_enclave_ptes().
> +			 *
> +			 * Releasing encl->lock leads to a data race: while CPU1
> +			 * performs sgx_zap_enclave_ptes() and removes the PTE
> +			 * entry for the enclave page, CPU2 may attempt to load
> +			 * this page (because the page is still in enclave's
> +			 * xarray). To prevent CPU2 from loading the page, mark
> +			 * the page as busy before unlock and unmark after lock
> +			 * again.
>  			 */
> +			entry->desc |= SGX_ENCL_PAGE_BUSY;
>  			mutex_unlock(&encl->lock);
> 
>  			sgx_zap_enclave_ptes(encl, addr);
> 
>  			mutex_lock(&encl->lock);
> +			entry->desc &= ~SGX_ENCL_PAGE_BUSY;
> 
>  			sgx_mark_page_reclaimable(entry->epc_page);
>  		}
> @@ -1141,7 +1151,14 @@ static long sgx_encl_remove_pages(struct sgx_encl *encl,
>  		/*
>  		 * Do not keep encl->lock because of dependency on
>  		 * mmap_lock acquired in sgx_zap_enclave_ptes().
> +		 *
> +		 * Releasing encl->lock leads to a data race: while CPU1
> +		 * performs sgx_zap_enclave_ptes() and removes the PTE entry
> +		 * for the enclave page, CPU2 may attempt to load this page
> +		 * (because the page is still in enclave's xarray). To prevent
> +		 * CPU2 from loading the page, mark the page as busy.
>  		 */
> +		entry->desc |= SGX_ENCL_PAGE_BUSY;
>  		mutex_unlock(&encl->lock);
> 
>  		sgx_zap_enclave_ptes(encl, addr);

Just acknowledging that my tag also holds for this version. Good job!
All the diffs are as tight as they can get, which is awesome.

BR, Jarkko