[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <Yn5tznK08SeWxd4S@iki.fi>
Date: Fri, 13 May 2022 17:40:14 +0300
From: Jarkko Sakkinen <jarkko@...nel.org>
To: Reinette Chatre <reinette.chatre@...el.com>
Cc: dave.hansen@...ux.intel.com, tglx@...utronix.de, bp@...en8.de,
luto@...nel.org, mingo@...hat.com, linux-sgx@...r.kernel.org,
x86@...nel.org, haitao.huang@...el.com, hpa@...or.com,
linux-kernel@...r.kernel.org, stable@...r.kernel.org
Subject: Re: [PATCH V3 4/5] x86/sgx: Fix race between reclaimer and page
fault handler
On Thu, May 12, 2022 at 02:51:00PM -0700, Reinette Chatre wrote:
> Haitao reported encountering a WARN triggered by the ENCLS[ELDU]
> instruction faulting with a #GP.
>
> The WARN is encountered when the reclaimer evicts a range of
> pages from the enclave when the same pages are faulted back right away.
>
> Consider two enclave pages (ENCLAVE_A and ENCLAVE_B)
> sharing a PCMD page (PCMD_AB). ENCLAVE_A is in the
> enclave memory and ENCLAVE_B is in the backing store. PCMD_AB contains
> just one entry, that of ENCLAVE_B.
>
> Scenario proceeds where ENCLAVE_A is being evicted from the enclave
> while ENCLAVE_B is faulted in.
>
> sgx_reclaim_pages() {
>
> ...
>
> /*
> * Reclaim ENCLAVE_A
> */
> mutex_lock(&encl->lock);
> /*
> * Get a reference to ENCLAVE_A's
> * shmem page where enclave page
> * encrypted data will be stored
> * as well as a reference to the
> * enclave page's PCMD data page,
> * PCMD_AB.
> * Release mutex before writing
> * any data to the shmem pages.
> */
> sgx_encl_get_backing(...);
> encl_page->desc |= SGX_ENCL_PAGE_BEING_RECLAIMED;
> mutex_unlock(&encl->lock);
>
> /*
> * Fault ENCLAVE_B
> */
>
> sgx_vma_fault() {
>
> mutex_lock(&encl->lock);
> /*
> * Get reference to
> * ENCLAVE_B's shmem page
> * as well as PCMD_AB.
> */
> sgx_encl_get_backing(...)
> /*
> * Load page back into
> * enclave via ELDU.
> */
> /*
> * Release reference to
> * ENCLAVE_B' shmem page and
> * PCMD_AB.
> */
> sgx_encl_put_backing(...);
> /*
> * PCMD_AB is found empty so
> * it and ENCLAVE_B's shmem page
> * are truncated.
> */
> /* Truncate ENCLAVE_B backing page */
> sgx_encl_truncate_backing_page();
> /* Truncate PCMD_AB */
> sgx_encl_truncate_backing_page();
>
> mutex_unlock(&encl->lock);
>
> ...
> }
> mutex_lock(&encl->lock);
> encl_page->desc &=
> ~SGX_ENCL_PAGE_BEING_RECLAIMED;
> /*
> * Write encrypted contents of
> * ENCLAVE_A to ENCLAVE_A shmem
> * page and its PCMD data to
> * PCMD_AB.
> */
> sgx_encl_put_backing(...)
>
> /*
> * Reference to PCMD_AB is
> * dropped and it is truncated.
> * ENCLAVE_A's PCMD data is lost.
> */
> mutex_unlock(&encl->lock);
> }
>
> What happens next depends on whether it is ENCLAVE_A being faulted
> in or ENCLAVE_B being evicted - but both end up with ENCLS[ELDU] faulting
> with a #GP.
>
> If ENCLAVE_A is faulted then at the time sgx_encl_get_backing() is called
> a new PCMD page is allocated and providing the empty PCMD data for
> ENCLAVE_A would cause ENCLS[ELDU] to #GP
>
> If ENCLAVE_B is evicted first then a new PCMD_AB would be allocated by the
> reclaimer but later when ENCLAVE_A is faulted the ENCLS[ELDU] instruction
> would #GP during its checks of the PCMD value and the WARN would be
> encountered.
>
> Noting that the reclaimer sets SGX_ENCL_PAGE_BEING_RECLAIMED at the time
> it obtains a reference to the backing store pages of an enclave page it
> is in the process of reclaiming, fix the race by only truncating the PCMD
> page after ensuring that no page sharing the PCMD page is in the process
> of being reclaimed.
>
> Cc: stable@...r.kernel.org
> Fixes: 08999b2489b4 ("x86/sgx: Free backing memory after faulting the enclave page")
> Reported-by: Haitao Huang <haitao.huang@...el.com>
> Tested-by: Haitao Huang <haitao.huang@...el.com>
> Signed-off-by: Reinette Chatre <reinette.chatre@...el.com>
> ---
> Changes since V2:
> - Declare "addr" and "entry" within loop. (Dave)
> - Fix incorrect error return when enclave page not found to belong
> to enclave. Continue search instead of returning failure. (Dave)
> - Add Haitao's "Tested-by" tag.
> - Rename pcmd_page_in_use() to reclaimer_writing_to_pcmd() to be less
> generic. (Dave)
> - Improve function comments to be clear about it testing for PCMD
> page soon becoming non-empty, also add context info to kernel-doc
> to indicate that enclave mutex must be held. (Dave)
>
> Changes since RFC v1:
> - New patch.
>
> arch/x86/kernel/cpu/sgx/encl.c | 94 +++++++++++++++++++++++++++++++++-
> 1 file changed, 93 insertions(+), 1 deletion(-)
>
> diff --git a/arch/x86/kernel/cpu/sgx/encl.c b/arch/x86/kernel/cpu/sgx/encl.c
> index 5104a428b72c..243f3bd78145 100644
> --- a/arch/x86/kernel/cpu/sgx/encl.c
> +++ b/arch/x86/kernel/cpu/sgx/encl.c
> @@ -12,6 +12,92 @@
> #include "encls.h"
> #include "sgx.h"
>
> +#define PCMDS_PER_PAGE (PAGE_SIZE / sizeof(struct sgx_pcmd))
> +/*
> + * 32 PCMD entries share a PCMD page. PCMD_FIRST_MASK is used to
> + * determine the page index associated with the first PCMD entry
> + * within a PCMD page.
> + */
> +#define PCMD_FIRST_MASK GENMASK(4, 0)
> +
> +/**
> + * reclaimer_writing_to_pcmd() - Query if any enclave page associated with
> + * a PCMD page is in process of being reclaimed.
> + * @encl: Enclave to which PCMD page belongs
> + * @start_addr: Address of enclave page using first entry within the PCMD page
> + *
> + * When an enclave page is reclaimed some Paging Crypto MetaData (PCMD) is
> + * stored. The PCMD data of a reclaimed enclave page contains enough
> + * information for the processor to verify the page at the time
> + * it is loaded back into the Enclave Page Cache (EPC).
> + *
> + * The backing storage to which enclave pages are reclaimed is laid out as
> + * follows:
> + * Encrypted enclave pages:SECS page:PCMD pages
> + *
> + * Each PCMD page contains the PCMD metadata of
> + * PAGE_SIZE/sizeof(struct sgx_pcmd) enclave pages.
> + *
> + * A PCMD page can only be truncated if it is (a) empty, and (b) not in the
> + * process of getting data (and thus soon being non-empty). (b) is tested with
> + * a check if an enclave page sharing the PCMD page is in the process of being
> + * reclaimed.
> + *
> + * The reclaimer sets the SGX_ENCL_PAGE_BEING_RECLAIMED flag when it
> + * intends to reclaim that enclave page - it means that the PCMD page
> + * associated with that enclave page is about to get some data and thus
> + * even if the PCMD page is empty, it should not be truncated.
> + *
> + * Context: Enclave mutex (&sgx_encl->lock) must be held.
> + * Return: 1 if the reclaimer is about to write to the PCMD page
> + * 0 if the reclaimer has no intention to write to the PCMD page
> + */
> +static int reclaimer_writing_to_pcmd(struct sgx_encl *encl,
> + unsigned long start_addr)
> +{
> + int reclaimed = 0;
> + int i;
> +
> + /*
> + * PCMD_FIRST_MASK is based on number of PCMD entries within
> + * PCMD page being 32.
> + */
> + BUILD_BUG_ON(PCMDS_PER_PAGE != 32);
> +
> + for (i = 0; i < PCMDS_PER_PAGE; i++) {
> + struct sgx_encl_page *entry;
> + unsigned long addr;
> +
> + addr = start_addr + i * PAGE_SIZE;
> +
> + /*
> + * Stop when reaching the SECS page - it does not
> + * have a page_array entry and its reclaim is
> + * started and completed with enclave mutex held so
> + * it does not use the SGX_ENCL_PAGE_BEING_RECLAIMED
> + * flag.
> + */
> + if (addr == encl->base + encl->size)
> + break;
> +
> + entry = xa_load(&encl->page_array, PFN_DOWN(addr));
> + if (!entry)
> + continue;
> +
> + /*
> + * VA page slot ID uses same bit as the flag so it is important
> + * to ensure that the page is not already in backing store.
> + */
> + if (entry->epc_page &&
> + (entry->desc & SGX_ENCL_PAGE_BEING_RECLAIMED)) {
> + reclaimed = 1;
> + break;
> + }
> + }
> +
> + return reclaimed;
> +}
> +
> /*
> * Calculate byte offset of a PCMD struct associated with an enclave page. PCMD's
> * follow right after the EPC data in the backing storage. In addition to the
> @@ -47,6 +133,7 @@ static int __sgx_encl_eldu(struct sgx_encl_page *encl_page,
> unsigned long va_offset = encl_page->desc & SGX_ENCL_PAGE_VA_OFFSET_MASK;
> struct sgx_encl *encl = encl_page->encl;
> pgoff_t page_index, page_pcmd_off;
> + unsigned long pcmd_first_page;
> struct sgx_pageinfo pginfo;
> struct sgx_backing b;
> bool pcmd_page_empty;
> @@ -58,6 +145,11 @@ static int __sgx_encl_eldu(struct sgx_encl_page *encl_page,
> else
> page_index = PFN_DOWN(encl->size);
>
> + /*
> + * Address of enclave page using the first entry within the PCMD page.
> + */
> + pcmd_first_page = PFN_PHYS(page_index & ~PCMD_FIRST_MASK) + encl->base;
> +
> page_pcmd_off = sgx_encl_get_backing_page_pcmd_offset(encl, page_index);
>
> ret = sgx_encl_get_backing(encl, page_index, &b);
> @@ -99,7 +191,7 @@ static int __sgx_encl_eldu(struct sgx_encl_page *encl_page,
>
> sgx_encl_truncate_backing_page(encl, page_index);
>
> - if (pcmd_page_empty)
> + if (pcmd_page_empty && !reclaimer_writing_to_pcmd(encl, pcmd_first_page))
> sgx_encl_truncate_backing_page(encl, PFN_DOWN(page_pcmd_off));
>
> return ret;
> --
> 2.25.1
>
Reviewed-by: Jarkko Sakkinen <jarkko@...nel.org>
BR, Jarkko
Powered by blists - more mailing lists