linux-kernel - Re: [PATCH v3 4/7] x86/sgx: Add SGX infrastructure to recover from poison

Message-ID: <YQHhd0qKZqMCWqks@google.com>
Date:   Wed, 28 Jul 2021 23:00:07 +0000
From:   Sean Christopherson <seanjc@...gle.com>
To:     Dave Hansen <dave.hansen@...el.com>
Cc:     Tony Luck <tony.luck@...el.com>,
        Jarkko Sakkinen <jarkko@...nel.org>, x86@...nel.org,
        linux-kernel@...r.kernel.org
Subject: Re: [PATCH v3 4/7] x86/sgx: Add SGX infrastructure to recover from
 poison

On Wed, Jul 28, 2021, Dave Hansen wrote:
> On 7/28/21 1:46 PM, Tony Luck wrote:
> > +int sgx_memory_failure(unsigned long pfn, int flags)
> > +{
> ...
> > +	page->flags |= SGX_EPC_PAGE_POISON;
> 
> Is this safe outside of any locks?

It's safe outside of sgx_reclaimer_lock iff this can guarantee nothing else can
reach the page.  I'm pretty sure that doesn't hold true here.

> I see the reclaimer doing things like:
> 
>                 epc_page->flags &= ~SGX_EPC_PAGE_RECLAIMER_TRACKED;
> 
> I'd worry that this code and other non-atomic epc_page->flags
> manipulation could trample on each other.
> 
> This might need some atomic bit manipulation *and* converting all
> the other epc_page->flags users.

I don't think atomics would be sufficient as that would open all sorts of possible
races.  E.g. this new code in __sgx_sanitize_pages()

                page = list_first_entry(dirty_page_list, struct sgx_epc_page, list);

+               if (page->flags & SGX_EPC_PAGE_POISON) {
+                       list_del(&page->list);
+                       continue;
+               }
+
		*** HERE ***
                ret = __eremove(sgx_get_epc_virt_addr(page));

could attempt EREMOVE on a freshly POISONed page.  That appears to be "benign"
since ENCLS is wrapped with _ASM_EXTABLE_FAULT, but it feels wrong to add a check
that we know can race.
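
To make the window concrete, here is a minimal userspace sketch (not kernel
code). It assumes a hypothetical struct page with an atomic flags word and a
hypothetical hook that stands in for the #MC handler firing at the "HERE"
point; PAGE_POISON, mark_poison() and sanitize_one() are illustrative names
only. It shows that making the flag updates atomic does not help, because the
check and the use are still two separate steps.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define PAGE_POISON 0x1u /* hypothetical stand-in for SGX_EPC_PAGE_POISON */

struct page {
	atomic_uint flags;
};

/* Hypothetical hook: simulates another context running inside the window. */
typedef void (*window_hook)(struct page *);

/* Stand-in for the memory-failure path: sets the flag atomically. */
static void mark_poison(struct page *p)
{
	atomic_fetch_or(&p->flags, PAGE_POISON);
}

/*
 * Stand-in for one iteration of the sanitize loop.
 * Returns true if the page was poisoned at the moment it was "used",
 * i.e. the check was passed and the page went bad before the use.
 */
static bool sanitize_one(struct page *p, window_hook hook)
{
	/* Atomic check: the page looks clean right now. */
	if (atomic_load(&p->flags) & PAGE_POISON)
		return false; /* skipped, never used */

	/* *** HERE ***: nothing prevents the flag from changing. */
	if (hook)
		hook(p);

	/* Stand-in for __eremove(): report the state at time of use. */
	return (atomic_load(&p->flags) & PAGE_POISON) != 0;
}
```

Calling sanitize_one() with mark_poison as the hook reports that a poisoned
page was used even though every individual flag access was atomic.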

And similar races for allocation/free could hand out a poisoned page or add one
to the free list.

@@ -585,6 +600,10 @@ struct sgx_epc_page *sgx_alloc_epc_page(void *owner, bool reclaim)

        for ( ; ; ) {
                page = __sgx_alloc_epc_page();
+
+               if (page->flags & SGX_EPC_PAGE_POISON)
+                       continue;
		*** HERE ***
+


@@ -630,7 +651,8 @@ void sgx_free_epc_page(struct sgx_epc_page *page)
        spin_lock(&node->lock);

        page->owner = NULL;
-       list_add_tail(&page->list, &node->free_page_list);
+       if (!(page->flags & SGX_EPC_PAGE_POISON))
		*** HERE ***
+               list_add_tail(&page->list, &node->free_page_list);


Setting POISON and hoping we eventually notice doesn't sound robust.  Maybe some
of these races are unavoidable due to the nature of #MC delivery, but I would hope
the kernel can at least avoid handing out a poisoned page to a different enclave.
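
One way to picture the "at least don't hand it out" property is a userspace
sketch in which the poison flag and free-list membership only ever change
under the same lock. This is an illustration under assumptions, not the SGX
code and not a proposed patch: struct node, poison_page(), alloc_page() and
free_page() are hypothetical names, and a pthread mutex stands in for
node->lock. A page that is poisoned while an enclave still owns it stays with
that owner; it is simply never returned to the free list.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

#define PAGE_POISON 0x1u /* hypothetical stand-in for SGX_EPC_PAGE_POISON */

struct page {
	unsigned int flags;	/* only modified with node->lock held */
	bool on_free_list;
	struct page *next;
};

struct node {
	pthread_mutex_t lock;
	struct page *free_head;
};

static void node_init(struct node *n)
{
	pthread_mutex_init(&n->lock, NULL);
	n->free_head = NULL;
}

/* Caller holds n->lock. */
static void unlink_locked(struct node *n, struct page *p)
{
	for (struct page **pp = &n->free_head; *pp; pp = &(*pp)->next) {
		if (*pp == p) {
			*pp = p->next;
			break;
		}
	}
	p->next = NULL;
	p->on_free_list = false;
}

/* Mark the page bad and pull it off the free list in one critical section. */
static void poison_page(struct node *n, struct page *p)
{
	pthread_mutex_lock(&n->lock);
	p->flags |= PAGE_POISON;
	if (p->on_free_list)
		unlink_locked(n, p);
	pthread_mutex_unlock(&n->lock);
}

/* Returns a free page, or NULL if none is available. */
static struct page *alloc_page(struct node *n)
{
	pthread_mutex_lock(&n->lock);
	struct page *p = n->free_head;
	if (p) {
		/* Invariant: a poisoned page is never on the free list. */
		assert(!(p->flags & PAGE_POISON));
		unlink_locked(n, p);
	}
	pthread_mutex_unlock(&n->lock);
	return p;
}

/* Poisoned pages are dropped instead of being returned to the list. */
static void free_page(struct node *n, struct page *p)
{
	pthread_mutex_lock(&n->lock);
	if (!(p->flags & PAGE_POISON)) {
		p->next = n->free_head;
		n->free_head = p;
		p->on_free_list = true;
	}
	pthread_mutex_unlock(&n->lock);
}
```

Because the flag test in free_page() and the list removal in poison_page()
are serialized by the same lock, there is no "HERE" window between the check
and the list operation in this model. It does nothing for a page that goes
bad while in use, which may well be one of the unavoidable cases.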