lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20191007145011.GA18016@linux.intel.com>
Date:   Mon, 7 Oct 2019 07:50:11 -0700
From:   Sean Christopherson <sean.j.christopherson@...el.com>
To:     Borislav Petkov <bp@...en8.de>
Cc:     Jarkko Sakkinen <jarkko.sakkinen@...ux.intel.com>,
        linux-kernel@...r.kernel.org, x86@...nel.org,
        linux-sgx@...r.kernel.org, akpm@...ux-foundation.org,
        dave.hansen@...el.com, nhorman@...hat.com, npmccallum@...hat.com,
        serge.ayoun@...el.com, shay.katz-zamir@...el.com,
        haitao.huang@...el.com, andriy.shevchenko@...ux.intel.com,
        tglx@...utronix.de, kai.svahn@...el.com, josh@...htriplett.org,
        luto@...nel.org, kai.huang@...el.com, rientjes@...gle.com,
        cedric.xing@...el.com
Subject: Re: [PATCH v22 09/24] x86/sgx: Add functions to allocate and free
 EPC pages

On Sat, Oct 05, 2019 at 06:44:08PM +0200, Borislav Petkov wrote:
> On Tue, Sep 03, 2019 at 05:26:40PM +0300, Jarkko Sakkinen wrote:
> > +/**
> > + * __sgx_free_page - Free an EPC page
> > + * @page:	pointer a previously allocated EPC page
> > + *
> > + * EREMOVE an EPC page and insert it back to the list of free pages.
> > + *
> > + * Return:
> > + *   0 on success
> > + *   SGX error code if EREMOVE fails
> > + */
> > +int __sgx_free_page(struct sgx_epc_page *page)
> > +{
> > +	struct sgx_epc_section *section = sgx_epc_section(page);
> > +	int ret;
> 
> Shouldn't you be grabbing the lock here already?
> 
> Or nothing can happen to that page from another thread after you
> ENCLS[EREMOVE] it and before it is added to the ->page_list of the
> section?

The caller is responsible for ensuring EREMOVE can be safely executed,
e.g. by holding the enclave's lock.

For many ENCLS leafs, EREMOVE included, the CPU requires exclusive access
to the SGX Enclave Control Structures (SECS)[*] and will signal a #GP if
a different logical CPU is already executing an ENCLS leaf that requires
exclusive SECS access.  The SGX subsystem uses a per-enclave mutex to
serialize such ENCLS leafs, among other things.

[*] The SECS is a per-enclave page that resides in the EPC and can only be
    directly accessed by the CPU.  It's used to track metadata about the
    enclave, e.g. number of child pages, base, size, etc...

> > +
> > +	ret = __eremove(sgx_epc_addr(page));
> > +	if (ret)
> > +		return ret;
> > +
> > +	spin_lock(&section->lock);
> > +	list_add_tail(&page->list, &section->page_list);
> > +	section->free_cnt++;
> > +	spin_unlock(&section->lock);
> > +
> > +	return 0;
> > +}
> > +EXPORT_SYMBOL_GPL(__sgx_free_page);
> > +
> > +/**
> > + * sgx_free_page - Free an EPC page and WARN on failure
> > + * @page:	pointer to a previously allocated EPC page
> > + *
> > + * EREMOVE an EPC page and insert it back to the list of free pages, and WARN
> > + * if EREMOVE fails.  For use when the call site cannot (or chooses not to)
> > + * handle failure, i.e. the page is leaked on failure.
> > + */
> > +void sgx_free_page(struct sgx_epc_page *page)
> > +{
> > +	int ret;
> > +
> > +	ret = __sgx_free_page(page);
> > +	WARN(ret > 0, "sgx: EREMOVE returned %d (0x%x)", ret, ret);
> 
> That will potentially flood dmesg. Why are we even accommodating such
> callers? They either handle the error or they don't get to alloc EPC
> pages. There's also __must_check with which you can enforce the error
> code checking or we simply don't allow not handling failure. Fullstop.

It was deliberately left as a WARN as having multiple stack traces has
been very helpful for triaging and debugging software bugs.  I agree that
it can and should be changed to a WARN_ONCE() for upstream.

As for why it exists at all, the WARN will only fire if there is a kernel
and/or CPU bug.  In the vast majority of cases, EREMOVE is guaranteed to
be successful (assuming no bugs).  In those situations, if EREMOVE fails
there really isn't a sane option other to WARN and leak the page, e.g.
the kernel can't override the CPUs protection to forcefully EREMOVE the
page.

The non-warn variant, __sgx_free_page(), is provided for the (currently)
one case where EREMOVE failure is expected/tolerable (opportunstically
freeing EPC pages when the enclave is killed).

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ