linux-kernel - Re: [PATCH v3 2/2] x86/sgx: Implement EUPDATESVN and opportunistically call it during first EPC page alloc

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <6bbfb38a0cd66af3d3562a82adac835316b1f4e5.camel@intel.com>
Date: Thu, 17 Apr 2025 11:12:08 +0000
From: "Huang, Kai" <kai.huang@...el.com>
To: "Reshetova, Elena" <elena.reshetova@...el.com>, "Hansen, Dave"
	<dave.hansen@...el.com>
CC: "linux-sgx@...r.kernel.org" <linux-sgx@...r.kernel.org>, "Scarlata,
 Vincent R" <vincent.r.scarlata@...el.com>, "x86@...nel.org" <x86@...nel.org>,
	"jarkko@...nel.org" <jarkko@...nel.org>, "Annapurve, Vishal"
	<vannapurve@...gle.com>, "Cai, Chong" <chongc@...gle.com>, "Mallick, Asit K"
	<asit.k.mallick@...el.com>, "Aktas, Erdem" <erdemaktas@...gle.com>,
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
	"bondarn@...gle.com" <bondarn@...gle.com>, "dionnaglaze@...gle.com"
	<dionnaglaze@...gle.com>, "Raynor, Scott" <scott.raynor@...el.com>
Subject: Re: [PATCH v3 2/2] x86/sgx: Implement EUPDATESVN and
 opportunistically call it during first EPC page alloc


> 
> > 
> > Reading below, it could also fail due to running out of entropy, which is still
> > legally possible to happen IMHO.
> 
> Actually, no in this case, we only return false from sgx_updatesvn in case unknown
> error happens as agreed previously. In case we run out of entropy it should be safe
> to retry later and we dont prevent per current code EPC page allocation. 
> 
> > 
> > Maybe just:
> > 				/*
> > 				 * Updating SVN failed.  SGX might be broken,
> > 				 * or running out of entropy happened.  Do
> > not
> > 				 * allocate EPC page since it is not safe to
> > use
> > 				 * SGX anymore if it was the former.  If it was
> > 				 * due to running out of entropy, the further
> > 				 * call of EPC allocation will try
> > 				 * sgx_updatesvn() again.
> > 				 */
> 
> I agree with this except that the current code doesn’t prevent EPC allocation on any
> other error return than unknown error. The question is whenever we want to
> change the behaviour to require it? 

[...]

> > > + * Return:
> > > + * True: success or EUPDATESVN can be safely retried next time
> > > + * False: Unexpected error occurred
> > 
> > Hmm.. IIUC it could fail with running out of entropy but this is still legally
> > possible to happen.  And it is safe to retry.
> 
> Yes, in this case we get back SGX_INSUFFICIENT_ENTROPY currently we
> return "true" here and do not prevent EPC allocations of the page in this
> case, which means we will start populate EPC and can next time retry only
> when EPC is empty again. 

[...]

> > > +	switch (ret) {
> > > +	case 0:
> > > +		pr_info("EUPDATESVN: success\n");
> > > +		break;
> > > +	case SGX_EPC_NOT_READY:
> > > +	case SGX_INSUFFICIENT_ENTROPY:
> > > +	case SGX_EPC_PAGE_CONFLICT:
> > > +		ENCLS_WARN(ret, "EUPDATESVN");
> > > +		break;
> > 
> > I don't think we should use ENCLS_WARN() for SGX_INSUFFICIENT_ENTROPY,
> > since
> > IIUC it is still legally possible to happen after the above retry.
> > 
> > Also, it doesn't seem SGX_EPC_NOT_READY and SGX_EPC_PAGE_CONFLICT
> > are needed
> > since IIUC the only possible error is out of entropy.
> 
> Well, in case we have a kernel bug, and we think EPC is empty and it is safe
> to execute EUPDATESVN, while it is not the case, we can still get back the 
> SGX_EPC_NOT_READY and SGX_EPC_PAGE_CONFLICT from HW, right? 

Right, but as you said, in case we have a kernel bug.

Which means it is not expected and we should just use ENCLS_WARN() for them.

IMHO we can even add

	WARN_ON_ONCE(atomic_long_read(&sgx_nr_used_pages) != 0);

to sgx_updatesvn(), e.g., right after
 
	lockdep_assert_held(&sgx_epc_eupdatesvn_lock);

to assert sgx_updatesvn() is called when EPC is empty, thus SGX_EPC_NOT_READY
and SGX_EPC_PAGE_CONFLICT are not possible to happen.

> Which means we probably should warn about such buggy cases. 

Yes.

> And maybe we should also prevent page allocation from EPC in this case also
> similarly as for unknown error?

Yes.

I don't see any reason we should continue to allow SGX to work in case of
SGX_EPC_NOT_READY or SGX_EPC_PAGE_CONFLICT.

IIUC, we also agreed in the last round discussion that we should:

 "I guess the best action would be make sgx_alloc_epc_page() return
 consistently -ENOMEM, if the unexpected happens."

SGX_EPC_NOT_READY and SGX_EPC_PAGE_CONFLICT are indeed unexpected to me.

So my suggestion would be:

I think the sgx_updatesvn() should just return true when EUPDATESVN returns 0 or
SGX_NO_UPDATE, and return false for all other error codes.  And it should
ENCLS_WARN() for all other error codes, except SGX_INSUFFICIENT_ENTROPY because
it can still legally happen.

Something like:

	do {
		ret = __eupdatesvn();
		if (ret != SGX_INSUFFICIENT_ENTROPY)
			break;
	} while (--retry);

	if (!ret || ret == SGX_NO_UPDATE) {
		/*
		 * SVN successfully updated, or it was already up-to-date.
		 * Let users know when the update was successful.
		 */
		if (!ret)
			pr_info("SVN updated successfully\n");
		return true;
	}

	/*
	 * EUPDATESVN was called when EPC is empty, all other error
	 * codes are unexcepted except running out of entropy.
	 */
	if (ret != SGX_INSUFFICIENT_ENTROPY)
		ENCLS_WARN(ret, "EUPDATESVN");

	return false;
		

In __sgx_alloc_epc_page_from_node(), it should fail to allocate EPC page and
return -ENOMEM when sgx_updatesvn() returns false.  We should only allow EPC to
be allocated when we know the SVN is already up-to-date.

Any further call of EPC allocation will trigger sgx_updatesvn() again.  If it
was failed due to unexpected error, then it should continue to fail,
guaranteeing "sgx_alloc_epc_page() return consistently -ENOMEM, if the
unexpected happens".  If it was failed due to running out of entropy, it then
may fail again, or it will just succeed and then SGX can continue to work.