[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <023ab74f-82b7-41fc-ab20-0c0089f1f348@intel.com>
Date: Wed, 7 May 2025 09:04:41 -0700
From: Dave Hansen <dave.hansen@...el.com>
To: Elena Reshetova <elena.reshetova@...el.com>
Cc: jarkko@...nel.org, seanjc@...gle.com, kai.huang@...el.com,
linux-sgx@...r.kernel.org, linux-kernel@...r.kernel.org, x86@...nel.org,
asit.k.mallick@...el.com, vincent.r.scarlata@...el.com, chongc@...gle.com,
erdemaktas@...gle.com, vannapurve@...gle.com, dionnaglaze@...gle.com,
bondarn@...gle.com, scott.raynor@...el.com
Subject: Re: [PATCH v4 1/1] x86/sgx: Enable automatic SVN updates for SGX
enclaves
On 5/7/25 04:14, Elena Reshetova wrote:
> In case an SGX vulnerability is discovered and TCB recovery
> for SGX is triggered, Intel specifies a process that must be
> followed for a given vulnerability. Steps to mitigate can vary
> based on vulnerability type, affected components, etc.
> In some cases, a vulnerability can be mitigated via a runtime
> recovery flow by shutting down all running SGX enclaves,
> clearing enclave page cache (EPC), applying a microcode patch
> that does not require a reboot (via late microcode loading) and
> restarting all SGX enclaves.
Plain language and brevity have a lot of value in changelogs. There's a
substantial amount of irrelevant content here.
> Even when the above-described runtime recovery flow to mitigate the
> SGX vulnerability is followed, the SGX attestation evidence will
> still reflect the security SVN version being equal to the previous
> state of security SVN (containing vulnerability) that created
> and managed the enclave until the runtime recovery event. This
> limitation currently can be only overcome via a platform reboot,
> which negates all the benefits from the rebootless late microcode
> loading and not required in this case for functional or security
> purposes.
Can this please be trimmed down?
Actually, I think I wrote changelogs for this once upon a time. Could
you please go dig those up and use them?
> SGX architecture introduced a new instruction called ENCLS[EUPDATESVN]
> to Ice Lake [1].
Is it really on "Ice Lake" parts? Like, does it show up on
INTEL_ICELAKE? If not, this is only confusing and mostly irrelevant
information.
> It allows updating security SVN version, given that EPC
> is completely empty. The latter is required for security reasons
> in order to reason that enclave security posture is as secure as the
> security SVN version of the TCB that created it.
>
> Additionally it is important to ensure that while ENCLS[EUPDATESVN]
> runs, no concurrent page creation happens in EPC, because it might
> result in #GP delivered to the creator. Legacy SW might not be prepared
> to handle such unexpected #GPs and therefore this patch introduces
> a locking mechanism in sgx_(vepc_)open to ensure no concurrent EPC
> allocations can happen.
>
> Implement ENCLS[EUPDATESVN] and enable kernel to opportunistically
> call it during sgx_(vepc_)open().
> [1]
> https://cdrdv2.intel.com/v1/dl/getContent/648682?explicitVersion=true
These become stale almost immediately. Please also cite the document title.
> arch/x86/include/asm/sgx.h | 42 ++++++++++++-------
> arch/x86/kernel/cpu/sgx/driver.c | 4 ++
> arch/x86/kernel/cpu/sgx/encl.c | 1 +
> arch/x86/kernel/cpu/sgx/encls.h | 5 +++
> arch/x86/kernel/cpu/sgx/main.c | 70 ++++++++++++++++++++++++++++++++
> arch/x86/kernel/cpu/sgx/sgx.h | 3 ++
> arch/x86/kernel/cpu/sgx/virt.c | 6 +++
> 7 files changed, 116 insertions(+), 15 deletions(-)
Gah, how did this get squished back down to a single patch? It was
multiple patches before. There are multiple logical things going on here
and they need to be broken out.
> diff --git a/arch/x86/include/asm/sgx.h b/arch/x86/include/asm/sgx.h
> index 6a0069761508..8ac026ef6365 100644
> --- a/arch/x86/include/asm/sgx.h
> +++ b/arch/x86/include/asm/sgx.h
> @@ -27,22 +27,26 @@
> /* The bitmask for the EPC section type. */
> #define SGX_CPUID_EPC_MASK GENMASK(3, 0)
>
> +/* EUPDATESVN support in CPUID.0x12.0.EAX */
> +#define SGX_CPUID_EUPDATESVN BIT(10)
> +
> enum sgx_encls_function {
> - ECREATE = 0x00,
> - EADD = 0x01,
> - EINIT = 0x02,
> - EREMOVE = 0x03,
> - EDGBRD = 0x04,
> - EDGBWR = 0x05,
> - EEXTEND = 0x06,
> - ELDU = 0x08,
> - EBLOCK = 0x09,
> - EPA = 0x0A,
> - EWB = 0x0B,
> - ETRACK = 0x0C,
> - EAUG = 0x0D,
> - EMODPR = 0x0E,
> - EMODT = 0x0F,
> + ECREATE = 0x00,
> + EADD = 0x01,
> + EINIT = 0x02,
> + EREMOVE = 0x03,
> + EDGBRD = 0x04,
> + EDGBWR = 0x05,
> + EEXTEND = 0x06,
> + ELDU = 0x08,
> + EBLOCK = 0x09,
> + EPA = 0x0A,
> + EWB = 0x0B,
> + ETRACK = 0x0C,
> + EAUG = 0x0D,
> + EMODPR = 0x0E,
> + EMODT = 0x0F,
> + EUPDATESVN = 0x18,
> };
>
> /**
> @@ -73,6 +77,11 @@ enum sgx_encls_function {
> * public key does not match IA32_SGXLEPUBKEYHASH.
> * %SGX_PAGE_NOT_MODIFIABLE: The EPC page cannot be modified because it
> * is in the PENDING or MODIFIED state.
> + * %SGX_INSUFFICIENT_ENTROPY: Insufficient entropy in RNG.
> + * %SGX_EPC_NOT_READY: EPC is not ready for SVN update.
> + * %SGX_NO_UPDATE: EUPDATESVN was successful, but CPUSVN was not
> + * updated because current SVN was not newer than
> + * CPUSVN.
> * %SGX_UNMASKED_EVENT: An unmasked event, e.g. INTR, was received
> */
> enum sgx_return_code {
> @@ -81,6 +90,9 @@ enum sgx_return_code {
> SGX_CHILD_PRESENT = 13,
> SGX_INVALID_EINITTOKEN = 16,
> SGX_PAGE_NOT_MODIFIABLE = 20,
> + SGX_INSUFFICIENT_ENTROPY = 29,
> + SGX_EPC_NOT_READY = 30,
> + SGX_NO_UPDATE = 31,
> SGX_UNMASKED_EVENT = 128,
> };
I'd *much* rather that these mechanical constant introductions and
mindless refactoring (like reindenting) not be mixed with actual logic code.
>
> diff --git a/arch/x86/kernel/cpu/sgx/driver.c b/arch/x86/kernel/cpu/sgx/driver.c
> index 7f8d1e11dbee..669e44d61f9f 100644
> --- a/arch/x86/kernel/cpu/sgx/driver.c
> +++ b/arch/x86/kernel/cpu/sgx/driver.c
> @@ -19,6 +19,10 @@ static int sgx_open(struct inode *inode, struct file *file)
> struct sgx_encl *encl;
> int ret;
>
> + ret = sgx_inc_usage_count();
> + if (ret)
> + return ret;
> +
> encl = kzalloc(sizeof(*encl), GFP_KERNEL);
> if (!encl)
> return -ENOMEM;
> diff --git a/arch/x86/kernel/cpu/sgx/encl.c b/arch/x86/kernel/cpu/sgx/encl.c
> index 279148e72459..3b54889ae4a4 100644
> --- a/arch/x86/kernel/cpu/sgx/encl.c
> +++ b/arch/x86/kernel/cpu/sgx/encl.c
> @@ -765,6 +765,7 @@ void sgx_encl_release(struct kref *ref)
> WARN_ON_ONCE(encl->secs.epc_page);
>
> kfree(encl);
> + sgx_dec_usage_count();
> }
>
> /*
> diff --git a/arch/x86/kernel/cpu/sgx/encls.h b/arch/x86/kernel/cpu/sgx/encls.h
> index 99004b02e2ed..788acf8ec563 100644
> --- a/arch/x86/kernel/cpu/sgx/encls.h
> +++ b/arch/x86/kernel/cpu/sgx/encls.h
> @@ -233,4 +233,9 @@ static inline int __eaug(struct sgx_pageinfo *pginfo, void *addr)
> return __encls_2(EAUG, pginfo, addr);
> }
>
> +/* Update CPUSVN at runtime. */
> +static inline int __eupdatesvn(void)
> +{
> + return __encls_ret_1(EUPDATESVN, "");
> +}
> #endif /* _X86_ENCLS_H */
> diff --git a/arch/x86/kernel/cpu/sgx/main.c b/arch/x86/kernel/cpu/sgx/main.c
> index 8ce352fc72ac..2872567cd22b 100644
> --- a/arch/x86/kernel/cpu/sgx/main.c
> +++ b/arch/x86/kernel/cpu/sgx/main.c
> @@ -15,6 +15,7 @@
> #include <linux/sysfs.h>
> #include <linux/vmalloc.h>
> #include <asm/sgx.h>
> +#include <asm/archrandom.h>
> #include "driver.h"
> #include "encl.h"
> #include "encls.h"
> @@ -914,6 +915,73 @@ int sgx_set_attribute(unsigned long *allowed_attributes,
> }
> EXPORT_SYMBOL_GPL(sgx_set_attribute);
>
> +static bool sgx_has_eupdatesvn;
We have CPUID "caches" of sorts. Why open code this?
> +static atomic_t sgx_usage_count;
Is 32 bits enough for sgx_usage_count? What goes boom when it overflows?
> +static DEFINE_MUTEX(sgx_svn_lock);
What does this lock protect?
> +/**
> + * sgx_updatesvn() - Issue ENCLS[EUPDATESVN]
> + * If EPC is empty, this instruction will update CPUSVN to the currently
> + * loaded microcode update SVN and generate new cryptographic assets.
This is *NOT* EUPDATESVN. "this instruction" is not what's happening here.
sgx_updatesvn() _tries_ to update the SVN. Most of the time, there will
be no update and that's OK. This should only be called with EPC is empty.
> + * Return:
> + * 0: Success or not supported
> + * errno on error
I'm not a big fan of filling thing out just because.
What value is there in saying "errno on error"?
> + */
> +static int sgx_update_svn(void)
> +{
> + int retry = RDRAND_RETRY_LOOPS;
> + int ret;
> +
> + if (!sgx_has_eupdatesvn)
> + return 0;
This looks goofy. Why is it OK to just silently ignore an update
request? (I know the answer, but it needs to be obvious)
> + do {
> + ret = __eupdatesvn();
> + } while (ret == SGX_INSUFFICIENT_ENTROPY && --retry);
for (i = 0; i < RDRAND_RETRY_LOOPS; i++) {
ret = __eupdatesvn();
/* Stop on success or unexpected errors: */
if (ret != SGX_INSUFFICIENT_ENTROPY)
break;
}
> + if (!ret || ret == SGX_NO_UPDATE) {
> + /*
> + * SVN successfully updated, or it was already up-to-date.
> + * Let users know when the update was successful.
> + */
> + if (!ret)
> + pr_info("SVN updated successfully\n");
> + return 0;
> + }
Isn't this a lot simpler?
if (ret == SGX_NO_UPDATE)
return 0;
if (!ret) {
pr_info("SVN updated successfully\n");
return 0;
}
> + /*
> + * EUPDATESVN was called when EPC is empty, all other error
> + * codes are unexcepted except running out of entropy.
^ unexpected
Spell check, please.
> + */
> + if (ret != SGX_INSUFFICIENT_ENTROPY)
> + ENCLS_WARN(ret, "EUPDATESVN");
> + return ret;
> +}
The indentation here is backwards. The error paths should be indented
and the success path at the lowest indent whenever possible. This, for
example:
if (ret == SGX_NO_UPDATE)
return 0;
if (ret) {
ENCLS_WARN(ret, "EUPDATESVN");
return ret;
}
pr_info("SVN updated successfully\n");
return 0;
Oh, and do we expect SGX_INSUFFICIENT_ENTROPY all the time? I thought it
was supposed to be horribly rare. Shouldn't we warn on it?
> +int sgx_inc_usage_count(void)
> +{
No comments, eh?
What does success _mean_? What does failure mean?
> + int ret;
> +
> + if (atomic_inc_not_zero(&sgx_usage_count))
> + return 0;
> +
> + guard(mutex)(&sgx_svn_lock);
> +
> + if (atomic_inc_not_zero(&sgx_usage_count))
> + return 0;
> +
> + ret = sgx_update_svn();
> + if (!ret)
> + atomic_inc(&sgx_usage_count);
> + return ret;
> +}
Gah, this is 100% *NOT* obvious what's going on. Yet there are zero
comments on it. The lock is uncommented. The atomic is uncommented.
What does any of this mean? What do the components do?
> +
> +void sgx_dec_usage_count(void)
> +{
> + atomic_dec(&sgx_usage_count);
> +}
> static int __init sgx_init(void)
> {
> int ret;
> @@ -947,6 +1015,8 @@ static int __init sgx_init(void)
> if (sgx_vepc_init() && ret)
> goto err_provision;
>
> + sgx_has_eupdatesvn = (cpuid_eax(SGX_CPUID) & SGX_CPUID_EUPDATESVN);
> +
> return 0;
>
> err_provision:
> diff --git a/arch/x86/kernel/cpu/sgx/sgx.h b/arch/x86/kernel/cpu/sgx/sgx.h
> index d2dad21259a8..f5940393d9bd 100644
> --- a/arch/x86/kernel/cpu/sgx/sgx.h
> +++ b/arch/x86/kernel/cpu/sgx/sgx.h
> @@ -102,6 +102,9 @@ static inline int __init sgx_vepc_init(void)
> }
> #endif
>
> +int sgx_inc_usage_count(void);
> +void sgx_dec_usage_count(void);
> +
> void sgx_update_lepubkeyhash(u64 *lepubkeyhash);
>
> #endif /* _X86_SGX_H */
> diff --git a/arch/x86/kernel/cpu/sgx/virt.c b/arch/x86/kernel/cpu/sgx/virt.c
> index 7aaa3652e31d..e548de340c2e 100644
> --- a/arch/x86/kernel/cpu/sgx/virt.c
> +++ b/arch/x86/kernel/cpu/sgx/virt.c
> @@ -255,12 +255,18 @@ static int sgx_vepc_release(struct inode *inode, struct file *file)
> xa_destroy(&vepc->page_array);
> kfree(vepc);
>
> + sgx_dec_usage_count();
> return 0;
> }
>
> static int sgx_vepc_open(struct inode *inode, struct file *file)
> {
> struct sgx_vepc *vepc;
> + int ret;
> +
> + ret = sgx_inc_usage_count();
> + if (ret)
> + return ret;
>
> vepc = kzalloc(sizeof(struct sgx_vepc), GFP_KERNEL);
> if (!vepc)
I think I'd do this in at least 4 patches:
1. Introduce the usage count tracking: the atomic and the open/release
"hooks", maybe without error handling on the open() side
2. Introduce the EUPDATESVN mechanical bits: the CPUID bit, the
enumeration, the bool, the new error enum values
3. Introduce the mechanical eupdatesvn function. The retry loop and the
"no entropy" handling
4. Plumb #3 into #1
#4 is your place to argue if EUPDATESVN failures should cascade to open().
Powered by blists - more mailing lists