Message-ID: <DM8PR11MB5750FA8798AF68845D81717FE797A@DM8PR11MB5750.namprd11.prod.outlook.com>
Date: Mon, 12 May 2025 12:54:15 +0000
From: "Reshetova, Elena" <elena.reshetova@...el.com>
To: "Hansen, Dave" <dave.hansen@...el.com>
CC: "jarkko@...nel.org" <jarkko@...nel.org>, "seanjc@...gle.com"
	<seanjc@...gle.com>, "Huang, Kai" <kai.huang@...el.com>,
	"linux-sgx@...r.kernel.org" <linux-sgx@...r.kernel.org>,
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
	"x86@...nel.org" <x86@...nel.org>, "Mallick, Asit K"
	<asit.k.mallick@...el.com>, "Scarlata, Vincent R"
	<vincent.r.scarlata@...el.com>, "Cai, Chong" <chongc@...gle.com>, "Aktas,
 Erdem" <erdemaktas@...gle.com>, "Annapurve, Vishal" <vannapurve@...gle.com>,
	"dionnaglaze@...gle.com" <dionnaglaze@...gle.com>, "bondarn@...gle.com"
	<bondarn@...gle.com>, "Raynor, Scott" <scott.raynor@...el.com>
Subject: RE: [PATCH v4 1/1] x86/sgx: Enable automatic SVN updates for SGX
 enclaves

> >>> +static bool sgx_has_eupdatesvn;
> >>
> >> We have CPUID "caches" of sorts. Why open code this?
> >
> > You mean X86_FEATURE_*?
> 
> Yes.
> 
> > SGX CPUID is only defined in SGX code currently (btw, I am not sure
> > why they are made special) so it doesn’t support this.
> 
> It's only used in SGX code and each subleaf is only used once, at init
> time. There's really no reason to add any caching of any parts of it for
> a single init-time use.
> 
> > Is this a past decision that you want to change?
> 
> *This* patch changes things. It moves the one user from a single
> init-time use to a runtime user. It adds a global variable. It's not
> about *me* wanting change. It's about this patch changing the status quo.
> 
> I don't really care about the #define. That's not the question. The
> question is whether we want a scattered X86_FEATURE bit for
> 'sgx_has_eupdatesvn' or not.

I don't have a strong opinion about this.
What is your suggestion here? 
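
If the scattered bit is the preferred route, I assume it would look roughly
like this? (a sketch only; the feature name, the word/bit slot and the exact
CPUID register/bit below are placeholders and would need to be taken from the
SDM enumeration of EUPDATESVN):

/* arch/x86/include/asm/cpufeatures.h: hypothetical scattered bit */
#define X86_FEATURE_SGX_EUPDATESVN	(11*32 + 23) /* placeholder slot */

/* arch/x86/kernel/cpu/scattered.c: map it to the SGX CPUID leaf 0x12 */
	{ X86_FEATURE_SGX_EUPDATESVN, CPUID_EAX, 10, 0x00000012, 0 },

/* the driver-side check then reduces to: */
	if (!cpu_feature_enabled(X86_FEATURE_SGX_EUPDATESVN))
		return 0;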

> 
> >>> +static atomic_t sgx_usage_count;
> >>
> >> Is 32 bits enough for sgx_usage_count? What goes boom when it overflows?
> >
> > I can increase the value easily, but even with the current type we are talking
> > about 2^32 enclaves; I don't think this is realistic in practice.
> 
> I don't mean to be pedantic, but is it 2^32? I thought we had signed
> ints underneath atomic_t and we allow negative atomic_t values. Are you
> suggesting that since we're just using a counter that the overflows into
> the negative space are OK and it doesn't matter until it wraps all the
> way back around to 0?

Yes, you would think that wrapping to negative is OK in this case as long as
we don't end up hitting 0. But when the counter does wrap all the way back to 0,
we will think that the EPC is empty and attempt to execute EUPDATESVN. In that
case, EUPDATESVN fails with an SGX_EPC_NOT_READY or SGX_LOCKFAIL error,
depending on the state of the system; see the microcode flow:

IF (Other instruction is accessing EPC) THEN
 RFLAGS.ZF := 1
 RAX := SGX_LOCKFAIL;
 GOTO ERROR_EXIT;
FI
(* Verify EPC is ready *)
IF (the EPC contains any valid pages) THEN
 RFLAGS.ZF := 1;
 RAX := SGX_EPC_NOT_READY;
 GOTO ERROR_EXIT;
FI

This is OK on its own, but if any *other* ENCLS
instruction is executing in parallel at that point, it might #GP,
which would be unexpected behaviour for legacy SW
and something we would like to prevent. So, I would say that
this counter must be designed against possible overflows
because of this.

> 
> Assuming that you actually mean 2^32... Is it about 2^32 enclaves? Or
> 2^32 *opens*? Those are very, very different things.

We increment the counter on every open and each release decrements it,
so it would take 2^32 opens outstanding at the same time.

> 
> Also, this is kinda the second question that I asked that hasn't really
> been answered directly. I really asked it for a good reason, and I'd
> really prefer that it be answered, even if you disagree with the premise.
> 
> I really want to know what goes boom when it overflows, so I'll go into
> detail why it matters.
> 
> If it's just that an extra EUPDATESVN gets run and some random SGX user
> might see a random #GP, that's not so bad. But if it's kernel crash or
> use-after-free, that _is_ bad.

Well, I would still say that a random legacy SGX user getting a #GP is not good
behaviour either, but yes, that is the worst case.

> 
> Is 4 bytes (maybe even 0 with the alignment of other things in practice)
> worth it to take the problem from "not realistic" to "utterly impossible"?
> 
> sizeof(struct file) == 240, so I guess it would take almost 1TB of
> 'struct file' to overflow the counter, plus the overhead of 2^32 'struct
> sgx_encl's. I'd call that "stupid" but not impossible on a big modern
> machine.
> 
> But with an atomic64_t, you literally can't overflow it with 'struct
> file' in a 64-bit address space because the address space fills up
> before the counter overflows. It's in "utterly impossible" territory.

Yes, so I will change it to atomic64_t. 
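
Roughly this shape, I assume (a sketch only, not the final patch;
sgx_update_svn() stands in for whatever the EUPDATESVN helper ends up being
called, and the serialization of the first/last user against concurrent
opens/releases is omitted here):

static atomic64_t sgx_usage_count;

int sgx_inc_usage_count(void)
{
	/*
	 * First user: the EPC is expected to be empty, so this is the
	 * window in which EUPDATESVN can be attempted.
	 */
	if (atomic64_inc_return(&sgx_usage_count) == 1)
		return sgx_update_svn();

	return 0;
}

void sgx_dec_usage_count(void)
{
	atomic64_dec(&sgx_usage_count);
}

With atomic64_t the counter cannot realistically wrap, so the "EPC looks
empty when it is not" case above goes away.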

> 
> I'll admit, I was rushing through the review and didn't spell all of
> that out on my first pass. Yeah, it's a little bit overkill to
> explain this for the type of one variable. But, I'm hoping that with all
> of that logic laid out, you will consider answering my question this time.
> 
> I'd also very much appreciate if you could directly answer my questions
> in future reviews, even if they appear silly. It'll make this all go faster.

Yes, of course; I am sorry for not answering this point properly before.
Hopefully you have an answer now.

> 
> ...
> >> 	if (ret == SGX_NO_UPDATE)
> >> 		return 0;
> >>
> >> 	if (ret) {
> >> 		ENCLS_WARN(ret, "EUPDATESVN");
> >> 		return ret;
> >> 	}
> >>
> >> 	pr_info("SVN updated successfully\n");
> >> 	return 0;
> >>
> >> Oh, and do we expect SGX_INSUFFICIENT_ENTROPY all the time? I thought it
> >> was supposed to be horribly rare. Shouldn't we warn on it?
> >
> > The entropy comes from RDSEED in this case, not RDRAND.
> > This is not a horribly rare but very realistic failure.
> 
> Fair enough. If you want to keep it this way, could you run an
> experiment and see how realistic it is? Surely, if it's very realistic,
> you should be able to show it on real hardware quite easily.
> 
> We had a loooooooong discussion about this not so long ago. I believe
> older CPUs were much more likely to be able to observe RDSEED failures.
> But, even on those CPUs, repeated failures were pretty darn rare. New
> CPUs are less likely to observe a single RDSEED failure. They are very
> unlikely to see repeated failures.
> 
> So I really want to know if this is needed in _practice_. Sure,
> theoretically, the architecture allows CPUs to fail RDSEED all the time.
> But do the CPUs with EUPDATESVN also have RDSEED implementations that
> fail easily?

This is the most recent discussion on this matter that I am aware of:
https://lkml.org/lkml/2024/2/5/1595
The measurements were done on an older platform (Skylake), but I am not
aware of any architectural changes since then that would improve this.
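
If it helps, the failure rate is easy to probe from user space with a small
loop like this (a standalone sketch; needs a CPU with RDSEED and building
with -mrdseed):

#include <stdio.h>
#include <immintrin.h>

int main(void)
{
	unsigned long long v, fails = 0;
	const unsigned long long iters = 100000000ULL;

	for (unsigned long long i = 0; i < iters; i++) {
		/* _rdseed64_step() returns 0 when no seed data is available */
		if (!_rdseed64_step(&v))
			fails++;
	}

	printf("RDSEED failures: %llu out of %llu\n", fails, iters);
	return 0;
}

Something along these lines could be rerun on a platform that has EUPDATESVN
to see whether repeated failures are realistic there in practice.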


Best Regards,
Elena.
