Message-ID: <21752778-b6c8-42aa-8be6-db93a30a570c@intel.com>
Date: Thu, 8 May 2025 08:45:39 -0700
From: Dave Hansen <dave.hansen@...el.com>
To: "Reshetova, Elena" <elena.reshetova@...el.com>
Cc: "jarkko@...nel.org" <jarkko@...nel.org>,
"seanjc@...gle.com" <seanjc@...gle.com>, "Huang, Kai" <kai.huang@...el.com>,
"linux-sgx@...r.kernel.org" <linux-sgx@...r.kernel.org>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
"x86@...nel.org" <x86@...nel.org>, "Mallick, Asit K"
<asit.k.mallick@...el.com>,
"Scarlata, Vincent R" <vincent.r.scarlata@...el.com>,
"Cai, Chong" <chongc@...gle.com>, "Aktas, Erdem" <erdemaktas@...gle.com>,
"Annapurve, Vishal" <vannapurve@...gle.com>,
"dionnaglaze@...gle.com" <dionnaglaze@...gle.com>,
"bondarn@...gle.com" <bondarn@...gle.com>,
"Raynor, Scott" <scott.raynor@...el.com>
Subject: Re: [PATCH v4 1/1] x86/sgx: Enable automatic SVN updates for SGX
enclaves
On 5/8/25 07:07, Reshetova, Elena wrote:
...
>> Actually, I think I wrote changelogs for this once upon a time. Could
>> you please go dig those up and use them?
>
> Could you please suggest where I can find them? Was it for the previous
> submission done by Cathy?
Yes. There were also some long discussions on an Intel internal mailing
list. I'll send you the details.
>>> https://cdrdv2.intel.com/v1/dl/getContent/648682?explicitVersion=true
>>
>> These become stale almost immediately. Please also cite the document title.
>
> Sure, I can add the title, but I don't have another link.
Right, Intel doesn't provide long-term stable links. Also, it should be
possible to drop the "explicitVersion" parameter.
>> Gah, how did this get squished back down to a single patch? It was
>> multiple patches before. There are multiple logical things going on here
>> and they need to be broken out.
>
> Yes, but the code actually did get smaller, so it didn’t feel
> like it was worth breaking this into different patches.
> But I will follow your suggestion.
It's not really about the size. It's about making it as reviewable as
possible and minimizing the time a reviewer needs to spend on it. Even
for small things.
>>> +static bool sgx_has_eupdatesvn;
>>
>> We have CPUID "caches" of sorts. Why open code this?
>
> You mean X86_FEATURE_*?
Yes.
> SGX CPUID is only defined in SGX code currently (btw, I am not sure
> why these bits were made special), so it doesn’t support this.
It's only used in SGX code and each subleaf is only used once, at init
time. There's really no reason to add any caching of any parts of it for
a single init-time use.
> Is this a past decision that you want to change?
*This* patch changes things. It turns the single init-time user into a
runtime user. It adds a global variable. It's not
about *me* wanting change. It's about this patch changing the status quo.
I don't really care about the #define. That's not the question. The
question is whether we want a scattered X86_FEATURE bit for
'sgx_has_eupdatesvn' or not.
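To spell it out, here's roughly what the scattered-bit approach looks
like. Just a sketch: the cpufeatures.h #define needs a free slot in a
Linux-defined word, and I'm assuming EUPDATESVN is enumerated in
CPUID.(EAX=0x12,ECX=0):EAX bit 10 alongside its sibling leaves, so
double-check that against the spec:

	/* arch/x86/kernel/cpu/scattered.c, next to the other SGX bits: */
	{ X86_FEATURE_SGX_EUPDATESVN, CPUID_EAX, 10, 0x00000012, 0 },

	/* ...and then at the use sites, instead of a global: */
	if (!cpu_feature_enabled(X86_FEATURE_SGX_EUPDATESVN))
		return 0;

Depending on how the #define's comment string is set up, the bit can
also show up in /proc/cpuinfo, which is handy for debugging.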
>>> +static atomic_t sgx_usage_count;
>>
>> Is 32 bits enough for sgx_usage_count? What goes boom when it overflows?
>
> I can increase the size easily, but even with the current type we are
> talking about 2^32 enclaves, and I don't think that is realistic in practice.
I don't mean to be pedantic, but is it 2^32? I thought we had signed
ints underneath atomic_t and we allow negative atomic_t values. Are you
suggesting that, since we're just using a counter, overflows into the
negative space are OK and it doesn't matter until it wraps all the
way back around to 0?
Assuming that you actually mean 2^32... Is it about 2^32 enclaves? Or
2^32 *opens*? Those are very, very different things.
Also, this is kinda the second question that I asked that hasn't really
been answered directly. I really asked it for a good reason, and I'd
really prefer that it be answered, even if you disagree with the premise.
I really want to know what goes boom when it overflows, so I'll go into
detail why it matters.
If it's just that an extra EUPDATESVN gets run and some random SGX user
might see a random #GP, that's not so bad. But if it's kernel crash or
use-after-free, that _is_ bad.
Is 4 bytes (maybe even 0 with the alignment of other things in practice)
worth it to take the problem from "not realistic" to "utterly impossible"?
sizeof(struct file) == 240, so I guess it would take almost 1TB of
'struct file' to overflow the counter, plus the overhead of 2^32 'struct
sgx_encl's. I'd call that "stupid" but not impossible on a big modern
machine.
But with an atomic64_t, you literally can't overflow it with 'struct
file' in a 64-bit address space because the address space fills up
before the counter overflows. It's in "utterly impossible" territory.
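And the conversion is about as small as it gets; something like this
(sketch only; match whichever atomic_* variants the patch actually uses
at the call sites):

	static atomic64_t sgx_usage_count;

	/* on each open: */
	atomic64_inc(&sgx_usage_count);

	/* on each release: */
	atomic64_dec(&sgx_usage_count);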
I'll admit, I was rushing through the review and didn't spell all of
that out on my first pass. Yeah, it's a little bit overkill to
explain this for the type of one variable. But, I'm hoping that with all
of that logic laid out, you will consider answering my question this time.
I'd also very much appreciate if you could directly answer my questions
in future reviews, even if they appear silly. It'll make this all go faster.
...
>> 	if (ret == SGX_NO_UPDATE)
>> 		return 0;
>>
>> 	if (ret) {
>> 		ENCLS_WARN(ret, "EUPDATESVN");
>> 		return ret;
>> 	}
>>
>> 	pr_info("SVN updated successfully\n");
>> 	return 0;
>>
>> Oh, and do we expect SGX_INSUFFICIENT_ENTROPY all the time? I thought it
>> was supposed to be horribly rare. Shouldn't we warn on it?
>
> The entropy comes from RDSEED in this case, not RDRAND.
> This is not a horribly rare failure but a very realistic one.
Fair enough. If you want to keep it this way, could you run an
experiment and see how realistic it is? Surely, if it's very realistic,
you should be able to show it on real hardware quite easily.
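Something like this would put a number on it (a quick userspace sketch,
untested, but the rdseed/setc pattern is the standard one):

	#include <stdio.h>
	#include <stdint.h>

	/* Returns 1 on success; RDSEED sets CF when it produces a value. */
	static int rdseed64(uint64_t *out)
	{
		unsigned char ok;

		asm volatile("rdseed %0; setc %1" : "=r" (*out), "=qm" (ok));
		return ok;
	}

	int main(void)
	{
		unsigned long i, fails = 0, n = 10000000;
		uint64_t v;

		for (i = 0; i < n; i++)
			if (!rdseed64(&v))
				fails++;

		printf("%lu RDSEED failures in %lu tries\n", fails, n);
		return 0;
	}

Run that on EUPDATESVN-capable hardware and we'll know whether single
failures, let alone repeated ones, actually show up.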
We had a loooooooong discussion about this not so long ago. I believe
older CPUs were much more likely to observe RDSEED failures. But, even
on those CPUs, repeated failures were pretty darn rare. New CPUs are
less likely to observe a single RDSEED failure. They are very
unlikely to see repeated failures.
So I really want to know if this is needed in _practice_. Sure,
theoretically, the architecture allows CPUs to fail RDSEED all the time.
But do the CPUs with EUPDATESVN also have RDSEED implementations that
fail easily?