Message-ID: <aAwFhaqQDLXoqbmv@google.com>
Date: Fri, 25 Apr 2025 14:58:29 -0700
From: Sean Christopherson <seanjc@...gle.com>
To: Dave Hansen <dave.hansen@...el.com>
Cc: Elena Reshetova <elena.reshetova@...el.com>, "jarkko@...nel.org" <jarkko@...nel.org>,
Kai Huang <kai.huang@...el.com>,
"linux-sgx@...r.kernel.org" <linux-sgx@...r.kernel.org>, Vincent Scarlata <vincent.r.scarlata@...el.com>,
"x86@...nel.org" <x86@...nel.org>, Vishal Annapurve <vannapurve@...gle.com>, Chong Cai <chongc@...gle.com>,
Asit K Mallick <asit.k.mallick@...el.com>, Erdem Aktas <erdemaktas@...gle.com>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>, "bondarn@...gle.com" <bondarn@...gle.com>,
"dionnaglaze@...gle.com" <dionnaglaze@...gle.com>, Scott Raynor <scott.raynor@...el.com>
Subject: Re: [PATCH v3 2/2] x86/sgx: Implement EUPDATESVN and
opportunistically call it during first EPC page alloc
On Fri, Apr 25, 2025, Dave Hansen wrote:
> On 4/25/25 14:04, Sean Christopherson wrote:
> > Userspace is going to be waiting on ->release() no matter what.
>
> Unless it isn't even involved and it happens automatically.
With my Google hat on: no thanks.
Customer: Hey Google, why haven't you applied security update XYZ?
Support: We have.
Customer: The SVN in my attestation report says otherwise.
Support: Let me check with engineering.
TDX team: We applied the ucode update provided by platforms. Platforms, what's up?
Platforms: That's the right ucode patch.
TDX team: Hmm, the kernel is supposed to update the SVN. Let's bug the kernel team.
Me: Have you guaranteed there are no active enclaves after the update?
TDX team: Yep.
Me: <tries to debug the problem, but it's in prod and only happens on
some platforms>
Me: Our theory is that enclaves haven't been fully destroyed when the
hold is lifted. Try adding a delay? Maybe 1s?
TDX team: That helped, but we still have intermittent failures.
Me: How about 5 seconds?
TDX team: Great, that worked!
Support: Sorry for the delay, we're rolling out a fix, you should see the correct
SVN shortly.
<time passes>
Customer: Hey Google, my TDX VMs are stalled for 5 seconds during boot.
Support: Let me check with engineering...
Is that likely to happen? No. Is a delay of multiple seconds likely? Also no.
But it's not that far-fetched. And if something does go sideways, e.g. an EPC
page gets leaked, or an enclave FD gets orphaned and left open, etc., then I would
much, much prefer that the issue be visible to userspace. Things going sideways
is inevitable; being able to take action when badness happens makes a world of
difference.
Couple that with adding latency to launching the 0=>1 enclave, just to handle something
that happens a few times per year, and I don't see any value in automatic updates.
Maybe it sounds nice on paper, but from my perspective, I see nothing but pain.