Message-ID: <68fe92d8eef5f_10e210057@dwillia2-mobl4.notmuch>
Date: Sun, 26 Oct 2025 14:30:00 -0700
From: <dan.j.williams@...el.com>
To: Vishal Annapurve <vannapurve@...gle.com>, <dan.j.williams@...el.com>
CC: Dave Hansen <dave.hansen@...el.com>, Chao Gao <chao.gao@...el.com>,
"Reshetova, Elena" <elena.reshetova@...el.com>, "linux-coco@...ts.linux.dev"
<linux-coco@...ts.linux.dev>, "linux-kernel@...r.kernel.org"
<linux-kernel@...r.kernel.org>, "x86@...nel.org" <x86@...nel.org>, "Chatre,
Reinette" <reinette.chatre@...el.com>, "Weiny, Ira" <ira.weiny@...el.com>,
"Huang, Kai" <kai.huang@...el.com>, "yilun.xu@...ux.intel.com"
<yilun.xu@...ux.intel.com>, "sagis@...gle.com" <sagis@...gle.com>,
"paulmck@...nel.org" <paulmck@...nel.org>, "nik.borisov@...e.com"
<nik.borisov@...e.com>, Borislav Petkov <bp@...en8.de>, Dave Hansen
<dave.hansen@...ux.intel.com>, "H. Peter Anvin" <hpa@...or.com>, Ingo Molnar
<mingo@...hat.com>, "Kirill A. Shutemov" <kas@...nel.org>, Paolo Bonzini
<pbonzini@...hat.com>, "Edgecombe, Rick P" <rick.p.edgecombe@...el.com>,
Thomas Gleixner <tglx@...utronix.de>
Subject: Re: [PATCH v2 00/21] Runtime TDX Module update support

Vishal Annapurve wrote:
> On Fri, Oct 24, 2025 at 6:42 PM <dan.j.williams@...el.com> wrote:
> >
> > Vishal Annapurve wrote:
> > > On Fri, Oct 24, 2025 at 2:19 PM Dave Hansen <dave.hansen@...el.com> wrote:
> > > >
> > > > On 10/24/25 14:12, dan.j.williams@...el.com wrote:
> > > > >> The SGX solution, btw, was to at least ensure forward progress (CPUSVN
> > > > >> update) when the last enclave goes away. So new enclaves aren't
> > > > >> *prevented* from starting but the window when the first one starts
> > > > >> (enclave count going from 0->1) is leveraged to do the update.
> > > > > The status quo does ensure forward progress. The TD does get built and
> > > > > the update does complete, just the small matter of TD attestation
> > > > > failures, right?
> > >
> > > I would think that it's not a "small" problem if confidential
> > > workloads on the hosts are not able to pass attestation.
> >
> > "Small" as in "not the kernel's problem". Userspace asked for the
> > update, update is documented to clobber build sometimes, userspace ran
> > an update anyway. Userspace asked for the clobber.
> >
> > It would be lovely if this clobbering does not happen at all and the
> > update mechanism did not come with this misfeature. Otherwise, the kernel
> > has no interface to solve that problem. The best it can do is document
> > that this new update facility has this side effect.
>
> In this case, host kernel has a way to ensure that userspace can't
> trigger such clobbering at all.
Unless the clobber condition can be made atomic with respect to update
so that both succeed, the kernel needs to punt the synchronization
problem to userspace.

A theoretical TDX Module change could ensure that atomicity. A
theoretical change to the kernel's build ABI could effect that as well,
or at least report the collision, i.e. a flag at the finalization stage
indicating that an update happened during the build sequence and the
build needs a restart. This is the role of "generation" in the
tsm_report ABI. As far as I understand, userspace just skips that ABI
and arranges for userspace-synchronized access to tsm_report.
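
To make the "generation" idea concrete, here is a minimal sketch written as a self-contained Python simulation. It is not kernel or TDX Module code. `BuildState`, `module_update()` and `build_td()` are hypothetical names invented for illustration; the only thing taken from the discussion above is the pattern of sampling a counter before the build and comparing it at finalization to detect an intervening update.

```python
class BuildState:
    """Hypothetical stand-in for state shared by the update and build flows."""

    def __init__(self):
        self.generation = 0  # bumped on every (simulated) module update

    def module_update(self):
        # An update that may clobber in-flight builds bumps the counter.
        self.generation += 1


def build_td(state, steps, max_retries=3):
    """Run the build steps; restart if an update landed during the sequence.

    `steps` is a list of callables standing in for the build sequence.
    Returns the number of attempts needed to finish without a collision.
    """
    for attempt in range(1, max_retries + 1):
        start = state.generation       # sample before the build starts
        for step in steps:
            step()
        if state.generation == start:  # finalization: no update collided
            return attempt
    raise RuntimeError("build kept colliding with updates")
```

The check only detects the collision after the fact, so the cost is a repeated build rather than a failed attestation.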
At the point where the solution is "change existing build flows",
userspace might as well just wrap the flows with userspace exclusion.
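
One way userspace exclusion could look, as a sketch only: both the TD build tooling and the update tooling take the same advisory file lock before touching the module. The lock path and the `tdx_exclusion()` helper are assumptions made up for this example, not an existing interface; the only real API used is `flock()` via Python's `fcntl` module.

```python
import fcntl
import os
from contextlib import contextmanager


@contextmanager
def tdx_exclusion(lock_path):
    """Hold an exclusive advisory lock for the duration of the block.

    `lock_path` is a hypothetical file that the build and update tools
    agree on; nothing in the kernel knows about it.
    """
    fd = os.open(lock_path, os.O_CREAT | os.O_RDWR, 0o600)
    try:
        fcntl.flock(fd, fcntl.LOCK_EX)  # blocks until the other flow is done
        yield
    finally:
        os.close(fd)                    # closing the fd releases the lock
```

The build flow and the update flow would each run inside `with tdx_exclusion(path):` using the same path, so a build never overlaps an update. Advisory locks only help if every participant takes them, which is exactly the "userspace coordinates" contract.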
> That IIUC is "Avoid updates during update sensitive times". Best
> kernel can do is prevent userspace from screwing up the state of TDs.
"Avoid updates during update sensitive times" is the documentation for
the update userspace ABI.
> > Userspace always has the choice to not update, coordinate update with
> > build, or do nothing and let tenants try to launch again. Userspace
> > could even retry the build and hide the tenant failure if it knew about
> > the clobber,
>
> IIUC host userspace has no way to know if the TD state got clobbered.
Correct, today it can only assume that both flows need to be mutually
exclusive.
> > but be clear that the problem is the clobber not the kernel
> > doing what userspace asked.
> >
> > The clobber, as I understand, is also limited to cases where the update
> > includes crypto library changes. I am not sure how often that happens in
> > practice. Suffice to say, the fact that the clobber is conditioned on
> > the contents of the update also puts it further away from being a kernel
>
> The knowledge of things getting clobbered are well much further away
> from userspace.
The possibility is documented as part of the update ABI. Another
documentation possibility is that updates that change the crypto library
are by definition not "runtime update" capable. A third possibility is a
TDX Module change that removes this collision. That is a menu of options
to work through before complicating the kernel.