Message-ID: <68fc2af6305be_10e210029@dwillia2-mobl4.notmuch>
Date: Fri, 24 Oct 2025 18:42:14 -0700
From: <dan.j.williams@...el.com>
To: Vishal Annapurve <vannapurve@...gle.com>, Dave Hansen
<dave.hansen@...el.com>
CC: <dan.j.williams@...el.com>, Chao Gao <chao.gao@...el.com>, "Reshetova,
Elena" <elena.reshetova@...el.com>, "linux-coco@...ts.linux.dev"
<linux-coco@...ts.linux.dev>, "linux-kernel@...r.kernel.org"
<linux-kernel@...r.kernel.org>, "x86@...nel.org" <x86@...nel.org>, "Chatre,
Reinette" <reinette.chatre@...el.com>, "Weiny, Ira" <ira.weiny@...el.com>,
"Huang, Kai" <kai.huang@...el.com>, "yilun.xu@...ux.intel.com"
<yilun.xu@...ux.intel.com>, "sagis@...gle.com" <sagis@...gle.com>,
"paulmck@...nel.org" <paulmck@...nel.org>, "nik.borisov@...e.com"
<nik.borisov@...e.com>, Borislav Petkov <bp@...en8.de>, Dave Hansen
<dave.hansen@...ux.intel.com>, "H. Peter Anvin" <hpa@...or.com>, Ingo Molnar
<mingo@...hat.com>, "Kirill A. Shutemov" <kas@...nel.org>, Paolo Bonzini
<pbonzini@...hat.com>, "Edgecombe, Rick P" <rick.p.edgecombe@...el.com>,
Thomas Gleixner <tglx@...utronix.de>
Subject: Re: [PATCH v2 00/21] Runtime TDX Module update support
Vishal Annapurve wrote:
> On Fri, Oct 24, 2025 at 2:19 PM Dave Hansen <dave.hansen@...el.com> wrote:
> >
> > On 10/24/25 14:12, dan.j.williams@...el.com wrote:
> > >> The SGX solution, btw, was to at least ensure forward progress (CPUSVN
> > >> update) when the last enclave goes away. So new enclaves aren't
> > >> *prevented* from starting but the window when the first one starts
> > >> (enclave count going from 0->1) is leveraged to do the update.
> > > The status quo does ensure forward progress. The TD does get built and
> > > the update does complete, just the small matter of TD attestation
> > > failures, right?
>
> I would think that it's not a "small" problem if confidential
> workloads on the hosts are not able to pass attestation.

"Small" as in "not the kernel's problem". Userspace asked for the
update; the update is documented to sometimes clobber a TD build in
progress; userspace ran the update anyway. Userspace asked for the clobber.

It would be lovely if this clobbering did not happen at all, i.e. if the
update mechanism did not come with this misfeature. As it stands, the
kernel has no interface to solve that problem. The best it can do is
document that this new update facility has this side effect.

Userspace always has the choice to not update, to coordinate the update
with the build, or to do nothing and let tenants try to launch again.
Userspace could even retry the build and hide the tenant failure if it
knew about the clobber. But to be clear, the problem is the clobber, not
the kernel doing what userspace asked.
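
To make the retry option concrete, here is a minimal sketch of an optimistic, seqcount-style retry in userspace. It assumes, hypothetically, that the userspace update tool bumps a generation counter around every module update (odd while an update is in flight), and that `build_td()` stands in for whatever VMM call builds a TD. `UpdateGeneration`, `build_with_retry`, and `build_td` are illustrative names only, not kernel or TDX interfaces.

```python
import threading
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class UpdateGeneration:
    """Hypothetical counter the update tool bumps: odd while an update runs.

    In-process stand-in; a real deployment would share it across processes.
    """

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._gen = 0

    def read(self) -> int:
        with self._lock:
            return self._gen

    def begin_update(self) -> None:
        with self._lock:
            self._gen += 1  # now odd: update in flight

    def end_update(self) -> None:
        with self._lock:
            self._gen += 1  # now even: update finished


def build_with_retry(gen: UpdateGeneration, build_td: Callable[[], T],
                     max_attempts: int = 3, backoff_s: float = 0.1) -> T:
    """Run build_td(), retrying if a module update overlapped it."""
    for _ in range(max_attempts):
        before = gen.read()
        if before % 2:
            time.sleep(backoff_s)  # update in flight: wait, then try again
            continue
        result = build_td()
        if gen.read() == before:
            return result  # no update started or finished during this build
        # An update overlapped the build and may have clobbered it: retry.
    raise RuntimeError("TD build kept overlapping module updates")
```

This is deliberately conservative: it rebuilds after any overlapping update, whether or not that update actually carried crypto library changes.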

The clobber, as I understand it, is also limited to cases where the
update includes crypto library changes. I am not sure how often that
happens in practice. Suffice it to say, the fact that the clobber depends
on the contents of the update puts it even further from being a kernel
problem. The clobber does not corrupt kernel state.
> > Oh, yeah, for sure.
> >
> > If we do _nothing_ in the kernel (no build vs. module update
> > synchronization), then the downside is being exposed to attestation
> > failures if userspace either also does nothing or has bugs.
> >
> > That's actually, by far, my preferred solution to this whole mess:
> > Userspace plays stupid games, userspace wins stupid prizes.
> >
>
> IIUC, enforcing "Avoid updates during update sensitive times" is not
> that complex and will ensure to avoid any issues with user space
> logic.

Userspace logic avoids issues by honoring the documentation that these
ABI sequences need synchronization. Having the kernel block the update
during a build instead just trades one error for another: the update
fails rather than the attestation.

Treat this like any other case where userspace needs "atomic" semantics
from kernel mechanisms that are not themselves designed to be atomic:
wrap it in userspace synchronization.
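
A minimal sketch of such userspace synchronization, using flock(2) through Python's `fcntl` module: TD builds take a shared lock and the update tool takes an exclusive lock on an agreed-upon file, so an update waits for in-flight builds and new builds wait for the update. The lock path and the helper names `td_build_section` and `module_update_section` are hypothetical; nothing here is a kernel or TDX interface, and it only works if every builder and updater on the host cooperates.

```python
import fcntl
import os
from contextlib import contextmanager
from typing import Iterator

# Hypothetical path agreed on by the VMM and the update tool; not a kernel ABI.
LOCK_PATH = "/run/tdx-module-update.lock"


@contextmanager
def _flocked(path: str, mode: int) -> Iterator[None]:
    # Each call opens its own file description, so locks taken by separate
    # calls conflict with each other even within one process.
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    try:
        fcntl.flock(fd, mode)  # blocks until the lock is granted
        yield
    finally:
        fcntl.flock(fd, fcntl.LOCK_UN)
        os.close(fd)


def td_build_section(path: str = LOCK_PATH):
    """Shared lock: TD builds may run concurrently with each other."""
    return _flocked(path, fcntl.LOCK_SH)


def module_update_section(path: str = LOCK_PATH):
    """Exclusive lock: an update waits for in-flight builds and blocks new ones."""
    return _flocked(path, fcntl.LOCK_EX)
```

A VMM would wrap its TD build sequence in `with td_build_section(): ...` and the update tool its update request in `with module_update_section(): ...`. The reader/writer split keeps concurrent TD builds from serializing against each other; they only serialize against updates.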