Message-ID: <CAGtprH9pBFENEOs7fcu-UMwU6Eiygw3h8L_Yxvc5S4mNsZvPxA@mail.gmail.com>
Date: Sat, 25 Oct 2025 05:01:26 -0700
From: Vishal Annapurve <vannapurve@...gle.com>
To: dan.j.williams@...el.com
Cc: Dave Hansen <dave.hansen@...el.com>, Chao Gao <chao.gao@...el.com>,
"Reshetova, Elena" <elena.reshetova@...el.com>,
"linux-coco@...ts.linux.dev" <linux-coco@...ts.linux.dev>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>, "x86@...nel.org" <x86@...nel.org>,
"Chatre, Reinette" <reinette.chatre@...el.com>, "Weiny, Ira" <ira.weiny@...el.com>,
"Huang, Kai" <kai.huang@...el.com>, "yilun.xu@...ux.intel.com" <yilun.xu@...ux.intel.com>,
"sagis@...gle.com" <sagis@...gle.com>, "paulmck@...nel.org" <paulmck@...nel.org>,
"nik.borisov@...e.com" <nik.borisov@...e.com>, Borislav Petkov <bp@...en8.de>,
Dave Hansen <dave.hansen@...ux.intel.com>, "H. Peter Anvin" <hpa@...or.com>,
Ingo Molnar <mingo@...hat.com>, "Kirill A. Shutemov" <kas@...nel.org>, Paolo Bonzini <pbonzini@...hat.com>,
"Edgecombe, Rick P" <rick.p.edgecombe@...el.com>, Thomas Gleixner <tglx@...utronix.de>
Subject: Re: [PATCH v2 00/21] Runtime TDX Module update support
On Sat, Oct 25, 2025 at 4:55 AM Vishal Annapurve <vannapurve@...gle.com> wrote:
>
> On Fri, Oct 24, 2025 at 6:42 PM <dan.j.williams@...el.com> wrote:
> >
> > Vishal Annapurve wrote:
> > > On Fri, Oct 24, 2025 at 2:19 PM Dave Hansen <dave.hansen@...el.com> wrote:
> > > >
> > > > On 10/24/25 14:12, dan.j.williams@...el.com wrote:
> > > > >> The SGX solution, btw, was to at least ensure forward progress (CPUSVN
> > > > >> update) when the last enclave goes away. So new enclaves aren't
> > > > >> *prevented* from starting but the window when the first one starts
> > > > >> (enclave count going from 0->1) is leveraged to do the update.
> > > > > The status quo does ensure forward progress. The TD does get built and
> > > > > the update does complete, just the small matter of TD attestation
> > > > > failures, right?
> > >
> > > I would think that it's not a "small" problem if confidential
> > > workloads on the hosts are not able to pass attestation.
> >
> > "Small" as in "not the kernel's problem". Userspace asked for the
> > update, update is documented to clobber build sometimes, userspace ran
> > an update anyway. Userspace asked for the clobber.
> >
> > It would be lovely if this clobbering does not happen at all and the
> > update mechanism did not come with this misfeature. Otherwise, the kernel
> > has no interface to solve that problem. The best it can do is document
> > that this new update facility has this side effect.
>
> In this case, host kernel has a way to ensure that userspace can't
> trigger such clobbering at all. That IIUC is "Avoid updates during
> update sensitive times". Best kernel can do is prevent userspace from
> screwing up the state of TDs.
>
> >
> > Userspace always has the choice to not update, coordinate update with
> > build, or do nothing and let tenants try to launch again. Userspace
> > could even retry the build and hide the tenant failure if it knew about
> > the clobber,
>
> IIUC host userspace has no way to know if the TD state got clobbered.
>
> > but be clear that the problem is the clobber not the kernel
> > doing what userspace asked.
> >
> > The clobber, as I understand, is also limited to cases where the update
> > includes crypto library changes. I am not sure how often that happens in
> > practice. Suffice to say, the fact that the clobber is conditioned on
> > the contents of the update also puts it further away from being a kernel
>
> The knowledge of things getting clobbered is much further away
> from userspace.
>
> > problem. The clobber does not corrupt kernel state.
> >
> > > > Oh, yeah, for sure.
> > > >
> > > > If we do _nothing_ in the kernel (no build vs. module update
> > > > synchronization), then the downside is being exposed to attestation
> > > > failures if userspace either also does nothing or has bugs.
> > > >
> > > > That's actually, by far, my preferred solution to this whole mess:
> > > > Userspace plays stupid games, userspace wins stupid prizes.
> > > >
> > >
> > > IIUC, enforcing "Avoid updates during update sensitive times" is not
> > > that complex and will avoid any issues with userspace
> > > logic.
> >
> > Userspace logic avoids issues by honoring the documentation that these
> > ABI sequences need synchronization. Otherwise, kernel blocking update
> > during build just trades one error for another.
>
> Kernel blocking update during build makes the production systems much
> safer and prevents userspace from screwing up the state that it has no
> way to detect after the fact.
>
> >
> > Treat this like any other userspace solution for requiring "atomic"
> > semantics when the kernel mechanisms are not themselves designed to be
> > atomic: wrap it in userspace synchronization.
>
> In general, if this were something userspace could detect, I would agree.
> But the TDX module is the closest entity that can detect the problematic
> sequence, and the host kernel has a very simple way to ensure that such
> a sequence is never allowed to happen: toggling some seamcall
> controls. It would be very helpful IMO to ensure that
> userspace is not able to screw up production workloads, especially if
> the mess is not at all visible to userspace.
Detecting is one thing; undoing the mess is disruptive and not easy to
orchestrate in this case.
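
For concreteness, the userspace-side synchronization suggested upthread
might look roughly like the sketch below: TD builds take a shared advisory
lock, and the update path takes an exclusive, non-blocking one so that it
fails fast while any build is in flight. This is only an illustration. The
lock path and the td_build()/module_update() helpers are hypothetical and
are not part of any kernel or TDX tooling interface.

```python
import contextlib
import fcntl
import os

# Hypothetical lock file; not a real kernel or TDX tooling interface.
LOCK_PATH = "/tmp/tdx-update.lock"


@contextlib.contextmanager
def _locked(mode, path):
    # Each call opens its own file description, so flock() conflicts are
    # enforced even between callers inside the same process.
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    try:
        fcntl.flock(fd, mode)
        yield
    finally:
        os.close(fd)  # closing the descriptor releases the lock


def td_build(path=LOCK_PATH):
    """Hold for the duration of a TD build; builds may overlap each other."""
    return _locked(fcntl.LOCK_SH, path)


def module_update(path=LOCK_PATH):
    """Hold while requesting an update; raises BlockingIOError if a build
    currently holds the shared lock."""
    return _locked(fcntl.LOCK_EX | fcntl.LOCK_NB, path)
```

A launcher would wrap its build sequence in "with td_build():" and the
updater would wrap its request in "with module_update():", retrying later
on BlockingIOError. Being advisory, this only works if every launcher and
updater on the host takes the lock.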