linux-kernel - Re: [PATCH v2 00/21] Runtime TDX Module update support


Message-ID: <aPsuD2fbYwCccgNi@intel.com>
Date: Fri, 24 Oct 2025 15:43:11 +0800
From: Chao Gao <chao.gao@...el.com>
To: Dave Hansen <dave.hansen@...el.com>
CC: Vishal Annapurve <vannapurve@...gle.com>, "Reshetova, Elena"
	<elena.reshetova@...el.com>, "linux-coco@...ts.linux.dev"
	<linux-coco@...ts.linux.dev>, "linux-kernel@...r.kernel.org"
	<linux-kernel@...r.kernel.org>, "x86@...nel.org" <x86@...nel.org>, "Chatre,
 Reinette" <reinette.chatre@...el.com>, "Weiny, Ira" <ira.weiny@...el.com>,
	"Huang, Kai" <kai.huang@...el.com>, "Williams, Dan J"
	<dan.j.williams@...el.com>, "yilun.xu@...ux.intel.com"
	<yilun.xu@...ux.intel.com>, "sagis@...gle.com" <sagis@...gle.com>,
	"paulmck@...nel.org" <paulmck@...nel.org>, "nik.borisov@...e.com"
	<nik.borisov@...e.com>, Borislav Petkov <bp@...en8.de>, Dave Hansen
	<dave.hansen@...ux.intel.com>, "H. Peter Anvin" <hpa@...or.com>, Ingo Molnar
	<mingo@...hat.com>, "Kirill A. Shutemov" <kas@...nel.org>, Paolo Bonzini
	<pbonzini@...hat.com>, "Edgecombe, Rick P" <rick.p.edgecombe@...el.com>,
	Thomas Gleixner <tglx@...utronix.de>
Subject: Re: [PATCH v2 00/21] Runtime TDX Module update support

>One thing I don't think I've heard anyone be worried about is how timely
>the update process is. So how about this: Updates wait for any existing
>builds to complete. But, new builds wait for updates. That can be done
>with a single rwsem:
>
>struct rw_semaphore update_rwsem;
>
>tdx_td_init()
>{
>	...
>+	down_read_interruptible(&update_rwsem);
>	kvm_tdx->state = TD_STATE_INITIALIZED;
>
>tdx_td_finalize()
>{
>	...
>+	up_read(&update_rwsem);
>	kvm_tdx->state = TD_STATE_RUNNABLE;
>
>A module update does:
>
>	down_write_interruptible(&update_rwsem);
>	do_actual_update();
>	up_write(&update_rwsem);
>
>There would be no corruption issues, no erroring out of the build
>process, and no punting to userspace to ensure forward progress.
>
>The big downside is that both the build process and update process can
>appear to hang for a long time. It'll also be a bit annoying to ensure
>that there are up_read(&update_rwsem)'s if the kvm_tdx object gets torn
>down during a build.
>
>But the massive upside is that there's no new ABI and all the
>consistency and forward progress guarantees are in the kernel. If we
>want new ABIs around it that give O_NONBLOCK semantics to build or
>update, that can be added on after the fact.
>
>Plus, if userspace *WANTS* to coordinate the whole shebang, they're free
>to. They'd never see long hangs because they would be coordinating.
>
>Thoughts?

Hi Dave,

Thanks for this summary and suggestion.

Beyond "the kvm_tdx object gets torn down during a build," I see two potential
issues:

1. TD build and TDX migration aren't purely kernel processes -- they span multiple
   KVM ioctls. Holding a read-write lock throughout the entire process would
   mean returning to userspace with the lock held. I think this is irregular,
   and I'm not sure whether it's acceptable for read-write semaphores.

2. The kernel may need to hold this read-write lock for operations that are
   not defined yet. The TDX Module Base spec [*] notes on page 55:

   : Future TDX Module versions may have different or additional update-sensitive
   : cases. By design, such cases apply to a small portion of the overall TD
   : lifecycle.

[*]: https://cdrdv2.intel.com/v1/dl/getContent/733575

Given these concerns, I'm not sure whether implementing a read-write lock in
the kernel is the right approach.

Since Google prefers to "avoid updates during update-sensitive times," we can
implement that approach for now. If other Linux users find this insufficient
and, with strong justification, prefer to fail TD build/migration operations
instead, we can enable that functionality in the future.
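
The difference between the two behaviours can be modelled roughly as below.
This is a single-threaded toy model with no locking, not the kernel patch and
not a description of how the TDX Module detects update-sensitive operations.
Every name in it (model_state, update_policy, model_update() and so on) is
hypothetical, and -EIO is just a placeholder error code.

```c
#include <errno.h>

/* Hypothetical model state; not taken from the series. */
struct model_state {
	int sensitive_ops;        /* builds/migrations currently in flight */
	unsigned int generation;  /* bumped by every successful update */
};

struct model_build {
	unsigned int start_generation;  /* generation seen when the build began */
};

enum update_policy {
	AVOID_SENSITIVE,     /* refuse the update while sensitive ops run */
	FAIL_SENSITIVE_OPS,  /* update anyway; in-flight ops fail afterwards */
};

/* Attempt an update. Returns 0 on success, -EBUSY if the policy refuses it. */
int model_update(struct model_state *s, enum update_policy policy)
{
	if (policy == AVOID_SENSITIVE && s->sensitive_ops > 0)
		return -EBUSY;

	s->generation++;
	return 0;
}

/* Start an update-sensitive operation and remember the current generation. */
void model_build_begin(struct model_state *s, struct model_build *b)
{
	s->sensitive_ops++;
	b->start_generation = s->generation;
}

/*
 * Finish the operation. It fails (placeholder -EIO) if an update happened
 * while it was in flight.
 */
int model_build_end(struct model_state *s, struct model_build *b)
{
	s->sensitive_ops--;
	return b->start_generation == s->generation ? 0 : -EIO;
}
```

With AVOID_SENSITIVE the update is the operation that gets an error and has
to be retried. With FAIL_SENSITIVE_OPS the update always goes through and the
build that overlapped it is the one that fails.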

What do you think?