linux-kernel - Re: [PATCH v2 00/21] Runtime TDX Module update support

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <2e49e80f-fab0-4248-8dae-76543e3c6ae3@intel.com>
Date: Fri, 24 Oct 2025 13:13:40 -0700
From: Dave Hansen <dave.hansen@...el.com>
To: dan.j.williams@...el.com, Chao Gao <chao.gao@...el.com>
Cc: Vishal Annapurve <vannapurve@...gle.com>,
 "Reshetova, Elena" <elena.reshetova@...el.com>,
 "linux-coco@...ts.linux.dev" <linux-coco@...ts.linux.dev>,
 "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
 "x86@...nel.org" <x86@...nel.org>,
 "Chatre, Reinette" <reinette.chatre@...el.com>,
 "Weiny, Ira" <ira.weiny@...el.com>, "Huang, Kai" <kai.huang@...el.com>,
 "yilun.xu@...ux.intel.com" <yilun.xu@...ux.intel.com>,
 "sagis@...gle.com" <sagis@...gle.com>,
 "paulmck@...nel.org" <paulmck@...nel.org>,
 "nik.borisov@...e.com" <nik.borisov@...e.com>, Borislav Petkov
 <bp@...en8.de>, Dave Hansen <dave.hansen@...ux.intel.com>,
 "H. Peter Anvin" <hpa@...or.com>, Ingo Molnar <mingo@...hat.com>,
 "Kirill A. Shutemov" <kas@...nel.org>, Paolo Bonzini <pbonzini@...hat.com>,
 "Edgecombe, Rick P" <rick.p.edgecombe@...el.com>,
 Thomas Gleixner <tglx@...utronix.de>
Subject: Re: [PATCH v2 00/21] Runtime TDX Module update support

On 10/24/25 12:40, dan.j.williams@...el.com wrote:
> Dave Hansen wrote:
>> On 10/24/25 00:43, Chao Gao wrote:
>> ...
>>> Beyond "the kvm_tdx object gets torn down during a build," I see two potential
>>> issues:
>>>
>>> 1. TD Build and TDX migration aren't purely kernel processes -- they span multiple
>>>    KVM ioctls. Holding a read-write lock throughout the entire process would
>>>    require exiting to userspace while the lock is held. I think this is
>>>    irregular, but I'm not sure if it's acceptable for read-write semaphores.
>>
>> Sure, I guess it's irregular. But look at it this way: let's say we
>> concocted some scheme to use a TD build refcount and a module update
>> flag, had them both wait_event_interruptible() on each other, and then
>> did wakeups. That would get the same semantics without an rwsem.
> 
> This sounds unworkable to me.
> 
> First, you cannot return to userspace while holding a lock. Lockdep will
> rightfully scream:
> 
>     "WARNING: lock held when returning to user space!"

Well, yup, it sure does look that way for normal lockdep-annotated lock
types. It does seem like a sane rule to have for most things.

But, just to be clear, this is a lockdep thing and a good, solid
semantic to have. It's not a rule that no kernel locking structure can
ever be held when returning to userspace.

> The complexity of ensuring that a multi-stage ABI transaction completes
> from the kernel side is painful. If that process dies in the middle of
> its ABI sequence who cleans up these references?

The 'struct kvm_tdx' has to get destroyed at some point. It also has a
'kvm_tdx_state' field that could be tied very tightly to the build
status. The reference gets cleaned up before the point when the
kvm_tdx->state memory is freed.

> The operational mechanism to make sure that one process flow does not
> mess up another process flow is for those process to communicate with
> *userspace* file locks, or for those process to check for failures after
> the fact and retry. Unless you can make the build side an atomic ABI,
> this is a documentation + userspace problem, not a kernel problem.

Yeah, that's a totally valid take on it.

My only worry is that the module update is going to be off in another
world from the thing building TDs. We had a similar set of challenges
around microcode updates, CPUSVN and SGX enclaves.

The guy doing "echo 1 > /sys/.../whatever" wasn't coordinating with
every entity on the system that might run an SGX enclave. It certainly
didn't help that enclave creation is typically done by unprivileged
users. Maybe the KVM/TDX world is a _bit_ more narrow and they will be
talking to each other, or the /dev/kvm permissions will be a nice funnel
to get them talking to each other.

The SGX solution, btw, was to at least ensure forward progress (CPUSVN
update) when the last enclave goes away. So new enclaves aren't
*prevented* from starting but the window when the first one starts
(enclave count going from 0->1) is leveraged to do the update.