Message-ID: <CAAhR5DF2PhB-usQBzWuUZAd=y8tWursMOnBOzNiGEBAnkqxutA@mail.gmail.com>
Date: Wed, 11 Jun 2025 14:56:19 -0500
From: Sagi Shahar <sagis@...gle.com>
To: Chao Gao <chao.gao@...el.com>
Cc: linux-coco@...ts.linux.dev, x86@...nel.org, kvm@...r.kernel.org, 
	seanjc@...gle.com, pbonzini@...hat.com, eddie.dong@...el.com, 
	kirill.shutemov@...el.com, dave.hansen@...el.com, dan.j.williams@...el.com, 
	kai.huang@...el.com, isaku.yamahata@...el.com, elena.reshetova@...el.com, 
	rick.p.edgecombe@...el.com, Borislav Petkov <bp@...en8.de>, 
	Dave Hansen <dave.hansen@...ux.intel.com>, "H. Peter Anvin" <hpa@...or.com>, 
	Ingo Molnar <mingo@...hat.com>, "Kirill A. Shutemov" <kirill.shutemov@...ux.intel.com>, 
	linux-kernel@...r.kernel.org, Thomas Gleixner <tglx@...utronix.de>
Subject: Re: [RFC PATCH 00/20] TD-Preserving updates
On Fri, May 23, 2025 at 4:53 AM Chao Gao <chao.gao@...el.com> wrote:
>
> Hi Reviewers,
>
> This series adds support for runtime TDX module updates that preserve
> running TDX guests (a.k.a. TD-Preserving updates). The goal is to gather
> feedback on the feature design. Please pay attention to the following items:
>
> 1. TD-Preserving updates are done in stop_machine() context. The series
>    copy-pastes part of multi_cpu_stop() to guarantee step-locked progress
>    on all CPUs, but there are a few differences between the two. I am
>    wondering whether these differences have reached a point where
>    abstracting a common function might do more harm than good. See more
>    details in patch 10.
>
> 2. P-SEAMLDR seamcalls (specifically SEAMRET from P-SEAMLDR) clear current
>    VMCS pointers, which may disrupt KVM. To prevent VMX instructions in IRQ
>    context from encountering NULL current-VMCS pointers, P-SEAMLDR
>    seamcalls are called with IRQ disabled. I'm uncertain if NMIs could
>    cause a problem, but I believe they won't. See more information in patch 3.
>
> 3. Two helpers, cpu_vmcs_load() and cpu_vmcs_store(), are added in patch 3
>    to save and restore the current VMCS. KVM has a variant of cpu_vmcs_load(),
>    i.e., vmcs_load(). Extracting KVM's version would cause a lot of code
>    churn, and I don't think that can be justified for reducing ~16 LoC
>    duplication. Please let me know if you disagree.
>
> == Background ==
>
> Intel TDX isolates Trusted Domains (TDs), or confidential guests, from the
> host. A key component of Intel TDX is the TDX module, which enforces
> security policies to protect the memory and CPU states of TDs from the
> host. However, the TDX module is software that requires updates; it is not
> device firmware in the typical sense.
>
> == Problems ==
>
> Currently, the TDX module is loaded by the BIOS at boot time, and the only
> way to update it is through a reboot, which results in significant system
> downtime. Users expect the TDX module to be updatable at runtime without
> disrupting TDX guests.
>
> == Solution ==
>
> On TDX platforms, P-SEAMLDR[1] is a component within the protected SEAM
> range. It is loaded by the BIOS and provides the host with functions to
> install a TDX module at runtime.
>
> Implement a TDX Module update facility via the fw_upload mechanism. Given
> that there is variability in which module update to load based on features,
> fix levels, and potentially reloading the same version for error recovery
> scenarios, the explicit userspace chosen payload flexibility of fw_upload
> is attractive.
>
> This design allows the kernel to accept a bitstream instead of loading a
> named file from the filesystem, as the module selection and policy
> enforcement for TDX modules are quite complex (see more in patch 8). By
> doing so, much of this complexity is shifted out of the kernel. The kernel
> needs to expose information, such as the TDX module version, to userspace.
> The userspace tool must understand the TDX module versioning scheme and
> update policy to select the appropriate TDX module (see "TDX Module
> Versioning" below).
>
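
A minimal sketch of how a userspace tool might drive an fw_upload node,
assuming the generic sysfs-class-firmware files (loading, data, status,
error). The node path is left to the caller because the name this series
registers is not stated here; upload_firmware() and its timeout are
hypothetical names for illustration, not part of the series or its tool.

```python
import time
from pathlib import Path


def upload_firmware(node: Path, image: bytes, timeout_s: float = 10.0) -> str:
    """Drive the fw_upload sysfs files under `node`.

    Returns the contents of the `error` file once the upload is idle;
    an empty string means the kernel reported no error.
    """
    (node / "loading").write_text("1")  # begin staging the payload
    (node / "data").write_bytes(image)  # hand the raw image to the kernel
    (node / "loading").write_text("0")  # end staging; the upload starts

    # Poll until the upload state machine returns to "idle".
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        status = (node / "status").read_text().strip()
        if status == "idle":
            return (node / "error").read_text().strip()
        time.sleep(0.01)
    raise TimeoutError(f"upload under {node} did not return to idle")
```

The caller passes the directory of the firmware node (assumed to live under
/sys/class/firmware/) and the bytes of the module image chosen by policy.
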
> In the unlikely event the update fails, for example userspace picks an
> incompatible update image, or the image is otherwise corrupted, all TDs
> will experience SEAMCALL failures and be killed. The recovery of TD
> operation from that event requires a reboot.
>
> Given there is no mechanism to quiesce SEAMCALLs, the TDs themselves must
> pause execution over an update. The most straightforward way to meet the
> 'pause TDs while update executes' constraint is to run the update in
> stop_machine() context. All other evaluated solutions export more
> complexity to KVM or more fragility to userspace.
>
> == How to test this series ==
>
>  # git clone https://github.com/intel/tdx-module-binaries
>  # cd tdx-module-binaries
>  # python version_select_and_load.py --update
>
>
> This series is based on Sean's kvm-x86/next branch
>
>   https://github.com/kvm-x86/linux.git next
>
>
> == Other information relevant to TD-Preserving updates ==
>
> === TDX module versioning ===
>
> Each TDX module is assigned a version number x.y.z, where x represents the
> "major" version, y the "minor" version, and z the "update" version.
>
> TD-Preserving updates are restricted to Z-stream releases.
>
> Note that Z-stream releases do not necessarily guarantee compatibility. A
> new release may not be compatible with all previous versions. To address this,
> Intel provides a separate file containing compatibility information, which
> specifies the minimum module version required for a particular update. This
> information is referenced by the tool to determine if two modules are
> compatible.
>
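
The versioning rules above can be sketched as a small userspace check. This
is an illustration under stated assumptions, not the logic of the published
script: the function names are hypothetical, and min_required stands in for
the minimum-version entry of Intel's compatibility file, whose actual schema
is not shown in this cover letter.

```python
from typing import NamedTuple


class ModuleVersion(NamedTuple):
    major: int   # x
    minor: int   # y
    update: int  # z


def parse_version(text: str) -> ModuleVersion:
    """Parse an 'x.y.z' string into a comparable tuple."""
    major, minor, update = (int(part) for part in text.strip().split("."))
    return ModuleVersion(major, minor, update)


def can_td_preserving_update(current: str, target: str, min_required: str) -> bool:
    """Return True if `target` may replace `current` without killing TDs.

    Two conditions, both taken from the text above:
      * the update stays within the same x.y stream (Z-stream only);
      * the running module is at least the minimum version that the
        compatibility information lists for the target (assumed input).
    Reloading the same version is allowed, so no 'newer than' check.
    """
    cur, tgt, floor = (parse_version(v) for v in (current, target, min_required))
    same_stream = (cur.major, cur.minor) == (tgt.major, tgt.minor)
    return same_stream and cur >= floor
```

For example, can_td_preserving_update("1.5.1", "1.5.6", "1.5.0") passes both
checks, while a target of "2.0.0" is rejected for leaving the Z-stream.
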
> === TCB Stability ===
>
> Updates change the TCB as viewed by attestation reports. In TDX there is a
> distinction between launch-time version and current version where TD-preserving
> updates cause that latter version number to change, subject to Z-stream
> constraints. The need for runtime updates and the implications of that version
> change in the attestation was previously discussed in [3].
>
> === TDX Module Distribution Model ===
>
> At a high level, Intel publishes all TDX modules on GitHub [2], along with
> a mapping_file.json which documents the compatibility information about each
> TDX module and a script to install the TDX module. OS vendors can package
> these modules and distribute them. Administrators install the package and
> use the script to select the appropriate TDX module and install it via the
> interfaces exposed by this series.
>
> [1]: https://cdrdv2.intel.com/v1/dl/getContent/733584
> [2]: https://github.com/intel/tdx-module-binaries
> [3]: https://lore.kernel.org/all/5d1da767-491b-4077-b472-2cc3d73246d6@amazon.com/
>
>
> Chao Gao (20):
>   x86/virt/tdx: Print SEAMCALL leaf numbers in decimal
>   x86/virt/tdx: Prepare to support P-SEAMLDR SEAMCALLs
>   x86/virt/seamldr: Introduce a wrapper for P-SEAMLDR SEAMCALLs
>   x86/virt/tdx: Introduce a "tdx" subsystem and "tsm" device
>   x86/virt/tdx: Export tdx module attributes via sysfs
>   x86/virt/seamldr: Add a helper to read P-SEAMLDR information
>   x86/virt/tdx: Expose SEAMLDR information via sysfs
>   x86/virt/seamldr: Implement FW_UPLOAD sysfs ABI for TD-Preserving
>     Updates
>   x86/virt/seamldr: Allocate and populate a module update request
>   x86/virt/seamldr: Introduce skeleton for TD-Preserving updates
>   x86/virt/seamldr: Abort updates if errors occurred midway
>   x86/virt/seamldr: Shut down the current TDX module
>   x86/virt/tdx: Reset software states after TDX module shutdown
>   x86/virt/seamldr: Install a new TDX module
>   x86/virt/seamldr: Handle TD-Preserving update failures
>   x86/virt/seamldr: Do TDX cpu init after updates
>   x86/virt/tdx: Establish contexts for the new module
>   x86/virt/tdx: Update tdx_sysinfo and check features post-update
>   x86/virt/seamldr: Verify availability of slots for TD-Preserving
>     updates
>   x86/virt/seamldr: Enable TD-Preserving Updates
>
>  Documentation/ABI/testing/sysfs-devices-tdx |  32 ++
>  MAINTAINERS                                 |   1 +
>  arch/x86/Kconfig                            |  12 +
>  arch/x86/include/asm/tdx.h                  |  20 +-
>  arch/x86/include/asm/tdx_global_metadata.h  |  12 +
>  arch/x86/virt/vmx/tdx/Makefile              |   1 +
>  arch/x86/virt/vmx/tdx/seamldr.c             | 443 ++++++++++++++++++++
>  arch/x86/virt/vmx/tdx/seamldr.h             |  16 +
>  arch/x86/virt/vmx/tdx/tdx.c                 | 248 ++++++++++-
>  arch/x86/virt/vmx/tdx/tdx.h                 |  12 +
>  arch/x86/virt/vmx/tdx/tdx_global_metadata.c |  29 ++
>  arch/x86/virt/vmx/vmx.h                     |  40 ++
>  12 files changed, 862 insertions(+), 4 deletions(-)
>  create mode 100644 Documentation/ABI/testing/sysfs-devices-tdx
>  create mode 100644 arch/x86/virt/vmx/tdx/seamldr.c
>  create mode 100644 arch/x86/virt/vmx/tdx/seamldr.h
>  create mode 100644 arch/x86/virt/vmx/vmx.h
>
> --
> 2.47.1
>
>
Tested-by: Sagi Shahar <sagis@...gle.com>
I was able to update the module while several VMs were running on the
machine, using a modified version of the tdx selftests. Measuring the
update time shows that an update takes less than 10 ms, regardless of
the number of VMs running.