Message-ID: <c270d467-4a12-4dab-8cdc-47b61779a37c@suse.com>
Date: Wed, 27 Nov 2024 20:08:26 +0200
From: Nikolay Borisov <nik.borisov@...e.com>
To: Yan Zhao <yan.y.zhao@...el.com>, pbonzini@...hat.com, seanjc@...gle.com,
kvm@...r.kernel.org, dave.hansen@...ux.intel.com
Cc: rick.p.edgecombe@...el.com, kai.huang@...el.com, adrian.hunter@...el.com,
reinette.chatre@...el.com, xiaoyao.li@...el.com, tony.lindgren@...el.com,
binbin.wu@...ux.intel.com, dmatlack@...gle.com, isaku.yamahata@...el.com,
isaku.yamahata@...il.com, linux-kernel@...r.kernel.org, x86@...nel.org
Subject: Re: [PATCH v2 21/24] KVM: TDX: Add an ioctl to create initial guest
memory
On 12.11.24 at 9:38, Yan Zhao wrote:
> From: Isaku Yamahata <isaku.yamahata@...el.com>
>
> Add a new ioctl for the user space VMM to initialize guest memory with the
> specified memory contents.
>
> Because TDX protects the guest's memory, creating the initial guest
> memory requires a dedicated TDX module API, TDH.MEM.PAGE.ADD(), instead
> of directly copying the memory contents into the guest's memory as is
> done for the default VM type.
>
> Define a new subcommand, KVM_TDX_INIT_MEM_REGION, of vCPU-scoped
> KVM_MEMORY_ENCRYPT_OP. Check if the GFN is already pre-allocated, assign
> the guest page in Secure-EPT, copy the initial memory contents into the
> guest memory, and encrypt the guest memory. Optionally, extend the memory
> measurement of the TDX guest.
>
> The ioctl uses the vCPU file descriptor because of the TDX module's
> requirement that the memory is added to the S-EPT (via TDH.MEM.SEPT.ADD)
> prior to initialization (TDH.MEM.PAGE.ADD). Accessing the MMU in turn
> requires a vCPU file descriptor, just like for KVM_PRE_FAULT_MEMORY. In
> fact, the post-populate callback is able to reuse the same logic used by
> KVM_PRE_FAULT_MEMORY, so that userspace can do everything with a single
> ioctl.
>
> Note that this is the only way to invoke TDH.MEM.SEPT.ADD before the TD
> is finalized, as userspace cannot use KVM_PRE_FAULT_MEMORY at that
> point. This ensures that there cannot be pages in the S-EPT awaiting
> TDH.MEM.PAGE.ADD, which would be treated incorrectly as spurious by
> tdp_mmu_map_handle_target_level() (KVM would see the SPTE as PRESENT,
> but the corresponding S-EPT entry would be !PRESENT).
>
> Suggested-by: Sean Christopherson <seanjc@...gle.com>
> Signed-off-by: Isaku Yamahata <isaku.yamahata@...el.com>
> Co-developed-by: Rick Edgecombe <rick.p.edgecombe@...el.com>
> Signed-off-by: Rick Edgecombe <rick.p.edgecombe@...el.com>
> Co-developed-by: Yan Zhao <yan.y.zhao@...el.com>
> Signed-off-by: Yan Zhao <yan.y.zhao@...el.com>
> ---
> TDX MMU part 2 v2:
> - Updated commit msg (Paolo)
> - Added a guard around kvm_tdp_mmu_gpa_is_mapped() (Paolo).
> - Remove checking kvm_mem_is_private() in tdx_gmem_post_populate (Rick)
> - No need for is_td_finalized() (Rick)
> - Remove decrement of nr_premapped (moved to "Finalize VM initialization"
> patch) (Paolo)
> - Take slots_lock before checking kvm_tdx->finalized in
> tdx_vcpu_init_mem_region(), and use guard() (Paolo)
> - Fixup SEAMCALL call sites due to function parameter changes to SEAMCALL
> wrappers (Kai)
> - Add TD state handling (Tony)
>
> TDX MMU part 2 v1:
> - Update the code according to the latest gmem update.
> https://lore.kernel.org/kvm/CABgObfa=a3cKcKJHQRrCs-3Ty8ppSRou=dhi6Q+KdZnom0Zegw@mail.gmail.com/
> - Fix up an alignment bug reported by Binbin.
> - Rename KVM_MEMORY_MAPPING => KVM_MAP_MEMORY (Sean)
> - Drop issuing TDH.MEM.PAGE.ADD() on KVM_MAP_MEMORY(), defer it to
> KVM_TDX_INIT_MEM_REGION. (Sean)
> - Added nr_premapped to track the number of premapped pages
> - Drop tdx_post_mmu_map_page().
> - Drop kvm_slot_can_be_private() check (Paolo)
> - Use kvm_tdp_mmu_gpa_is_mapped() (Paolo)
>
> v19:
> - Switched to use KVM_MEMORY_MAPPING
> - Dropped measurement extension
> - Updated commit message: private_page_add() => set_private_spte()
> ---
>  arch/x86/include/uapi/asm/kvm.h |   9 ++
>  arch/x86/kvm/vmx/tdx.c          | 147 ++++++++++++++++++++++++++++++++
>  virt/kvm/kvm_main.c             |   1 +
>  3 files changed, 157 insertions(+)
>
> diff --git a/arch/x86/include/uapi/asm/kvm.h b/arch/x86/include/uapi/asm/kvm.h
> index 36fa03376581..a19cd84cec76 100644
> --- a/arch/x86/include/uapi/asm/kvm.h
> +++ b/arch/x86/include/uapi/asm/kvm.h
> @@ -931,6 +931,7 @@ enum kvm_tdx_cmd_id {
>  	KVM_TDX_CAPABILITIES = 0,
>  	KVM_TDX_INIT_VM,
>  	KVM_TDX_INIT_VCPU,
> +	KVM_TDX_INIT_MEM_REGION,
>  	KVM_TDX_GET_CPUID,
>  
>  	KVM_TDX_CMD_NR_MAX,
> @@ -985,4 +986,12 @@ struct kvm_tdx_init_vm {
>  	struct kvm_cpuid2 cpuid;
>  };
>  
> +#define KVM_TDX_MEASURE_MEMORY_REGION	_BITULL(0)
> +
> +struct kvm_tdx_init_mem_region {
> +	__u64 source_addr;
> +	__u64 gpa;
> +	__u64 nr_pages;
> +};
> +
>  #endif /* _ASM_X86_KVM_H */
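
FWIW, for anyone following along, userspace presumably drives the new
subcommand roughly like the untested fragment below. struct kvm_tdx_cmd
and its .id/.flags/.data members come from earlier patches in this
series; the vcpu fd, source buffer and GPA are made up, and headers and
setup are omitted:

	struct kvm_tdx_init_mem_region region = {
		.source_addr = (__u64)(uintptr_t)src_buf,	/* page-aligned source */
		.gpa = 0x80000000,				/* made-up guest address */
		.nr_pages = 1,
	};
	struct kvm_tdx_cmd cmd = {
		.id = KVM_TDX_INIT_MEM_REGION,
		.flags = KVM_TDX_MEASURE_MEMORY_REGION,
		.data = (__u64)(uintptr_t)&region,
	};

	/* vCPU-scoped, per the commit message. */
	if (ioctl(vcpu_fd, KVM_MEMORY_ENCRYPT_OP, &cmd) < 0)
		err(1, "KVM_TDX_INIT_MEM_REGION");
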
> diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c
> index ead520083397..15cedacd717a 100644
> --- a/arch/x86/kvm/vmx/tdx.c
> +++ b/arch/x86/kvm/vmx/tdx.c
> @@ -1,4 +1,5 @@
>  // SPDX-License-Identifier: GPL-2.0
> +#include <linux/cleanup.h>
>  #include <linux/cpu.h>
>  #include <asm/tdx.h>
>  #include "capabilities.h"
> @@ -7,6 +8,7 @@
>  #include "tdx.h"
>  #include "vmx.h"
>  #include "mmu/spte.h"
> +#include "common.h"
>  
>  #undef pr_fmt
>  #define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
> @@ -1597,6 +1599,148 @@ static int tdx_vcpu_init(struct kvm_vcpu *vcpu, struct kvm_tdx_cmd *cmd)
>  	return 0;
>  }
>  
> +struct tdx_gmem_post_populate_arg {
> +	struct kvm_vcpu *vcpu;
> +	__u32 flags;
> +};
> +
> +static int tdx_gmem_post_populate(struct kvm *kvm, gfn_t gfn, kvm_pfn_t pfn,
> +				  void __user *src, int order, void *_arg)
> +{
> +	u64 error_code = PFERR_GUEST_FINAL_MASK | PFERR_PRIVATE_ACCESS;
> +	struct kvm_tdx *kvm_tdx = to_kvm_tdx(kvm);
> +	struct tdx_gmem_post_populate_arg *arg = _arg;
> +	struct kvm_vcpu *vcpu = arg->vcpu;
> +	gpa_t gpa = gfn_to_gpa(gfn);
> +	u8 level = PG_LEVEL_4K;
> +	struct page *page;
> +	int ret, i;
> +	u64 err, entry, level_state;
> +
> +	/*
> +	 * Get the source page if it has been faulted in. Return failure if the
> +	 * source page has been swapped out or unmapped in primary memory.
> +	 */
> +	ret = get_user_pages_fast((unsigned long)src, 1, 0, &page);
> +	if (ret < 0)
> +		return ret;
> +	if (ret != 1)
> +		return -ENOMEM;
> +
> +	ret = kvm_tdp_map_page(vcpu, gpa, error_code, &level);
> +	if (ret < 0)
> +		goto out;
> +
> +	/*
> +	 * The private mem cannot be zapped after kvm_tdp_map_page()
> +	 * because all paths are covered by slots_lock and the
> +	 * filemap invalidate lock. Check that they are indeed enough.
> +	 */
> +	if (IS_ENABLED(CONFIG_KVM_PROVE_MMU)) {
> +		scoped_guard(read_lock, &kvm->mmu_lock) {
> +			if (KVM_BUG_ON(!kvm_tdp_mmu_gpa_is_mapped(vcpu, gpa), kvm)) {
> +				ret = -EIO;
> +				goto out;
> +			}
> +		}
> +	}
> +
> +	ret = 0;
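> +	/* Retry while the TDX module reports contention on the S-EPT entry. */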
> +	do {
> +		err = tdh_mem_page_add(kvm_tdx->tdr_pa, gpa, pfn_to_hpa(pfn),
> +				       pfn_to_hpa(page_to_pfn(page)),
> +				       &entry, &level_state);
> +	} while (err == TDX_ERROR_SEPT_BUSY);
> +	if (err) {
> +		ret = -EIO;
> +		goto out;
> +	}
> +
> +	if (arg->flags & KVM_TDX_MEASURE_MEMORY_REGION) {
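> +		/* Extend the TD measurement (MRTD) with this page, chunk by chunk. */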
> +		for (i = 0; i < PAGE_SIZE; i += TDX_EXTENDMR_CHUNKSIZE) {
> +			err = tdh_mr_extend(kvm_tdx->tdr_pa, gpa + i, &entry,
> +					    &level_state);
> +			if (err) {
> +				ret = -EIO;
> +				break;
> +			}
> +		}
> +	}
> +
> +out:
> +	put_page(page);
> +	return ret;
> +}
> +
> +static int tdx_vcpu_init_mem_region(struct kvm_vcpu *vcpu, struct kvm_tdx_cmd *cmd)
> +{
> +	struct vcpu_tdx *tdx = to_tdx(vcpu);
> +	struct kvm *kvm = vcpu->kvm;
> +	struct kvm_tdx *kvm_tdx = to_kvm_tdx(kvm);
> +	struct kvm_tdx_init_mem_region region;
> +	struct tdx_gmem_post_populate_arg arg;
> +	long gmem_ret;
> +	int ret;
> +
> +	if (tdx->state != VCPU_TD_STATE_INITIALIZED)
> +		return -EINVAL;
> +
> +	guard(mutex)(&kvm->slots_lock);
It seems the scope of this lock can be reduced. It's really only needed 
for the kvm_gmem_populate() call, no?
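
Something like the below untested sketch: keep the TD state check (which 
per the changelog has to happen under slots_lock) and the populate call 
inside a scoped guard, and do the rest of the setup outside it. The 
kvm_gmem_populate() argument list is a guess on my part, since the call 
site is snipped:

	scoped_guard(mutex, &kvm->slots_lock) {
		/* the finalized/TD-state check must stay under the lock */
		...

		gmem_ret = kvm_gmem_populate(kvm, gpa_to_gfn(region.gpa),
					     u64_to_user_ptr(region.source_addr),
					     1, tdx_gmem_post_populate, &arg);
	}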
<snip>