Message-ID: <c270d467-4a12-4dab-8cdc-47b61779a37c@suse.com>
Date: Wed, 27 Nov 2024 20:08:26 +0200
From: Nikolay Borisov <nik.borisov@...e.com>
To: Yan Zhao <yan.y.zhao@...el.com>, pbonzini@...hat.com, seanjc@...gle.com,
 kvm@...r.kernel.org, dave.hansen@...ux.intel.com
Cc: rick.p.edgecombe@...el.com, kai.huang@...el.com, adrian.hunter@...el.com,
 reinette.chatre@...el.com, xiaoyao.li@...el.com, tony.lindgren@...el.com,
 binbin.wu@...ux.intel.com, dmatlack@...gle.com, isaku.yamahata@...el.com,
 isaku.yamahata@...il.com, linux-kernel@...r.kernel.org, x86@...nel.org
Subject: Re: [PATCH v2 21/24] KVM: TDX: Add an ioctl to create initial guest
 memory



On 12.11.24 at 9:38, Yan Zhao wrote:
> From: Isaku Yamahata <isaku.yamahata@...el.com>
> 
> Add a new ioctl for the user space VMM to initialize guest memory with the
> specified memory contents.
> 
> Because TDX protects the guest's memory, creating the initial guest
> memory requires a dedicated TDX module API, TDH.MEM.PAGE.ADD(), instead
> of directly copying the memory contents into the guest's memory as is
> done for the default VM type.
> 
> Define a new subcommand, KVM_TDX_INIT_MEM_REGION, of vCPU-scoped
> KVM_MEMORY_ENCRYPT_OP.  Check if the GFN is already pre-allocated, assign
> the guest page in Secure-EPT, copy the initial memory contents into the
> guest memory, and encrypt the guest memory.  Optionally, extend the memory
> measurement of the TDX guest.
> 
> The ioctl uses the vCPU file descriptor because of the TDX module's
> requirement that the memory is added to the S-EPT (via TDH.MEM.SEPT.ADD)
> prior to initialization (TDH.MEM.PAGE.ADD).  Accessing the MMU in turn
> requires a vCPU file descriptor, just like for KVM_PRE_FAULT_MEMORY.  In
> fact, the post-populate callback is able to reuse the same logic used by
> KVM_PRE_FAULT_MEMORY, so that userspace can do everything with a single
> ioctl.
> 
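[Illustration, not part of the patch: a hypothetical userspace sketch of
driving this subcommand.  It assumes the struct kvm_tdx_cmd wrapper
(id/flags/data/hw_error) defined earlier in this series, an
already-initialized TD vCPU fd, and that the measurement flag is passed
via kvm_tdx_cmd.flags; error handling is elided.]

#include <sys/ioctl.h>
#include <linux/kvm.h>

static int tdx_init_mem_region(int vcpu_fd, void *src, __u64 gpa,
			       __u64 nr_pages)
{
	struct kvm_tdx_init_mem_region region = {
		.source_addr = (__u64)(unsigned long)src,
		.gpa = gpa,
		.nr_pages = nr_pages,
	};
	struct kvm_tdx_cmd cmd = {
		.id = KVM_TDX_INIT_MEM_REGION,
		.flags = KVM_TDX_MEASURE_MEMORY_REGION,	/* optional */
		.data = (__u64)(unsigned long)&region,
	};

	/* vCPU-scoped: issued on the vCPU fd, not the VM fd */
	return ioctl(vcpu_fd, KVM_MEMORY_ENCRYPT_OP, &cmd);
}
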
> Note that this is the only way to invoke TDH.MEM.SEPT.ADD before the TD
> is finalized, as userspace cannot use KVM_PRE_FAULT_MEMORY at that
> point.  This ensures that there cannot be pages in the S-EPT awaiting
> TDH.MEM.PAGE.ADD, which would be treated incorrectly as spurious by
> tdp_mmu_map_handle_target_level() (KVM would see the SPTE as PRESENT,
> but the corresponding S-EPT entry will be !PRESENT).
> 
> Suggested-by: Sean Christopherson <seanjc@...gle.com>
> Signed-off-by: Isaku Yamahata <isaku.yamahata@...el.com>
> Co-developed-by: Rick Edgecombe <rick.p.edgecombe@...el.com>
> Signed-off-by: Rick Edgecombe <rick.p.edgecombe@...el.com>
> Co-developed-by: Yan Zhao <yan.y.zhao@...el.com>
> Signed-off-by: Yan Zhao <yan.y.zhao@...el.com>
> ---
> TDX MMU part 2 v2:
>   - Updated commit msg (Paolo)
>   - Added a guard around kvm_tdp_mmu_gpa_is_mapped() (Paolo).
>   - Remove checking kvm_mem_is_private() in tdx_gmem_post_populate (Rick)
>   - No need for is_td_finalized() (Rick)
>   - Remove decrement of nr_premapped (moved to "Finalize VM initialization"
>     patch) (Paolo)
>   - Take slots_lock before checking kvm_tdx->finalized in
>     tdx_vcpu_init_mem_region(), and use guard() (Paolo)
>   - Fixup SEAMCALL call sites due to function parameter changes to SEAMCALL
>     wrappers (Kai)
>   - Add TD state handling (Tony)
> 
> TDX MMU part 2 v1:
>   - Update the code according to the latest gmem update.
>     https://lore.kernel.org/kvm/CABgObfa=a3cKcKJHQRrCs-3Ty8ppSRou=dhi6Q+KdZnom0Zegw@mail.gmail.com/
>   - Fix up an alignment bug reported by Binbin.
>   - Rename KVM_MEMORY_MAPPING => KVM_MAP_MEMORY (Sean)
>   - Drop issuing TDH.MEM.PAGE.ADD() on KVM_MAP_MEMORY(), defer it to
>     KVM_TDX_INIT_MEM_REGION. (Sean)
>   - Added nr_premapped to track the number of premapped pages
>   - Drop tdx_post_mmu_map_page().
>   - Drop kvm_slot_can_be_private() check (Paolo)
>   - Use kvm_tdp_mmu_gpa_is_mapped() (Paolo)
> 
> v19:
>   - Switched to use KVM_MEMORY_MAPPING
>   - Dropped measurement extension
>   - Updated commit message: private_page_add() => set_private_spte()
> ---
>   arch/x86/include/uapi/asm/kvm.h |   9 ++
>   arch/x86/kvm/vmx/tdx.c          | 147 ++++++++++++++++++++++++++++++++
>   virt/kvm/kvm_main.c             |   1 +
>   3 files changed, 157 insertions(+)
> 
> diff --git a/arch/x86/include/uapi/asm/kvm.h b/arch/x86/include/uapi/asm/kvm.h
> index 36fa03376581..a19cd84cec76 100644
> --- a/arch/x86/include/uapi/asm/kvm.h
> +++ b/arch/x86/include/uapi/asm/kvm.h
> @@ -931,6 +931,7 @@ enum kvm_tdx_cmd_id {
>   	KVM_TDX_CAPABILITIES = 0,
>   	KVM_TDX_INIT_VM,
>   	KVM_TDX_INIT_VCPU,
> +	KVM_TDX_INIT_MEM_REGION,
>   	KVM_TDX_GET_CPUID,
>   
>   	KVM_TDX_CMD_NR_MAX,
> @@ -985,4 +986,12 @@ struct kvm_tdx_init_vm {
>   	struct kvm_cpuid2 cpuid;
>   };
>   
> +#define KVM_TDX_MEASURE_MEMORY_REGION   _BITULL(0)
> +
> +struct kvm_tdx_init_mem_region {
> +	__u64 source_addr;
> +	__u64 gpa;
> +	__u64 nr_pages;
> +};
> +
>   #endif /* _ASM_X86_KVM_H */
> diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c
> index ead520083397..15cedacd717a 100644
> --- a/arch/x86/kvm/vmx/tdx.c
> +++ b/arch/x86/kvm/vmx/tdx.c
> @@ -1,4 +1,5 @@
>   // SPDX-License-Identifier: GPL-2.0
> +#include <linux/cleanup.h>
>   #include <linux/cpu.h>
>   #include <asm/tdx.h>
>   #include "capabilities.h"
> @@ -7,6 +8,7 @@
>   #include "tdx.h"
>   #include "vmx.h"
>   #include "mmu/spte.h"
> +#include "common.h"
>   
>   #undef pr_fmt
>   #define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
> @@ -1597,6 +1599,148 @@ static int tdx_vcpu_init(struct kvm_vcpu *vcpu, struct kvm_tdx_cmd *cmd)
>   	return 0;
>   }
>   
> +struct tdx_gmem_post_populate_arg {
> +	struct kvm_vcpu *vcpu;
> +	__u32 flags;
> +};
> +
> +static int tdx_gmem_post_populate(struct kvm *kvm, gfn_t gfn, kvm_pfn_t pfn,
> +				  void __user *src, int order, void *_arg)
> +{
> +	u64 error_code = PFERR_GUEST_FINAL_MASK | PFERR_PRIVATE_ACCESS;
> +	struct kvm_tdx *kvm_tdx = to_kvm_tdx(kvm);
> +	struct tdx_gmem_post_populate_arg *arg = _arg;
> +	struct kvm_vcpu *vcpu = arg->vcpu;
> +	gpa_t gpa = gfn_to_gpa(gfn);
> +	u8 level = PG_LEVEL_4K;
> +	struct page *page;
> +	int ret, i;
> +	u64 err, entry, level_state;
> +
> +	/*
> +	 * Get the source page if it has been faulted in. Return failure if the
> +	 * source page has been swapped out or unmapped in primary memory.
> +	 */
> +	ret = get_user_pages_fast((unsigned long)src, 1, 0, &page);
> +	if (ret < 0)
> +		return ret;
> +	if (ret != 1)
> +		return -ENOMEM;
> +
> +	ret = kvm_tdp_map_page(vcpu, gpa, error_code, &level);
> +	if (ret < 0)
> +		goto out;
> +
> +	/*
> +	 * The private mem cannot be zapped after kvm_tdp_map_page()
> +	 * because all paths are covered by slots_lock and the
> +	 * filemap invalidate lock.  Check that they are indeed enough.
> +	 */
> +	if (IS_ENABLED(CONFIG_KVM_PROVE_MMU)) {
> +		scoped_guard(read_lock, &kvm->mmu_lock) {
> +			if (KVM_BUG_ON(!kvm_tdp_mmu_gpa_is_mapped(vcpu, gpa), kvm)) {
> +				ret = -EIO;
> +				goto out;
> +			}
> +		}
> +	}
> +
> +	ret = 0;
> +	do {
> +		err = tdh_mem_page_add(kvm_tdx->tdr_pa, gpa, pfn_to_hpa(pfn),
> +				       pfn_to_hpa(page_to_pfn(page)),
> +				       &entry, &level_state);
> +	} while (err == TDX_ERROR_SEPT_BUSY);
> +	if (err) {
> +		ret = -EIO;
> +		goto out;
> +	}
> +
> +	if (arg->flags & KVM_TDX_MEASURE_MEMORY_REGION) {
> +		for (i = 0; i < PAGE_SIZE; i += TDX_EXTENDMR_CHUNKSIZE) {
> +			err = tdh_mr_extend(kvm_tdx->tdr_pa, gpa + i, &entry,
> +					    &level_state);
> +			if (err) {
> +				ret = -EIO;
> +				break;
> +			}
> +		}
> +	}
> +
> +out:
> +	put_page(page);
> +	return ret;
> +}
> +
> +static int tdx_vcpu_init_mem_region(struct kvm_vcpu *vcpu, struct kvm_tdx_cmd *cmd)
> +{
> +	struct vcpu_tdx *tdx = to_tdx(vcpu);
> +	struct kvm *kvm = vcpu->kvm;
> +	struct kvm_tdx *kvm_tdx = to_kvm_tdx(kvm);
> +	struct kvm_tdx_init_mem_region region;
> +	struct tdx_gmem_post_populate_arg arg;
> +	long gmem_ret;
> +	int ret;
> +
> +	if (tdx->state != VCPU_TD_STATE_INITIALIZED)
> +		return -EINVAL;
> +
> +	guard(mutex)(&kvm->slots_lock);

It seems the scope of this lock can be reduced. It's really needed only
for the kvm_gmem_populate() call, no?
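
A rough (untested) sketch of that narrowing, assuming the elided body of
tdx_vcpu_init_mem_region() otherwise stays as in the patch:

	if (tdx->state != VCPU_TD_STATE_INITIALIZED)
		return -EINVAL;

	/* copy_from_user() and validation of 'region' can run unlocked */

	scoped_guard(mutex, &kvm->slots_lock) {
		/*
		 * Hypothetical: only the finalized check and the
		 * kvm_gmem_populate() loop (which ends up in
		 * tdx_gmem_post_populate()) remain under the lock.
		 */
	}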

<snip>

