Message-ID: <20250709232103.zwmufocd3l7sqk7y@amd.com>
Date: Wed, 9 Jul 2025 18:21:03 -0500
From: Michael Roth <michael.roth@....com>
To: Yan Zhao <yan.y.zhao@...el.com>
CC: <pbonzini@...hat.com>, <seanjc@...gle.com>, <kvm@...r.kernel.org>,
	<linux-kernel@...r.kernel.org>, <rick.p.edgecombe@...el.com>,
	<kai.huang@...el.com>, <adrian.hunter@...el.com>,
	<reinette.chatre@...el.com>, <xiaoyao.li@...el.com>,
	<tony.lindgren@...el.com>, <binbin.wu@...ux.intel.com>,
	<dmatlack@...gle.com>, <isaku.yamahata@...el.com>, <ira.weiny@...el.com>,
	<vannapurve@...gle.com>, <david@...hat.com>, <ackerleytng@...gle.com>,
	<tabba@...gle.com>, <chao.p.peng@...el.com>
Subject: Re: [RFC PATCH] KVM: TDX: Decouple TDX init mem region from
 kvm_gmem_populate()

On Thu, Jul 03, 2025 at 02:26:41PM +0800, Yan Zhao wrote:
> Rather than invoking kvm_gmem_populate(), allow tdx_vcpu_init_mem_region()
> to open code populating the initial memory region into the mirror page
> table and adding the region to the S-EPT.
> 
> Background
> ===
> Sean initially suggested that TDX populate the initial memory region in a
> 4-step way [1]. Paolo refactored guest_memfd and introduced the
> kvm_gmem_populate() interface [2] to help TDX populate the init memory region.
> 
> tdx_vcpu_init_mem_region
>     guard(mutex)(&kvm->slots_lock)
>     kvm_gmem_populate
>         filemap_invalidate_lock(file->f_mapping)
>             __kvm_gmem_get_pfn      //1. get private PFN
>             post_populate           //tdx_gmem_post_populate
>                 get_user_pages_fast //2. get source page
>                 kvm_tdp_map_page    //3. map private PFN to mirror root
>                 tdh_mem_page_add    //4. add private PFN to S-EPT and copy
>                                          source page to it.
> 
> kvm_gmem_populate() helps TDX "get the private PFN" in step 1. Its filemap
> invalidation lock also helps ensure the private PFN remains valid when
> tdh_mem_page_add() is invoked in TDX's post_populate hook.
> 
> Though TDX does not need the folio preparation code, kvm_gmem_populate()
> helps share common code between SEV-SNP and TDX.
> 
> Problem
> ===
> (1)
> In Michael's series "KVM: gmem: 2MB THP support and preparedness tracking
> changes" [4], kvm_gmem_get_pfn() was modified to rely on the filemap
> invalidation lock for protecting its preparedness tracking. Similarly, the
> in-place conversion version of the guest_memfd series by Ackerley also
> requires kvm_gmem_get_pfn() to acquire the filemap invalidation lock [5].
> 
> kvm_gmem_get_pfn
>     filemap_invalidate_lock_shared(file_inode(file)->i_mapping);
> 
> However, since kvm_gmem_get_pfn() is called by kvm_tdp_map_page(), which is
> in turn invoked within kvm_gmem_populate() in TDX, a deadlock occurs on the
> filemap invalidation lock.

Bringing the prior discussion over to here: it seems wrong that
kvm_gmem_get_pfn() is getting called within the kvm_gmem_populate()
chain, because:

1) kvm_gmem_populate() is specifically passing the gmem PFN down to
   tdx_gmem_post_populate(), but we are throwing it away and grabbing it
   again via kvm_gmem_get_pfn(), which is what creates the locking issues
   we are trying to work around. If we could simply pass that PFN down to
   kvm_tdp_map_page() (or some variant), then we would not trigger any
   deadlocks in the first place.

2) kvm_gmem_populate() is intended for pre-boot population of guest
   memory, and allows the post_populate callback to handle the
   architecture-specific preparation, whereas kvm_gmem_get_pfn()
   calls kvm_arch_gmem_prepare(), which is intended to handle post-boot
   setup of private memory. Having kvm_gmem_get_pfn() called as part of
   the kvm_gmem_populate() chain brings these 2 things into conflict with
   each other, and TDX seems to be relying on the fact that it doesn't
   implement a handler for kvm_arch_gmem_prepare(). (Both contracts are
   sketched right after this list.)

I don't think this hurts anything in the current code, and I don't
personally see any issue with open-coding the population path if it doesn't
fit TDX very well, but there was some effort put into making
kvm_gmem_populate() usable for both TDX/SNP, and if the real issue isn't the
design of the interface itself, but instead just some inflexibility on the
KVM MMU mapping side, then it seems more robust to address the latter if
possible.
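
For reference, the nesting that becomes a problem once kvm_gmem_get_pfn()
starts taking the filemap invalidation lock (per the [4]/[5] changes quoted
above) looks roughly like this, with intermediate frames elided:

    kvm_gmem_populate()
        filemap_invalidate_lock()                        // exclusive
        tdx_gmem_post_populate()
            kvm_tdp_map_page()
                kvm_mmu_do_page_fault()
                    kvm_mmu_faultin_pfn_private()
                        kvm_gmem_get_pfn()
                            filemap_invalidate_lock_shared() // deadlock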

Would something like the below be reasonable? I scoped it to only handle
mapping gmem pages, but I guess it could be generalized if a similar use
case ever arose. (Completely untested.)

diff --git a/arch/x86/kvm/mmu.h b/arch/x86/kvm/mmu.h
index b4b6860ab971..93319f02f8b7 100644
--- a/arch/x86/kvm/mmu.h
+++ b/arch/x86/kvm/mmu.h
@@ -259,6 +259,8 @@ extern bool tdp_mmu_enabled;
 
 bool kvm_tdp_mmu_gpa_is_mapped(struct kvm_vcpu *vcpu, u64 gpa);
 int kvm_tdp_map_page(struct kvm_vcpu *vcpu, gpa_t gpa, u64 error_code, u8 *level);
+int kvm_tdp_map_page_gmem(struct kvm_vcpu *vcpu, gpa_t gpa, u64 error_code, u8 *level,
+                         kvm_pfn_t gmem_pfn);
 
 static inline bool kvm_memslots_have_rmaps(struct kvm *kvm)
 {
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 4e06e2e89a8f..61766ac0fa29 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -4503,11 +4503,17 @@ static int kvm_mmu_faultin_pfn_private(struct kvm_vcpu *vcpu,
                return -EFAULT;
        }
 
-       r = kvm_gmem_get_pfn(vcpu->kvm, fault->slot, fault->gfn, &fault->pfn,
-                            &fault->refcounted_page, &max_order);
-       if (r) {
-               kvm_mmu_prepare_memory_fault_exit(vcpu, fault);
-               return r;
+       if (fault->gmem_pfn == KVM_PFN_ERR_FAULT) {
+               r = kvm_gmem_get_pfn(vcpu->kvm, fault->slot, fault->gfn, &fault->pfn,
+                                    &fault->refcounted_page, &max_order);
+               if (r) {
+                       kvm_mmu_prepare_memory_fault_exit(vcpu, fault);
+                       return r;
+               }
+       } else {
+               /* Only the above-handled KVM_PFN_ERR_FAULT is expected currently. */
+               WARN_ON_ONCE(is_error_pfn(fault->gmem_pfn));
+               fault->pfn = fault->gmem_pfn;
        }
 
        fault->map_writable = !(fault->slot->flags & KVM_MEM_READONLY);
@@ -4847,7 +4853,8 @@ int kvm_tdp_page_fault(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
        return direct_page_fault(vcpu, fault);
 }
 
-int kvm_tdp_map_page(struct kvm_vcpu *vcpu, gpa_t gpa, u64 error_code, u8 *level)
+int kvm_tdp_map_page_gmem(struct kvm_vcpu *vcpu, gpa_t gpa, u64 error_code, u8 *level,
+                         kvm_pfn_t gmem_pfn)
 {
        int r;
 
@@ -4866,7 +4873,12 @@ int kvm_tdp_map_page(struct kvm_vcpu *vcpu, gpa_t gpa, u64 error_code, u8 *level
                        return -EIO;
 
                cond_resched();
-               r = kvm_mmu_do_page_fault(vcpu, gpa, error_code, true, NULL, level);
+
+               if (is_error_pfn(gmem_pfn))
+                       r = kvm_mmu_do_page_fault(vcpu, gpa, error_code, true, NULL, level);
+               else
+                       r = __kvm_mmu_do_page_fault(vcpu, gpa, error_code, true,
+                                                   NULL, level, gmem_pfn);
        } while (r == RET_PF_RETRY);
 
        if (r < 0)
@@ -4889,6 +4901,12 @@ int kvm_tdp_map_page(struct kvm_vcpu *vcpu, gpa_t gpa, u64 error_code, u8 *level
                return -EIO;
        }
 }
+EXPORT_SYMBOL_GPL(kvm_tdp_map_page_gmem);
+
+int kvm_tdp_map_page(struct kvm_vcpu *vcpu, gpa_t gpa, u64 error_code, u8 *level)
+{
+       return kvm_tdp_map_page_gmem(vcpu, gpa, error_code, level, KVM_PFN_ERR_FAULT);
+}
 EXPORT_SYMBOL_GPL(kvm_tdp_map_page);
 
 long kvm_arch_vcpu_pre_fault_memory(struct kvm_vcpu *vcpu,
diff --git a/arch/x86/kvm/mmu/mmu_internal.h b/arch/x86/kvm/mmu/mmu_internal.h
index db8f33e4de62..b0ffa4541657 100644
--- a/arch/x86/kvm/mmu/mmu_internal.h
+++ b/arch/x86/kvm/mmu/mmu_internal.h
@@ -286,6 +286,7 @@ struct kvm_page_fault {
        /* Outputs of kvm_mmu_faultin_pfn().  */
        unsigned long mmu_seq;
        kvm_pfn_t pfn;
+       kvm_pfn_t gmem_pfn;
        struct page *refcounted_page;
        bool map_writable;
 
@@ -344,9 +345,10 @@ static inline void kvm_mmu_prepare_memory_fault_exit(struct kvm_vcpu *vcpu,
                                      fault->is_private);
 }
 
-static inline int kvm_mmu_do_page_fault(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa,
-                                       u64 err, bool prefetch,
-                                       int *emulation_type, u8 *level)
+static inline int __kvm_mmu_do_page_fault(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa,
+                                         u64 err, bool prefetch,
+                                         int *emulation_type, u8 *level,
+                                         kvm_pfn_t gmem_pfn)
 {
        struct kvm_page_fault fault = {
                .addr = cr2_or_gpa,
@@ -367,6 +369,7 @@ static inline int kvm_mmu_do_page_fault(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa,
                .is_private = err & PFERR_PRIVATE_ACCESS,
 
                .pfn = KVM_PFN_ERR_FAULT,
+               .gmem_pfn = gmem_pfn,
        };
        int r;
 
@@ -408,6 +411,14 @@ static inline int kvm_mmu_do_page_fault(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa,
        return r;
 }
 
+static inline int kvm_mmu_do_page_fault(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa,
+                                       u64 err, bool prefetch,
+                                       int *emulation_type, u8 *level)
+{
+       return __kvm_mmu_do_page_fault(vcpu, cr2_or_gpa, err, prefetch,
+                                      emulation_type, level, KVM_PFN_ERR_FAULT);
+}
+
 int kvm_mmu_max_mapping_level(struct kvm *kvm,
                              const struct kvm_memory_slot *slot, gfn_t gfn);
 void kvm_mmu_hugepage_adjust(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault);
diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c
index b952bc673271..516416a9af27 100644
--- a/arch/x86/kvm/vmx/tdx.c
+++ b/arch/x86/kvm/vmx/tdx.c
@@ -3075,7 +3075,7 @@ static int tdx_gmem_post_populate(struct kvm *kvm, gfn_t gfn, kvm_pfn_t pfn,
        if (ret != 1)
                return -ENOMEM;
 
-       ret = kvm_tdp_map_page(vcpu, gpa, error_code, &level);
+       ret = kvm_tdp_map_page_gmem(vcpu, gpa, error_code, &level, pfn);
        if (ret < 0)
                goto out;

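With that in place, the init flow keeps the shape described at the top of
your mail but only resolves the gmem PFN once, i.e. roughly (sketch only,
not verified):

    tdx_vcpu_init_mem_region
        guard(mutex)(&kvm->slots_lock)
        kvm_gmem_populate
            filemap_invalidate_lock(file->f_mapping)
                __kvm_gmem_get_pfn        //1. get private PFN (only lookup)
                tdx_gmem_post_populate
                    get_user_pages_fast   //2. get source page
                    kvm_tdp_map_page_gmem //3. map the already-resolved PFN;
                                              no nested kvm_gmem_get_pfn()
                    tdh_mem_page_add      //4. add private PFN to S-EPT and
                                              copy the source page to it
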
Thanks,

Mike

> 
> (2)
> Moreover, in step 2, get_user_pages_fast() may acquire mm->mmap_lock,
> resulting in the following lock sequence in tdx_vcpu_init_mem_region():
> - filemap invalidation lock --> mm->mmap_lock
> 
> However, in future code, the shared filemap invalidation lock will be held
> in kvm_gmem_fault_shared() (see [6]), leading to the lock sequence:
> - mm->mmap_lock --> filemap invalidation lock
> 
> This creates an AB-BA deadlock issue.
> 
> These two issues are still present in Michael Roth's code at [7], [8].
> 
> Proposal
> ===
> To prevent deadlock and the AB-BA issue, this patch enables TDX to populate
> the initial memory region independently of kvm_gmem_populate(). The revised
> sequence in tdx_vcpu_init_mem_region() is as follows:
> 
> tdx_vcpu_init_mem_region
>     guard(mutex)(&kvm->slots_lock)
>     tdx_init_mem_populate
>         get_user_pages_fast //1. get source page
>         kvm_tdp_map_page    //2. map private PFN to mirror root
>         read_lock(&kvm->mmu_lock);
>         kvm_tdp_mmu_gpa_is_mapped // 3. check if the gpa is mapped in the
>                                         mirror root and return the mapped
>                                         private PFN.
>         tdh_mem_page_add    //4. add private PFN to S-EPT and copy source
>                                  page to it
>         read_unlock(&kvm->mmu_lock);
> 
> The original step 1 "get private PFN" is now integrated into the new step 3
> "check if the gpa is mapped in the mirror root and return the mapped
> private PFN".
> 
> With the protection of slots_lock, the read mmu_lock ensures that the private
> PFN added by tdh_mem_page_add() is the same one mapped in the mirror page
> table. Additionally, before the TD state becomes TD_STATE_RUNNABLE, the
> only permitted map level is 4KB, preventing any potential merging or
> splitting in the mirror root under the read mmu_lock.
> 
> So, this approach should work for TDX. It still follows the spirit of
> Sean's suggestion [1], where mapping the private PFN into the mirror root
> and adding it to the S-EPT with initial content from the source page are
> executed in separate steps.
> 
> Discussions
> ===
> The introduction of kvm_gmem_populate() was intended to make it usable by
> both TDX and SEV-SNP [3], which is why Paolo provided the vendor hook
> post_populate for both.
> 
> a) TDX keeps using kvm_gmem_populate().
>    Pros: - keep the status quo
>          - share common code between SEV-SNP and TDX, though TDX does not
>            need to prepare folios.
>    Cons: - we need to explore solutions to the locking issues, e.g. the
>            proposal at [11].
>          - the PFN is faulted in twice for each GFN:
>            once in __kvm_gmem_get_pfn(), again in kvm_gmem_get_pfn().
> 
> b) Michael suggested introducing some variant of
>    kvm_tdp_map_page()/kvm_mmu_do_page_fault() to avoid invoking
>    kvm_gmem_get_pfn() in the kvm_gmem_populate() path. [10].
>    Pro:  - TDX can still invoke kvm_gmem_populate() and share common code
>            between SEV-SNP and TDX.
>    Cons: - only TDX needs this variant.
>          - can't fix the 2nd AB-BA lock issue.
> 
> c) Change in this patch
>    Pro: greater flexibility. Simplifies the implementation for both SEV-SNP
>         and TDX.
>    Con: undermines the purpose of sharing common code;
>         kvm_gmem_populate() may end up usable only by SEV-SNP in the future.
> 
> Link: https://lore.kernel.org/kvm/Ze-TJh0BBOWm9spT@google.com [1]
> Link: https://lore.kernel.org/lkml/20240404185034.3184582-10-pbonzini@redhat.com [2]
> Link: https://lore.kernel.org/lkml/20240404185034.3184582-1-pbonzini@redhat.com [3]
> Link: https://lore.kernel.org/lkml/20241212063635.712877-4-michael.roth@amd.com [4]
> Link: https://lore.kernel.org/all/b784326e9ccae6a08388f1bf39db70a2204bdc51.1747264138.git.ackerleytng@google.com [5]
> Link: https://lore.kernel.org/all/20250430165655.605595-9-tabba@google.com [6]
> Link: https://github.com/mdroth/linux/commits/mmap-swprot-v10-snp0-wip2 [7]
> Link: https://lore.kernel.org/kvm/20250613005400.3694904-1-michael.roth@amd.com [8]
> Link: https://lore.kernel.org/lkml/20250613151939.z5ztzrtibr6xatql@amd.com [9]
> Link: https://lore.kernel.org/lkml/20250613180418.bo4vqveigxsq2ouu@amd.com [10]
> Link: https://lore.kernel.org/lkml/aErK25Oo5VJna40z@yzhao56-desk.sh.intel.com [11]
> 
> Suggested-by: Vishal Annapurve <vannapurve@...gle.com>
> Signed-off-by: Yan Zhao <yan.y.zhao@...el.com>
> ---
> This is an RFC patch. Will split it later.
> ---
>  arch/x86/kvm/mmu.h         |  3 +-
>  arch/x86/kvm/mmu/tdp_mmu.c |  6 ++-
>  arch/x86/kvm/vmx/tdx.c     | 96 ++++++++++++++------------------------
>  3 files changed, 42 insertions(+), 63 deletions(-)
> 
> diff --git a/arch/x86/kvm/mmu.h b/arch/x86/kvm/mmu.h
> index b4b6860ab971..b122255c7d4e 100644
> --- a/arch/x86/kvm/mmu.h
> +++ b/arch/x86/kvm/mmu.h
> @@ -257,7 +257,8 @@ extern bool tdp_mmu_enabled;
>  #define tdp_mmu_enabled false
>  #endif
>  
> -bool kvm_tdp_mmu_gpa_is_mapped(struct kvm_vcpu *vcpu, u64 gpa);
> +bool kvm_tdp_mmu_gpa_is_mapped(struct kvm_vcpu *vcpu, gpa_t gpa, int level,
> +			       kvm_pfn_t *pfn);
>  int kvm_tdp_map_page(struct kvm_vcpu *vcpu, gpa_t gpa, u64 error_code, u8 *level);
>  
>  static inline bool kvm_memslots_have_rmaps(struct kvm *kvm)
> diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
> index 7f3d7229b2c1..bb95c95f6531 100644
> --- a/arch/x86/kvm/mmu/tdp_mmu.c
> +++ b/arch/x86/kvm/mmu/tdp_mmu.c
> @@ -1934,7 +1934,8 @@ int kvm_tdp_mmu_get_walk(struct kvm_vcpu *vcpu, u64 addr, u64 *sptes,
>  	return __kvm_tdp_mmu_get_walk(vcpu, addr, sptes, root);
>  }
>  
> -bool kvm_tdp_mmu_gpa_is_mapped(struct kvm_vcpu *vcpu, u64 gpa)
> +bool kvm_tdp_mmu_gpa_is_mapped(struct kvm_vcpu *vcpu, gpa_t gpa, int level,
> +			       kvm_pfn_t *pfn)
>  {
>  	struct kvm *kvm = vcpu->kvm;
>  	bool is_direct = kvm_is_addr_direct(kvm, gpa);
> @@ -1947,10 +1948,11 @@ bool kvm_tdp_mmu_gpa_is_mapped(struct kvm_vcpu *vcpu, u64 gpa)
>  	rcu_read_lock();
>  	leaf = __kvm_tdp_mmu_get_walk(vcpu, gpa, sptes, root_to_sp(root));
>  	rcu_read_unlock();
> -	if (leaf < 0)
> +	if (leaf < 0 || leaf != level)
>  		return false;
>  
>  	spte = sptes[leaf];
> +	*pfn = spte_to_pfn(spte);
>  	return is_shadow_present_pte(spte) && is_last_spte(spte, leaf);
>  }
>  EXPORT_SYMBOL_GPL(kvm_tdp_mmu_gpa_is_mapped);
> diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c
> index b952bc673271..f3c2bb3554c3 100644
> --- a/arch/x86/kvm/vmx/tdx.c
> +++ b/arch/x86/kvm/vmx/tdx.c
> @@ -1521,9 +1521,9 @@ static int tdx_mem_page_aug(struct kvm *kvm, gfn_t gfn,
>  }
>  
>  /*
> - * KVM_TDX_INIT_MEM_REGION calls kvm_gmem_populate() to map guest pages; the
> - * callback tdx_gmem_post_populate() then maps pages into private memory.
> - * through the a seamcall TDH.MEM.PAGE.ADD().  The SEAMCALL also requires the
> + * KVM_TDX_INIT_MEM_REGION calls tdx_init_mem_populate() to first map guest
> + * pages into the mirror page table and then map them into private memory
> + * through the SEAMCALL TDH.MEM.PAGE.ADD().  The SEAMCALL also requires the
>   * private EPT structures for the page to have been built before, which is
>   * done via kvm_tdp_map_page(). nr_premapped counts the number of pages that
>   * were added to the EPT structures but not added with TDH.MEM.PAGE.ADD().
> @@ -3047,23 +3047,17 @@ void tdx_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event)
>  	WARN_ON_ONCE(init_event);
>  }
>  
> -struct tdx_gmem_post_populate_arg {
> -	struct kvm_vcpu *vcpu;
> -	__u32 flags;
> -};
> -
> -static int tdx_gmem_post_populate(struct kvm *kvm, gfn_t gfn, kvm_pfn_t pfn,
> -				  void __user *src, int order, void *_arg)
> +static int tdx_init_mem_populate(struct kvm_vcpu *vcpu, gpa_t gpa,
> +				 void __user *src, __u32 flags)
>  {
>  	u64 error_code = PFERR_GUEST_FINAL_MASK | PFERR_PRIVATE_ACCESS;
> -	struct kvm_tdx *kvm_tdx = to_kvm_tdx(kvm);
> -	struct tdx_gmem_post_populate_arg *arg = _arg;
> -	struct kvm_vcpu *vcpu = arg->vcpu;
> -	gpa_t gpa = gfn_to_gpa(gfn);
> +	struct kvm_tdx *kvm_tdx = to_kvm_tdx(vcpu->kvm);
> +	struct kvm *kvm = vcpu->kvm;
>  	u8 level = PG_LEVEL_4K;
>  	struct page *src_page;
>  	int ret, i;
>  	u64 err, entry, level_state;
> +	kvm_pfn_t pfn;
>  
>  	/*
>  	 * Get the source page if it has been faulted in. Return failure if the
> @@ -3079,38 +3073,33 @@ static int tdx_gmem_post_populate(struct kvm *kvm, gfn_t gfn, kvm_pfn_t pfn,
>  	if (ret < 0)
>  		goto out;
>  
> -	/*
> -	 * The private mem cannot be zapped after kvm_tdp_map_page()
> -	 * because all paths are covered by slots_lock and the
> -	 * filemap invalidate lock.  Check that they are indeed enough.
> -	 */
> -	if (IS_ENABLED(CONFIG_KVM_PROVE_MMU)) {
> -		scoped_guard(read_lock, &kvm->mmu_lock) {
> -			if (KVM_BUG_ON(!kvm_tdp_mmu_gpa_is_mapped(vcpu, gpa), kvm)) {
> -				ret = -EIO;
> -				goto out;
> -			}
> -		}
> -	}
> +	KVM_BUG_ON(level != PG_LEVEL_4K, kvm);
>  
> -	ret = 0;
> -	err = tdh_mem_page_add(&kvm_tdx->td, gpa, pfn_to_page(pfn),
> -			       src_page, &entry, &level_state);
> -	if (err) {
> -		ret = unlikely(tdx_operand_busy(err)) ? -EBUSY : -EIO;
> -		goto out;
> -	}
> +	scoped_guard(read_lock, &kvm->mmu_lock) {
> +		if (!kvm_tdp_mmu_gpa_is_mapped(vcpu, gpa, level, &pfn)) {
> +			ret = -EIO;
> +			goto out;
> +		}
>  
> -	if (!KVM_BUG_ON(!atomic64_read(&kvm_tdx->nr_premapped), kvm))
> -		atomic64_dec(&kvm_tdx->nr_premapped);
> +		ret = 0;
> +		err = tdh_mem_page_add(&kvm_tdx->td, gpa, pfn_to_page(pfn),
> +				       src_page, &entry, &level_state);
> +		if (err) {
> +			ret = unlikely(tdx_operand_busy(err)) ? -EBUSY : -EIO;
> +			goto out;
> +		}
>  
> -	if (arg->flags & KVM_TDX_MEASURE_MEMORY_REGION) {
> -		for (i = 0; i < PAGE_SIZE; i += TDX_EXTENDMR_CHUNKSIZE) {
> -			err = tdh_mr_extend(&kvm_tdx->td, gpa + i, &entry,
> -					    &level_state);
> -			if (err) {
> -				ret = -EIO;
> -				break;
> +		if (!KVM_BUG_ON(!atomic64_read(&kvm_tdx->nr_premapped), kvm))
> +			atomic64_dec(&kvm_tdx->nr_premapped);
> +
> +		if (flags & KVM_TDX_MEASURE_MEMORY_REGION) {
> +			for (i = 0; i < PAGE_SIZE; i += TDX_EXTENDMR_CHUNKSIZE) {
> +				err = tdh_mr_extend(&kvm_tdx->td, gpa + i, &entry,
> +						    &level_state);
> +				if (err) {
> +					ret = -EIO;
> +					break;
> +				}
>  			}
>  		}
>  	}
> @@ -3126,8 +3115,6 @@ static int tdx_vcpu_init_mem_region(struct kvm_vcpu *vcpu, struct kvm_tdx_cmd *c
>  	struct kvm *kvm = vcpu->kvm;
>  	struct kvm_tdx *kvm_tdx = to_kvm_tdx(kvm);
>  	struct kvm_tdx_init_mem_region region;
> -	struct tdx_gmem_post_populate_arg arg;
> -	long gmem_ret;
>  	int ret;
>  
>  	if (tdx->state != VCPU_TD_STATE_INITIALIZED)
> @@ -3160,22 +3147,11 @@ static int tdx_vcpu_init_mem_region(struct kvm_vcpu *vcpu, struct kvm_tdx_cmd *c
>  			break;
>  		}
>  
> -		arg = (struct tdx_gmem_post_populate_arg) {
> -			.vcpu = vcpu,
> -			.flags = cmd->flags,
> -		};
> -		gmem_ret = kvm_gmem_populate(kvm, gpa_to_gfn(region.gpa),
> -					     u64_to_user_ptr(region.source_addr),
> -					     1, tdx_gmem_post_populate, &arg);
> -		if (gmem_ret < 0) {
> -			ret = gmem_ret;
> -			break;
> -		}
> -
> -		if (gmem_ret != 1) {
> -			ret = -EIO;
> +		ret = tdx_init_mem_populate(vcpu, region.gpa,
> +					    u64_to_user_ptr(region.source_addr),
> +					    cmd->flags);
> +		if (ret)
>  			break;
> -		}
>  
>  		region.source_addr += PAGE_SIZE;
>  		region.gpa += PAGE_SIZE;
> -- 
> 2.43.2
> 
