Message-ID: <CAAhR5DGNXi2GeBBZUoZOac6a7_bAquUOzBJuccbeJZ1r97f0Ag@mail.gmail.com>
Date: Tue, 9 Dec 2025 17:49:37 -0600
From: Sagi Shahar <sagis@...gle.com>
To: Yan Zhao <yan.y.zhao@...el.com>
Cc: pbonzini@...hat.com, seanjc@...gle.com, linux-kernel@...r.kernel.org, 
	kvm@...r.kernel.org, x86@...nel.org, rick.p.edgecombe@...el.com, 
	dave.hansen@...el.com, kas@...nel.org, tabba@...gle.com, 
	ackerleytng@...gle.com, quic_eberman@...cinc.com, michael.roth@....com, 
	david@...hat.com, vannapurve@...gle.com, vbabka@...e.cz, 
	thomas.lendacky@....com, pgonda@...gle.com, zhiquan1.li@...el.com, 
	fan.du@...el.com, jun.miao@...el.com, ira.weiny@...el.com, 
	isaku.yamahata@...el.com, xiaoyao.li@...el.com, binbin.wu@...ux.intel.com, 
	chao.p.peng@...el.com
Subject: Re: [RFC PATCH v2 10/23] KVM: TDX: Enable huge page splitting under
 write kvm->mmu_lock

On Thu, Aug 7, 2025 at 4:44 AM Yan Zhao <yan.y.zhao@...el.com> wrote:
>
> Implement the split_external_spt hook to enable huge page splitting for
> TDX when kvm->mmu_lock is held for writing.
>
> Invoke tdh_mem_range_block(), tdh_mem_track(), kicking off vCPUs,
> tdh_mem_page_demote() in sequence. All operations are performed under
> kvm->mmu_lock held for writing, similar to those in page removal.
>
> Even with kvm->mmu_lock held for writing, tdh_mem_page_demote() may still
> contend with tdh_vp_enter() and potentially with the guest's S-EPT entry
> operations. Therefore, kick off other vCPUs and prevent tdh_vp_enter()
> from being called on them to ensure success on the second attempt. Use
> KVM_BUG_ON() for any other unexpected errors.
>
> Signed-off-by: Xiaoyao Li <xiaoyao.li@...el.com>
> Signed-off-by: Isaku Yamahata <isaku.yamahata@...el.com>
> Signed-off-by: Yan Zhao <yan.y.zhao@...el.com>
> ---
> RFC v2:
> - Split out the code to handle the error TDX_INTERRUPTED_RESTARTABLE.
> - Rebased to 6.16.0-rc6 (the way of defining TDX hook changes).
>
> RFC v1:
> - Split patch for exclusive mmu_lock only,
> - Invoke tdx_sept_zap_private_spte() and tdx_track() for splitting.
> - Handled busy error of tdh_mem_page_demote() by kicking off vCPUs.
> ---
>  arch/x86/kvm/vmx/tdx.c | 45 ++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 45 insertions(+)
>
> diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c
> index 376287a2ddf4..8a60ba5b6595 100644
> --- a/arch/x86/kvm/vmx/tdx.c
> +++ b/arch/x86/kvm/vmx/tdx.c
> @@ -1915,6 +1915,50 @@ static int tdx_sept_free_private_spt(struct kvm *kvm, gfn_t gfn,
>         return 0;
>  }
>
> +static int tdx_spte_demote_private_spte(struct kvm *kvm, gfn_t gfn,
> +                                       enum pg_level level, struct page *page)
> +{
> +       int tdx_level = pg_level_to_tdx_sept_level(level);
> +       struct kvm_tdx *kvm_tdx = to_kvm_tdx(kvm);
> +       gpa_t gpa = gfn_to_gpa(gfn);
> +       u64 err, entry, level_state;
> +
> +       err = tdh_mem_page_demote(&kvm_tdx->td, gpa, tdx_level, page,
> +                                 &entry, &level_state);
> +
> +       if (unlikely(tdx_operand_busy(err))) {

While testing this code locally (without the DPAMT patches and with
DPAMT disabled), I saw that tdh_mem_page_demote() sometimes returns
TDX_INTERRUPTED_RESTARTABLE. Looking at the TDX module code (version
1.5.16 from [1]), I see that demote and promote are the only SEAMCALLs
that return TDX_INTERRUPTED_RESTARTABLE, which is why KVM hasn't
handled it until now.

I added manual handling for it, and it works correctly in my testing.
Note that my change is on top of a rebase to the latest version:

@@ -1989,9 +1989,16 @@ static int tdx_spte_demote_private_spte(struct kvm *kvm, gfn_t gfn,
        struct kvm_tdx *kvm_tdx = to_kvm_tdx(kvm);
        gpa_t gpa = gfn_to_gpa(gfn);
        u64 err, entry, level_state;
+       int i = 0;

-       err = tdh_do_no_vcpus(tdh_mem_page_demote, kvm, &kvm_tdx->td, gpa,
+       while (i < TDX_SEAMCALL_RETRIES) {
+               err = tdh_do_no_vcpus(tdh_mem_page_demote, kvm, &kvm_tdx->td, gpa,
                              tdx_level, page, &entry, &level_state);
+               if (err != TDX_INTERRUPTED_RESTARTABLE)
+                       break;
+               i++;
+       }
+
        if (TDX_BUG_ON_2(err, TDH_MEM_PAGE_DEMOTE, entry, level_state, kvm))
                return -EIO;

[1] https://github.com/intel/confidential-computing.tdx.tdx-module
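
For reference, the bounded-retry pattern in the diff above can be
sketched in isolation as follows. This is a minimal user-space sketch,
not kernel code: mock_demote_call(), the MOCK_* status values and
MOCK_RETRIES are hypothetical stand-ins for tdh_mem_page_demote(), the
TDX status codes and TDX_SEAMCALL_RETRIES, and the real values are not
reproduced here.

```c
#include <stdint.h>

/* Placeholder values for illustration only; NOT taken from the TDX headers. */
#define MOCK_SUCCESS                 0ULL
#define MOCK_INTERRUPTED_RESTARTABLE 1ULL
#define MOCK_RETRIES                 16

/*
 * Hypothetical stand-in for the demote SEAMCALL: it reports
 * "interrupted, restartable" for the first `interruptions` calls
 * and succeeds afterwards. `calls` counts every invocation.
 */
struct mock_demote {
	int interruptions;
	int calls;
};

static uint64_t mock_demote_call(struct mock_demote *m)
{
	m->calls++;
	if (m->interruptions > 0) {
		m->interruptions--;
		return MOCK_INTERRUPTED_RESTARTABLE;
	}
	return MOCK_SUCCESS;
}

/*
 * Bounded retry: reissue the call only while it reports the
 * restartable status, and stop after MOCK_RETRIES attempts.
 * The last status is returned so the caller can treat anything
 * other than success (including an exhausted budget) as an error.
 */
static uint64_t demote_with_retry(struct mock_demote *m)
{
	uint64_t err = MOCK_INTERRUPTED_RESTARTABLE;
	int i;

	for (i = 0; i < MOCK_RETRIES; i++) {
		err = mock_demote_call(m);
		if (err != MOCK_INTERRUPTED_RESTARTABLE)
			break;
	}

	return err;
}
```

If the retry budget is exhausted, the restartable status is passed back
unchanged, so a caller that treats any non-zero status as fatal (as the
TDX_BUG_ON_2() check does in the diff) still sees the failure.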

> +               tdx_no_vcpus_enter_start(kvm);
> +               err = tdh_mem_page_demote(&kvm_tdx->td, gpa, tdx_level, page,
> +                                         &entry, &level_state);
> +               tdx_no_vcpus_enter_stop(kvm);
> +       }
> +
> +       if (KVM_BUG_ON(err, kvm)) {
> +               pr_tdx_error_2(TDH_MEM_PAGE_DEMOTE, err, entry, level_state);
> +               return -EIO;
> +       }
> +       return 0;
> +}
> +
> +static int tdx_sept_split_private_spt(struct kvm *kvm, gfn_t gfn, enum pg_level level,
> +                                     void *private_spt)
> +{
> +       struct page *page = virt_to_page(private_spt);
> +       int ret;
> +
> +       if (KVM_BUG_ON(to_kvm_tdx(kvm)->state != TD_STATE_RUNNABLE ||
> +                      level != PG_LEVEL_2M, kvm))
> +               return -EINVAL;
> +
> +       ret = tdx_sept_zap_private_spte(kvm, gfn, level, page);
> +       if (ret <= 0)
> +               return ret;
> +
> +       tdx_track(kvm);
> +
> +       return tdx_spte_demote_private_spte(kvm, gfn, level, page);
> +}
> +
>  static int tdx_sept_remove_private_spte(struct kvm *kvm, gfn_t gfn,
>                                         enum pg_level level, kvm_pfn_t pfn)
>  {
> @@ -3668,5 +3712,6 @@ void __init tdx_hardware_setup(void)
>         vt_x86_ops.set_external_spte = tdx_sept_set_private_spte;
>         vt_x86_ops.free_external_spt = tdx_sept_free_private_spt;
>         vt_x86_ops.remove_external_spte = tdx_sept_remove_private_spte;
> +       vt_x86_ops.split_external_spt = tdx_sept_split_private_spt;
>         vt_x86_ops.protected_apic_has_interrupt = tdx_protected_apic_has_interrupt;
>  }
> --
> 2.43.2
>
>
