linux-kernel - Re: [PATCH v3 06/24] KVM: x86/mmu: Disallow page merging (huge page adjustment) for mirror root


Message-ID: <aXpow-fzaII6RW_q@google.com>
Date: Wed, 28 Jan 2026 11:51:31 -0800
From: Sean Christopherson <seanjc@...gle.com>
To: Yan Zhao <yan.y.zhao@...el.com>
Cc: pbonzini@...hat.com, linux-kernel@...r.kernel.org, kvm@...r.kernel.org, 
	x86@...nel.org, rick.p.edgecombe@...el.com, dave.hansen@...el.com, 
	kas@...nel.org, tabba@...gle.com, ackerleytng@...gle.com, 
	michael.roth@....com, david@...nel.org, vannapurve@...gle.com, 
	sagis@...gle.com, vbabka@...e.cz, thomas.lendacky@....com, 
	nik.borisov@...e.com, pgonda@...gle.com, fan.du@...el.com, jun.miao@...el.com, 
	francescolavra.fl@...il.com, jgross@...e.com, ira.weiny@...el.com, 
	isaku.yamahata@...el.com, xiaoyao.li@...el.com, kai.huang@...el.com, 
	binbin.wu@...ux.intel.com, chao.p.peng@...el.com, chao.gao@...el.com
Subject: Re: [PATCH v3 06/24] KVM: x86/mmu: Disallow page merging (huge page
 adjustment) for mirror root

On Tue, Jan 27, 2026, Yan Zhao wrote:
> On Mon, Jan 26, 2026 at 08:08:31AM -0800, Sean Christopherson wrote:
> > diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
> > index 321dbde77d3f..0fe3be41594f 100644
> > --- a/arch/x86/kvm/mmu/tdp_mmu.c
> > +++ b/arch/x86/kvm/mmu/tdp_mmu.c
> > @@ -1232,7 +1232,17 @@ int kvm_tdp_mmu_map(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
> >  	for_each_tdp_pte(iter, kvm, root, fault->gfn, fault->gfn + 1) {
> >  		int r;
> >  
> > -		if (fault->nx_huge_page_workaround_enabled)
> > +		/*
> > +		 * Don't replace a page table (non-leaf) SPTE with a huge SPTE
> > +		 * (a.k.a. hugepage promotion) if the NX hugepage workaround is
> > +		 * enabled, as doing so will cause significant thrashing if one
> > +		 * or more leaf SPTEs needs to be executable.
> > +		 *
> > +		 * Disallow hugepage promotion for mirror roots as KVM doesn't
> > +		 * (yet) support promoting S-EPT entries while holding mmu_lock
> > +		 * for read (due to complexity induced by the TDX-Module APIs).
> > +		 */
> > +		if (fault->nx_huge_page_workaround_enabled || is_mirror_sp(root))
> A small nit:
> Here, we check is_mirror_sp(root).
> However, not far from here, in kvm_tdp_mmu_map(), we have another check of
> is_mirror_sp(), which should get the same result since sp->role.is_mirror is
> inherited from its parent.
> 
>                if (is_mirror_sp(sp))
>                        kvm_mmu_alloc_external_spt(vcpu, sp);
> 
> So, do you think we can save the is_mirror status in a local variable?

Eh, I vote "no".  From a performance perspective, it's basically meaningless.
The check is a single uop to test a flag that is all but guaranteed to be
cache-hot, and any halfway decent CPU will be able to predict the branch.

From a code perspective, I'd rather have the explicit is_mirror_sp(root) check,
as opposed to having to go look at the origins of is_mirror.
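
To illustrate why the check is cheap and why the root and child checks agree,
here is a minimal userspace sketch. The types and helpers (union page_role,
struct mmu_page, init_child_sp(), checks_agree()) are simplified stand-ins
invented for illustration, not the kernel's definitions; only the name
is_mirror_sp() and the role.is_mirror bit come from the discussion above.

```c
#include <stdbool.h>

/* Hypothetical, simplified stand-in for a shadow page role. */
union page_role {
	unsigned int word;
	struct {
		unsigned int level:4;
		unsigned int is_mirror:1;
	};
};

/* Hypothetical, simplified stand-in for a shadow page. */
struct mmu_page {
	union page_role role;
};

/* The check under discussion: a single bit test on the page's role. */
static inline bool is_mirror_sp(const struct mmu_page *sp)
{
	return sp->role.is_mirror;
}

/*
 * Assumption for this sketch: a child copies its parent's role and only
 * lowers the level, so is_mirror is inherited unchanged.
 */
static void init_child_sp(struct mmu_page *child, const struct mmu_page *parent)
{
	child->role = parent->role;
	child->role.level = parent->role.level - 1;
}

/* True if is_mirror_sp(root) and is_mirror_sp(child) both equal 'mirror'. */
static bool checks_agree(bool mirror)
{
	struct mmu_page root = { .role = { .word = 0 } };
	struct mmu_page child;

	root.role.level = 4;
	root.role.is_mirror = mirror;
	init_child_sp(&child, &root);

	return is_mirror_sp(&root) == mirror && is_mirror_sp(&child) == mirror;
}
```

Under that inheritance assumption, either call site can test its own page
directly, so a cached local saves nothing and hides where the value came from.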

> Like this:
> 
> diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
> index b524b44733b8..c54befec3042 100644
> --- a/arch/x86/kvm/mmu/tdp_mmu.c
> +++ b/arch/x86/kvm/mmu/tdp_mmu.c
> @@ -1300,6 +1300,7 @@ static int tdp_mmu_split_huge_page(struct kvm *kvm, struct tdp_iter *iter,
>  int kvm_tdp_mmu_map(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
>  {
>         struct kvm_mmu_page *root = tdp_mmu_get_root_for_fault(vcpu, fault);
> +       bool is_mirror = root && is_mirror_sp(root);
>         struct kvm *kvm = vcpu->kvm;
>         struct tdp_iter iter;
>         struct kvm_mmu_page *sp;
> @@ -1316,7 +1317,17 @@ int kvm_tdp_mmu_map(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
>         for_each_tdp_pte(iter, kvm, root, fault->gfn, fault->gfn + 1) {
>                 int r;
> 
> -               if (fault->nx_huge_page_workaround_enabled)
> +               /*
> +                * Don't replace a page table (non-leaf) SPTE with a huge SPTE
> +                * (a.k.a. hugepage promotion) if the NX hugepage workaround is
> +                * enabled, as doing so will cause significant thrashing if one
> +                * or more leaf SPTEs needs to be executable.
> +                *
> +                * Disallow hugepage promotion for mirror roots as KVM doesn't
> +                * (yet) support promoting S-EPT entries while holding mmu_lock
> +                * for read (due to complexity induced by the TDX-Module APIs).
> +                */
> +               if (fault->nx_huge_page_workaround_enabled || is_mirror)
>                         disallowed_hugepage_adjust(fault, iter.old_spte, iter.level);
> 
>                 /*
> @@ -1340,7 +1351,7 @@ int kvm_tdp_mmu_map(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
>                  */
>                 sp = tdp_mmu_alloc_sp(vcpu);
>                 tdp_mmu_init_child_sp(sp, &iter);
> -               if (is_mirror_sp(sp))
> +               if (is_mirror)
>                         kvm_mmu_alloc_external_spt(vcpu, sp);
>