linux-kernel - Re: [RFC PATCH 10/21] KVM: x86/mmu: Disallow page merging (huge page adjustment) for mirror root

Message-ID: <aCqsDW6bDlM6yOtP@yzhao56-desk.sh.intel.com>
Date: Mon, 19 May 2025 11:57:01 +0800
From: Yan Zhao <yan.y.zhao@...el.com>
To: "Edgecombe, Rick P" <rick.p.edgecombe@...el.com>
CC: "kvm@...r.kernel.org" <kvm@...r.kernel.org>, "Li, Xiaoyao"
	<xiaoyao.li@...el.com>, "quic_eberman@...cinc.com"
	<quic_eberman@...cinc.com>, "Hansen, Dave" <dave.hansen@...el.com>,
	"david@...hat.com" <david@...hat.com>, "Li, Zhiquan1"
	<zhiquan1.li@...el.com>, "tabba@...gle.com" <tabba@...gle.com>,
	"vbabka@...e.cz" <vbabka@...e.cz>, "thomas.lendacky@....com"
	<thomas.lendacky@....com>, "michael.roth@....com" <michael.roth@....com>,
	"seanjc@...gle.com" <seanjc@...gle.com>, "Weiny, Ira" <ira.weiny@...el.com>,
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
	"pbonzini@...hat.com" <pbonzini@...hat.com>, "ackerleytng@...gle.com"
	<ackerleytng@...gle.com>, "Yamahata, Isaku" <isaku.yamahata@...el.com>,
	"binbin.wu@...ux.intel.com" <binbin.wu@...ux.intel.com>, "Peng, Chao P"
	<chao.p.peng@...el.com>, "Du, Fan" <fan.du@...el.com>, "Annapurve, Vishal"
	<vannapurve@...gle.com>, "jroedel@...e.de" <jroedel@...e.de>, "Miao, Jun"
	<jun.miao@...el.com>, "Shutemov, Kirill" <kirill.shutemov@...el.com>,
	"pgonda@...gle.com" <pgonda@...gle.com>, "x86@...nel.org" <x86@...nel.org>
Subject: Re: [RFC PATCH 10/21] KVM: x86/mmu: Disallow page merging (huge page
 adjustment) for mirror root

On Sat, May 17, 2025 at 01:50:53AM +0800, Edgecombe, Rick P wrote:
> On Fri, 2025-05-16 at 12:01 +0800, Yan Zhao wrote:
> > > Maybe we should rename nx_huge_page_workaround_enabled to something more
> > > generic and do the is_mirror logic in kvm_mmu_do_page_fault() when setting
> > > it. It should shrink the diff and centralize the logic.
> > Hmm, I'm reluctant to rename nx_huge_page_workaround_enabled, because
> > 
> > (1) disallowed_hugepage_adjust() is invoked for the mirror root to disable
> >     page promotion for TDX private memory, so it applies only to the TDP MMU.
> > (2) nx_huge_page_workaround_enabled is used specifically for nx huge pages.
> >     fault->huge_page_disallowed = fault->exec &&
> >                                   fault->nx_huge_page_workaround_enabled;
> 
> Oh, good point.
> 
> > 
> >     if (fault->huge_page_disallowed)
> >         account_nx_huge_page(vcpu->kvm, sp,
> >                              fault->req_level >= it.level);
> >     
> >     sp->nx_huge_page_disallowed = fault->huge_page_disallowed;
> > 
> >     Affecting fault->huge_page_disallowed would impact
> >     sp->nx_huge_page_disallowed as well and would disable huge pages entirely.
> > 
> >     So, we still need to keep nx_huge_page_workaround_enabled.
> > 
> > If we introduce a new flag fault->disable_hugepage_adjust, and set it in
> > kvm_mmu_do_page_fault(), we would also need to invoke
> > tdp_mmu_get_root_for_fault() there as well.
> > 
> > Checking for mirror root for non-TDX VMs is not necessary, and the
> > invocation of tdp_mmu_get_root_for_fault() seems redundant with the one in
> > kvm_tdp_mmu_map().
> 
> Also true. What about a wrapper for MMU code to check instead of
> fault->nx_huge_page_workaround_enabled then?
Like below?

diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
index 1b2bacde009f..0e4a03f44036 100644
--- a/arch/x86/kvm/mmu/tdp_mmu.c
+++ b/arch/x86/kvm/mmu/tdp_mmu.c
@@ -1275,6 +1275,11 @@ static int tdp_mmu_link_sp(struct kvm *kvm, struct tdp_iter *iter,
        return 0;
 }

+static inline bool is_fault_disallow_huge_page_adjust(struct kvm_page_fault *fault, bool is_mirror)
+{
+       return fault->nx_huge_page_workaround_enabled || is_mirror;
+}
+
 /*
  * Handle a TDP page fault (NPT/EPT violation/misconfiguration) by installing
  * page tables and SPTEs to translate the faulting guest physical address.
@@ -1297,7 +1302,7 @@ int kvm_tdp_mmu_map(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
        for_each_tdp_pte(iter, kvm, root, fault->gfn, fault->gfn + 1) {
                int r;

-               if (fault->nx_huge_page_workaround_enabled || is_mirror)
+               if (is_fault_disallow_huge_page_adjust(fault, is_mirror))
                        disallowed_hugepage_adjust(fault, iter.old_spte, iter.level, is_mirror);

                /*
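
To illustrate why the wrapper keeps the two conditions separate instead of
folding is_mirror into the NX flag, here is a minimal user-space sketch. It is
not kernel code: struct fault_model and the model_* helpers are hypothetical
stand-ins for struct kvm_page_fault and the logic quoted above, reduced to the
three booleans involved.

```c
#include <stdbool.h>

/* Hypothetical, simplified stand-in for struct kvm_page_fault. */
struct fault_model {
	bool exec;
	bool nx_huge_page_workaround_enabled;
	bool huge_page_disallowed;
};

/*
 * Same condition as the wrapper in the diff: the adjustment runs when the
 * NX workaround is enabled or when the fault targets a mirror root.
 */
static bool model_needs_hugepage_adjust(const struct fault_model *f,
					bool is_mirror)
{
	return f->nx_huge_page_workaround_enabled || is_mirror;
}

/*
 * The value that feeds NX huge page accounting depends only on exec and the
 * NX workaround flag, so a mirror root alone never sets it.
 */
static void model_set_huge_page_disallowed(struct fault_model *f)
{
	f->huge_page_disallowed = f->exec &&
				  f->nx_huge_page_workaround_enabled;
}
```

With this split, an exec fault on a mirror root with the NX workaround off
still triggers the adjustment, while huge_page_disallowed (and therefore the
value copied into sp->nx_huge_page_disallowed) stays false.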



> Also, why not check is_mirror_sp() in disallowed_hugepage_adjust() instead of
> passing in an is_mirror arg?
It's an optimization.

Since is_mirror_sptep(iter->sptep) == is_mirror_sp(root) for every SPTE under a
root, passing in is_mirror avoids re-checking the mirror status for each sp.
The value does not change while walking a single root.
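
A rough user-space sketch of that optimization, under the assumption that the
mirror property is fixed per root. struct model_sp, model_is_mirror_sp() and
model_walk_count_adjust() are hypothetical names for illustration, not the
kernel's types or functions; the loop stands in for the for_each_tdp_pte()
walk.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a shadow page; only the mirror bit is modeled. */
struct model_sp {
	bool mirror;
};

static bool model_is_mirror_sp(const struct model_sp *sp)
{
	return sp->mirror;
}

/*
 * Walk nr_levels levels under one root and count how many times the huge
 * page adjustment would be invoked. The mirror check is evaluated once for
 * the root and reused at every level, instead of being repeated per entry.
 */
static size_t model_walk_count_adjust(const struct model_sp *root,
				      size_t nr_levels, bool nx_workaround)
{
	const bool is_mirror = model_is_mirror_sp(root);
	size_t calls = 0;

	for (size_t level = 0; level < nr_levels; level++) {
		if (nx_workaround || is_mirror)
			calls++;
	}
	return calls;
}
```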


> There must be a way to have it fit in better with disallowed_hugepage_adjust()
> without adding so much open coded boolean logic.