Message-ID: <eea0bf7925c3b9c16573be8e144ddcc77b54cc92.camel@intel.com>
Date: Mon, 19 May 2025 17:42:20 +0000
From: "Edgecombe, Rick P" <rick.p.edgecombe@...el.com>
To: "Zhao, Yan Y" <yan.y.zhao@...el.com>
CC: "quic_eberman@...cinc.com" <quic_eberman@...cinc.com>, "Li, Xiaoyao"
<xiaoyao.li@...el.com>, "kvm@...r.kernel.org" <kvm@...r.kernel.org>, "Hansen,
Dave" <dave.hansen@...el.com>, "david@...hat.com" <david@...hat.com>, "Li,
Zhiquan1" <zhiquan1.li@...el.com>, "thomas.lendacky@....com"
<thomas.lendacky@....com>, "tabba@...gle.com" <tabba@...gle.com>, "Du, Fan"
<fan.du@...el.com>, "linux-kernel@...r.kernel.org"
<linux-kernel@...r.kernel.org>, "seanjc@...gle.com" <seanjc@...gle.com>,
"Weiny, Ira" <ira.weiny@...el.com>, "pbonzini@...hat.com"
<pbonzini@...hat.com>, "binbin.wu@...ux.intel.com"
<binbin.wu@...ux.intel.com>, "Yamahata, Isaku" <isaku.yamahata@...el.com>,
"michael.roth@....com" <michael.roth@....com>, "vbabka@...e.cz"
<vbabka@...e.cz>, "ackerleytng@...gle.com" <ackerleytng@...gle.com>, "Peng,
Chao P" <chao.p.peng@...el.com>, "Shutemov, Kirill"
<kirill.shutemov@...el.com>, "Annapurve, Vishal" <vannapurve@...gle.com>,
"jroedel@...e.de" <jroedel@...e.de>, "Miao, Jun" <jun.miao@...el.com>,
"pgonda@...gle.com" <pgonda@...gle.com>, "x86@...nel.org" <x86@...nel.org>
Subject: Re: [RFC PATCH 10/21] KVM: x86/mmu: Disallow page merging (huge page
adjustment) for mirror root
On Mon, 2025-05-19 at 11:57 +0800, Yan Zhao wrote:
> Like below?
>
> diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
> index 1b2bacde009f..0e4a03f44036 100644
> --- a/arch/x86/kvm/mmu/tdp_mmu.c
> +++ b/arch/x86/kvm/mmu/tdp_mmu.c
> @@ -1275,6 +1275,11 @@ static int tdp_mmu_link_sp(struct kvm *kvm, struct tdp_iter *iter,
> return 0;
> }
>
> +static inline bool is_fault_disallow_huge_page_adjust(struct kvm_page_fault *fault, bool is_mirror)
> +{
> + return fault->nx_huge_page_workaround_enabled || is_mirror;
> +}
Err, no. It doesn't seem worth it.
> +
> /*
> * Handle a TDP page fault (NPT/EPT violation/misconfiguration) by installing
> * page tables and SPTEs to translate the faulting guest physical address.
> @@ -1297,7 +1302,7 @@ int kvm_tdp_mmu_map(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
> for_each_tdp_pte(iter, kvm, root, fault->gfn, fault->gfn + 1) {
> int r;
>
> - if (fault->nx_huge_page_workaround_enabled || is_mirror)
> + if (is_fault_disallow_huge_page_adjust(fault, is_mirror))
> disallowed_hugepage_adjust(fault, iter.old_spte, iter.level, is_mirror);
>
> /*
>
>
>
> > Also, why not check is_mirror_sp() in disallowed_hugepage_adjust() instead of
> > passing in an is_mirror arg?
> It's an optimization.
But disallowed_hugepage_adjust() is already checking the sp.
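
For reference, the check it does today looks roughly like this (trimmed, same
lines as quoted in the diff below):

	if (cur_level > PG_LEVEL_4K &&
	    cur_level == fault->goal_level &&
	    is_shadow_present_pte(spte) &&
	    !is_large_pte(spte) &&
	    spte_to_child_sp(spte)->nx_huge_page_disallowed) {
		/* ... force the fault to map at a lower level ... */
	}

so it already dereferences the child sp; folding a mirror check in there
doesn't add a new lookup.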
I think part of the thing that is bugging me is that
nx_huge_page_workaround_enabled is not conceptually about whether the specific
fault/level needs to disallow huge page adjustments, it's about whether it
needs to check if it does. Then disallowed_hugepage_adjust() does the actual
specific checking. But for the mirror logic the check is the same in both
places. It's asymmetric with NX huge pages, and just sort of jammed in. It
would be easier to follow if the kvm_tdp_mmu_map() conditional checked whether
mirror TDP was "active", rather than the mirror role.
>
> As is_mirror_sptep(iter->sptep) == is_mirror_sp(root), passing in is_mirror arg
> can avoid checking mirror for each sp, which remains unchanged in a root.
Why not just this? It seems easier to comprehend to me. It does add a little
bit of extra checking in the shared fault path for TDX only, but I think that's
ok, and better than littering the generic MMU code.
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index a284dce227a0..37ca77f2ee15 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -3328,11 +3328,13 @@ void kvm_mmu_hugepage_adjust(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault
 void disallowed_hugepage_adjust(struct kvm_page_fault *fault, u64 spte, int cur_level)
 {
+	struct kvm_mmu_page *sp = spte_to_child_sp(spte);
+
 	if (cur_level > PG_LEVEL_4K &&
 	    cur_level == fault->goal_level &&
 	    is_shadow_present_pte(spte) &&
 	    !is_large_pte(spte) &&
-	    spte_to_child_sp(spte)->nx_huge_page_disallowed) {
+	    (sp->nx_huge_page_disallowed || sp->role.is_mirror)) {
 		/*
 		 * A small SPTE exists for this pfn, but FNAME(fetch),
 		 * direct_map(), or kvm_tdp_mmu_map() would like to create a
diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
index 405874f4d088..1d22994576b5 100644
--- a/arch/x86/kvm/mmu/tdp_mmu.c
+++ b/arch/x86/kvm/mmu/tdp_mmu.c
@@ -1244,6 +1244,8 @@ int kvm_tdp_mmu_map(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
 	struct tdp_iter iter;
 	struct kvm_mmu_page *sp;
 	int ret = RET_PF_RETRY;
+	bool hugepage_adjust_disallowed = fault->nx_huge_page_workaround_enabled ||
+					  kvm_has_mirrored_tdp(kvm);
 
 	kvm_mmu_hugepage_adjust(vcpu, fault);
 
@@ -1254,7 +1256,7 @@ int kvm_tdp_mmu_map(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
 	for_each_tdp_pte(iter, kvm, root, fault->gfn, fault->gfn + 1) {
 		int r;
 
-		if (fault->nx_huge_page_workaround_enabled)
+		if (hugepage_adjust_disallowed)
 			disallowed_hugepage_adjust(fault, iter.old_spte, iter.level);
 
 		/*
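
For context, kvm_has_mirrored_tdp() is what I mean by mirror TDP being
"active": it's a per-VM check, roughly (paraphrasing from mmu.h, may not be
exact):

static inline bool kvm_has_mirrored_tdp(const struct kvm *kvm)
{
	/* Mirror (private) TDP is only used for TDX VMs. */
	return kvm->arch.vm_type == KVM_X86_TDX_VM;
}

so for non-TDX VMs the only extra cost in the fault path is that one
comparison, computed once outside the iteration loop.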