linux-kernel - Re: [RFC PATCH 10/21] KVM: x86/mmu: Disallow page merging (huge page adjustment) for mirror root

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <aCxVbq9KqAMInXNo@yzhao56-desk.sh.intel.com>
Date: Tue, 20 May 2025 18:11:58 +0800
From: Yan Zhao <yan.y.zhao@...el.com>
To: "Edgecombe, Rick P" <rick.p.edgecombe@...el.com>
CC: "quic_eberman@...cinc.com" <quic_eberman@...cinc.com>, "Li, Xiaoyao"
	<xiaoyao.li@...el.com>, "kvm@...r.kernel.org" <kvm@...r.kernel.org>, "Hansen,
 Dave" <dave.hansen@...el.com>, "david@...hat.com" <david@...hat.com>, "Li,
 Zhiquan1" <zhiquan1.li@...el.com>, "thomas.lendacky@....com"
	<thomas.lendacky@....com>, "tabba@...gle.com" <tabba@...gle.com>, "Du, Fan"
	<fan.du@...el.com>, "linux-kernel@...r.kernel.org"
	<linux-kernel@...r.kernel.org>, "seanjc@...gle.com" <seanjc@...gle.com>,
	"Weiny, Ira" <ira.weiny@...el.com>, "pbonzini@...hat.com"
	<pbonzini@...hat.com>, "binbin.wu@...ux.intel.com"
	<binbin.wu@...ux.intel.com>, "Yamahata, Isaku" <isaku.yamahata@...el.com>,
	"michael.roth@....com" <michael.roth@....com>, "vbabka@...e.cz"
	<vbabka@...e.cz>, "ackerleytng@...gle.com" <ackerleytng@...gle.com>, "Peng,
 Chao P" <chao.p.peng@...el.com>, "Shutemov, Kirill"
	<kirill.shutemov@...el.com>, "Annapurve, Vishal" <vannapurve@...gle.com>,
	"jroedel@...e.de" <jroedel@...e.de>, "Miao, Jun" <jun.miao@...el.com>,
	"pgonda@...gle.com" <pgonda@...gle.com>, "x86@...nel.org" <x86@...nel.org>
Subject: Re: [RFC PATCH 10/21] KVM: x86/mmu: Disallow page merging (huge page
 adjustment) for mirror root

On Tue, May 20, 2025 at 01:42:20AM +0800, Edgecombe, Rick P wrote:
> On Mon, 2025-05-19 at 11:57 +0800, Yan Zhao wrote:
> > Like below?
> > 
> > diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
> > index 1b2bacde009f..0e4a03f44036 100644
> > --- a/arch/x86/kvm/mmu/tdp_mmu.c
> > +++ b/arch/x86/kvm/mmu/tdp_mmu.c
> > @@ -1275,6 +1275,11 @@ static int tdp_mmu_link_sp(struct kvm *kvm, struct tdp_iter *iter,
> >         return 0;
> >  }
> > 
> > +static inline bool is_fault_disallow_huge_page_adust(struct kvm_page_fault *fault, bool is_mirror)
> > +{
> > +       return fault->nx_huge_page_workaround_enabled || is_mirror;
> > +}
> 
> Err, no. It doesn't seem worth it.
> 
> > +
> >  /*
> >   * Handle a TDP page fault (NPT/EPT violation/misconfiguration) by installing
> >   * page tables and SPTEs to translate the faulting guest physical address.
> > @@ -1297,7 +1302,7 @@ int kvm_tdp_mmu_map(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
> >         for_each_tdp_pte(iter, kvm, root, fault->gfn, fault->gfn + 1) {
> >                 int r;
> > 
> > -               if (fault->nx_huge_page_workaround_enabled || is_mirror)
> > +               if (is_fault_disallow_huge_page_adust(fault, is_mirror))
> >                         disallowed_hugepage_adjust(fault, iter.old_spte, iter.level, is_mirror);
> > 
> >                 /*
> > 
> > 
> > 
> > > Also, why not check is_mirror_sp() in disallowed_hugepage_adjust() instead of
> > > passing in an is_mirror arg?
> > It's an optimization.
> 
> But disallowed_hugepage_adjust() is already checking the sp.
> 
> I think part of the thing that is bugging me is that
> nx_huge_page_workaround_enabled is not conceptually about whether the specific
> fault/level needs to disallow huge page adjustments, it's whether it needs to
> check if it does. Then disallowed_hugepage_adjust() does the actual specific
> checking. But for the mirror logic the check is the same for both. It's
> asymmetric with NX huge pages, and just sort of jammed in. It would be easier to
> follow if the kvm_tdp_mmu_map() conditional checked wither mirror TDP was
> "active", rather than the mirror role.
You are right. It looks clearer.

> > 
> > As is_mirror_sptep(iter->sptep) == is_mirror_sp(root), passing in is_mirror arg
> > can avoid checking mirror for each sp, which remains unchanged in a root.
> 
> Why not just this. It seems easier to comprehend to me. It does add a little bit
> of extra checking in the shared fault for TDX only. I think it's ok and better
> not to litter the generic MMU code.
> 
> diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> index a284dce227a0..37ca77f2ee15 100644
> --- a/arch/x86/kvm/mmu/mmu.c
> +++ b/arch/x86/kvm/mmu/mmu.c
> @@ -3328,11 +3328,13 @@ void kvm_mmu_hugepage_adjust(struct kvm_vcpu *vcpu,
> struct kvm_page_fault *fault
>  
>  void disallowed_hugepage_adjust(struct kvm_page_fault *fault, u64 spte, int
> cur_level)
>  {
> +       struct kvm_mmu_page * sp = spte_to_child_sp(spte);
if !is_shadow_present_pte(spte) or spte is an leaf entry, it's incorrect to
retrieve child sp. So, maybe

-           spte_to_child_sp(spte)->nx_huge_page_disallowed) {
+           (spte_to_child_sp(spte)->nx_huge_page_disallowed &&
+            is_mirror_sp(spte_to_child_sp(spte))) {

Other changes look good to me.

> +
>         if (cur_level > PG_LEVEL_4K &&
>             cur_level == fault->goal_level &&
>             is_shadow_present_pte(spte) &&
>             !is_large_pte(spte) &&
> -           spte_to_child_sp(spte)->nx_huge_page_disallowed) {
> +           (sp->nx_huge_page_disallowed || sp->role.is_mirror)) {
>                 /*
>                  * A small SPTE exists for this pfn, but FNAME(fetch),
>                  * direct_map(), or kvm_tdp_mmu_map() would like to create a
> diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
> index 405874f4d088..1d22994576b5 100644
> --- a/arch/x86/kvm/mmu/tdp_mmu.c
> +++ b/arch/x86/kvm/mmu/tdp_mmu.c
> @@ -1244,6 +1244,8 @@ int kvm_tdp_mmu_map(struct kvm_vcpu *vcpu, struct
> kvm_page_fault *fault)
>         struct tdp_iter iter;
>         struct kvm_mmu_page *sp;
>         int ret = RET_PF_RETRY;
> +       bool hugepage_adjust_disallowed = fault->nx_huge_page_workaround_enabled
> ||
> +                                         kvm_has_mirrored_tdp(kvm);
>  
>         kvm_mmu_hugepage_adjust(vcpu, fault);
>  
> @@ -1254,7 +1256,7 @@ int kvm_tdp_mmu_map(struct kvm_vcpu *vcpu, struct
> kvm_page_fault *fault)
>         for_each_tdp_pte(iter, kvm, root, fault->gfn, fault->gfn + 1) {
>                 int r;
>  
> -               if (fault->nx_huge_page_workaround_enabled)
> +               if (hugepage_adjust_disallowed)
>                         disallowed_hugepage_adjust(fault, iter.old_spte,
> iter.level);
>  
>                 /*
> 
>  
>                 /*
>