[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <aCxVbq9KqAMInXNo@yzhao56-desk.sh.intel.com>
Date: Tue, 20 May 2025 18:11:58 +0800
From: Yan Zhao <yan.y.zhao@...el.com>
To: "Edgecombe, Rick P" <rick.p.edgecombe@...el.com>
CC: "quic_eberman@...cinc.com" <quic_eberman@...cinc.com>, "Li, Xiaoyao"
<xiaoyao.li@...el.com>, "kvm@...r.kernel.org" <kvm@...r.kernel.org>, "Hansen,
Dave" <dave.hansen@...el.com>, "david@...hat.com" <david@...hat.com>, "Li,
Zhiquan1" <zhiquan1.li@...el.com>, "thomas.lendacky@....com"
<thomas.lendacky@....com>, "tabba@...gle.com" <tabba@...gle.com>, "Du, Fan"
<fan.du@...el.com>, "linux-kernel@...r.kernel.org"
<linux-kernel@...r.kernel.org>, "seanjc@...gle.com" <seanjc@...gle.com>,
"Weiny, Ira" <ira.weiny@...el.com>, "pbonzini@...hat.com"
<pbonzini@...hat.com>, "binbin.wu@...ux.intel.com"
<binbin.wu@...ux.intel.com>, "Yamahata, Isaku" <isaku.yamahata@...el.com>,
"michael.roth@....com" <michael.roth@....com>, "vbabka@...e.cz"
<vbabka@...e.cz>, "ackerleytng@...gle.com" <ackerleytng@...gle.com>, "Peng,
Chao P" <chao.p.peng@...el.com>, "Shutemov, Kirill"
<kirill.shutemov@...el.com>, "Annapurve, Vishal" <vannapurve@...gle.com>,
"jroedel@...e.de" <jroedel@...e.de>, "Miao, Jun" <jun.miao@...el.com>,
"pgonda@...gle.com" <pgonda@...gle.com>, "x86@...nel.org" <x86@...nel.org>
Subject: Re: [RFC PATCH 10/21] KVM: x86/mmu: Disallow page merging (huge page
adjustment) for mirror root
On Tue, May 20, 2025 at 01:42:20AM +0800, Edgecombe, Rick P wrote:
> On Mon, 2025-05-19 at 11:57 +0800, Yan Zhao wrote:
> > Like below?
> >
> > diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
> > index 1b2bacde009f..0e4a03f44036 100644
> > --- a/arch/x86/kvm/mmu/tdp_mmu.c
> > +++ b/arch/x86/kvm/mmu/tdp_mmu.c
> > @@ -1275,6 +1275,11 @@ static int tdp_mmu_link_sp(struct kvm *kvm, struct tdp_iter *iter,
> > return 0;
> > }
> >
> > +static inline bool is_fault_disallow_huge_page_adust(struct kvm_page_fault *fault, bool is_mirror)
> > +{
> > + return fault->nx_huge_page_workaround_enabled || is_mirror;
> > +}
>
> Err, no. It doesn't seem worth it.
>
> > +
> > /*
> > * Handle a TDP page fault (NPT/EPT violation/misconfiguration) by installing
> > * page tables and SPTEs to translate the faulting guest physical address.
> > @@ -1297,7 +1302,7 @@ int kvm_tdp_mmu_map(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
> > for_each_tdp_pte(iter, kvm, root, fault->gfn, fault->gfn + 1) {
> > int r;
> >
> > - if (fault->nx_huge_page_workaround_enabled || is_mirror)
> > + if (is_fault_disallow_huge_page_adust(fault, is_mirror))
> > disallowed_hugepage_adjust(fault, iter.old_spte, iter.level, is_mirror);
> >
> > /*
> >
> >
> >
> > > Also, why not check is_mirror_sp() in disallowed_hugepage_adjust() instead of
> > > passing in an is_mirror arg?
> > It's an optimization.
>
> But disallowed_hugepage_adjust() is already checking the sp.
>
> I think part of the thing that is bugging me is that
> nx_huge_page_workaround_enabled is not conceptually about whether the specific
> fault/level needs to disallow huge page adjustments, it's whether it needs to
> check if it does. Then disallowed_hugepage_adjust() does the actual specific
> checking. But for the mirror logic the check is the same for both. It's
> asymmetric with NX huge pages, and just sort of jammed in. It would be easier to
> follow if the kvm_tdp_mmu_map() conditional checked wither mirror TDP was
> "active", rather than the mirror role.
You are right. It looks clearer.
> >
> > As is_mirror_sptep(iter->sptep) == is_mirror_sp(root), passing in is_mirror arg
> > can avoid checking mirror for each sp, which remains unchanged in a root.
>
> Why not just this. It seems easier to comprehend to me. It does add a little bit
> of extra checking in the shared fault for TDX only. I think it's ok and better
> not to litter the generic MMU code.
>
> diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> index a284dce227a0..37ca77f2ee15 100644
> --- a/arch/x86/kvm/mmu/mmu.c
> +++ b/arch/x86/kvm/mmu/mmu.c
> @@ -3328,11 +3328,13 @@ void kvm_mmu_hugepage_adjust(struct kvm_vcpu *vcpu,
> struct kvm_page_fault *fault
>
> void disallowed_hugepage_adjust(struct kvm_page_fault *fault, u64 spte, int
> cur_level)
> {
> + struct kvm_mmu_page * sp = spte_to_child_sp(spte);
if !is_shadow_present_pte(spte) or spte is an leaf entry, it's incorrect to
retrieve child sp. So, maybe
- spte_to_child_sp(spte)->nx_huge_page_disallowed) {
+ (spte_to_child_sp(spte)->nx_huge_page_disallowed &&
+ is_mirror_sp(spte_to_child_sp(spte))) {
Other changes look good to me.
> +
> if (cur_level > PG_LEVEL_4K &&
> cur_level == fault->goal_level &&
> is_shadow_present_pte(spte) &&
> !is_large_pte(spte) &&
> - spte_to_child_sp(spte)->nx_huge_page_disallowed) {
> + (sp->nx_huge_page_disallowed || sp->role.is_mirror)) {
> /*
> * A small SPTE exists for this pfn, but FNAME(fetch),
> * direct_map(), or kvm_tdp_mmu_map() would like to create a
> diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
> index 405874f4d088..1d22994576b5 100644
> --- a/arch/x86/kvm/mmu/tdp_mmu.c
> +++ b/arch/x86/kvm/mmu/tdp_mmu.c
> @@ -1244,6 +1244,8 @@ int kvm_tdp_mmu_map(struct kvm_vcpu *vcpu, struct
> kvm_page_fault *fault)
> struct tdp_iter iter;
> struct kvm_mmu_page *sp;
> int ret = RET_PF_RETRY;
> + bool hugepage_adjust_disallowed = fault->nx_huge_page_workaround_enabled
> ||
> + kvm_has_mirrored_tdp(kvm);
>
> kvm_mmu_hugepage_adjust(vcpu, fault);
>
> @@ -1254,7 +1256,7 @@ int kvm_tdp_mmu_map(struct kvm_vcpu *vcpu, struct
> kvm_page_fault *fault)
> for_each_tdp_pte(iter, kvm, root, fault->gfn, fault->gfn + 1) {
> int r;
>
> - if (fault->nx_huge_page_workaround_enabled)
> + if (hugepage_adjust_disallowed)
> disallowed_hugepage_adjust(fault, iter.old_spte,
> iter.level);
>
> /*
>
>
> /*
>
Powered by blists - more mailing lists