linux-kernel - Re: [PATCH v2 4/6] KVM: x86/mmu: Fix wrong start gfn of tlb flushing with range

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <20220913125028.GB113257@k08j02272.eu95sqa>
Date:   Tue, 13 Sep 2022 20:50:28 +0800
From:   "Hou Wenlong" <houwenlong.hwl@...group.com>
To:     David Matlack <dmatlack@...gle.com>
Cc:     kvm@...r.kernel.org, Sean Christopherson <seanjc@...gle.com>,
        Paolo Bonzini <pbonzini@...hat.com>,
        Thomas Gleixner <tglx@...utronix.de>,
        Ingo Molnar <mingo@...hat.com>, Borislav Petkov <bp@...en8.de>,
        Dave Hansen <dave.hansen@...ux.intel.com>, x86@...nel.org,
        "H. Peter Anvin" <hpa@...or.com>,
        Lan Tianyu <Tianyu.Lan@...rosoft.com>,
        linux-kernel@...r.kernel.org
Subject: Re: [PATCH v2 4/6] KVM: x86/mmu: Fix wrong start gfn of tlb flushing
 with range

On Thu, Sep 08, 2022 at 02:25:42AM +0800, David Matlack wrote:
> On Wed, Aug 24, 2022 at 05:29:21PM +0800, Hou Wenlong wrote:
> > When a spte is dropped, the start gfn of tlb flushing should
> > be the gfn of spte not the base gfn of SP which contains the
> > spte. Also introduce a helper function to do range-based
> > flushing when a spte is dropped, which would help prevent
> > future buggy use of kvm_flush_remote_tlbs_with_address() in
> > such case.
> > 
> > Fixes: c3134ce240eed ("KVM: Replace old tlb flush function with new one to flush a specified range.")
> > Suggested-by: David Matlack <dmatlack@...gle.com>
> > Signed-off-by: Hou Wenlong <houwenlong.hwl@...group.com>
> > ---
> >  arch/x86/kvm/mmu/mmu.c         | 20 +++++++++++++++-----
> >  arch/x86/kvm/mmu/paging_tmpl.h |  3 +--
> >  2 files changed, 16 insertions(+), 7 deletions(-)
> > 
> > diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> > index 3bcff56df109..e0b9432b9491 100644
> > --- a/arch/x86/kvm/mmu/mmu.c
> > +++ b/arch/x86/kvm/mmu/mmu.c
> > @@ -260,6 +260,18 @@ void kvm_flush_remote_tlbs_with_address(struct kvm *kvm,
> >  	kvm_flush_remote_tlbs_with_range(kvm, &range);
> >  }
> >  
> > +static gfn_t kvm_mmu_page_get_gfn(struct kvm_mmu_page *sp, int index);
> > +
> > +/* Flush the range of guest memory mapped by the given SPTE. */
> > +static void kvm_flush_remote_tlbs_sptep(struct kvm *kvm, u64 *sptep)
> > +{
> > +	struct kvm_mmu_page *sp = sptep_to_sp(sptep);
> > +	gfn_t gfn = kvm_mmu_page_get_gfn(sp, spte_index(sptep));
> > +
> > +	kvm_flush_remote_tlbs_with_address(kvm, gfn,
> > +					   KVM_PAGES_PER_HPAGE(sp->role.level));
> 
> How is the range-based TLB flushing supposed to work with indirect MMUs?
> When KVM is using shadow paging, the gfn here is not part of the actual
> translation.
> 
> For example, when TDP is disabled, KVM's shadow page tables translate
> GVA to HPA. When Nested Virtualization is in use and running L2, KVM's
> shadow page tables translate nGPA to HPA.
> 
> Ah, I see x86_ops.tlb_remote_flush_with_range is only set when running
> on Hyper-V and TDP is enabled (VMX checks enable_ept and SVM checks
> npt_enabled). But it looks like the nested case might still be broken?
>
Yeah, range based tlb flushing is only used on Hyper-V as Paolo said.
Actually, I don't know how Hyper-V implements the range based tlb
flushing hypercall, since KVM on Hyper-V is already nested, so gfn here
is already nGPA. It seems that the current used TDP root is passed in
hypercall, so maybe Hyper-V could do nGPA to HPA translation by looking
up TDP page table. Then for nested case, it chould work well?

> > +}
> > +
> >  /* Flush all memory mapped by the given direct SP. */
> >  static void kvm_flush_remote_tlbs_direct_sp(struct kvm *kvm, struct kvm_mmu_page *sp)
> >  {
> > @@ -1156,8 +1168,7 @@ static void drop_large_spte(struct kvm *kvm, u64 *sptep, bool flush)
> >  	drop_spte(kvm, sptep);
> >  
> >  	if (flush)
> > -		kvm_flush_remote_tlbs_with_address(kvm, sp->gfn,
> > -			KVM_PAGES_PER_HPAGE(sp->role.level));
> > +		kvm_flush_remote_tlbs_sptep(kvm, sptep);
> >  }
> >  
> >  /*
> > @@ -1608,7 +1619,7 @@ static void __rmap_add(struct kvm *kvm,
> >  	if (rmap_count > RMAP_RECYCLE_THRESHOLD) {
> >  		kvm_zap_all_rmap_sptes(kvm, rmap_head);
> >  		kvm_flush_remote_tlbs_with_address(
> > -				kvm, sp->gfn, KVM_PAGES_PER_HPAGE(sp->role.level));
> > +				kvm, gfn, KVM_PAGES_PER_HPAGE(sp->role.level));
> >  	}
> >  }
> >  
> > @@ -6402,8 +6413,7 @@ static bool kvm_mmu_zap_collapsible_spte(struct kvm *kvm,
> >  			kvm_zap_one_rmap_spte(kvm, rmap_head, sptep);
> >  
> >  			if (kvm_available_flush_tlb_with_range())
> > -				kvm_flush_remote_tlbs_with_address(kvm, sp->gfn,
> > -					KVM_PAGES_PER_HPAGE(sp->role.level));
> > +				kvm_flush_remote_tlbs_sptep(kvm, sptep);
> >  			else
> >  				need_tlb_flush = 1;
> >  
> > diff --git a/arch/x86/kvm/mmu/paging_tmpl.h b/arch/x86/kvm/mmu/paging_tmpl.h
> > index 39e0205e7300..04149c704d5b 100644
> > --- a/arch/x86/kvm/mmu/paging_tmpl.h
> > +++ b/arch/x86/kvm/mmu/paging_tmpl.h
> > @@ -937,8 +937,7 @@ static void FNAME(invlpg)(struct kvm_vcpu *vcpu, gva_t gva, hpa_t root_hpa)
> >  
> >  			mmu_page_zap_pte(vcpu->kvm, sp, sptep, NULL);
> >  			if (is_shadow_present_pte(old_spte))
> > -				kvm_flush_remote_tlbs_with_address(vcpu->kvm,
> > -					sp->gfn, KVM_PAGES_PER_HPAGE(sp->role.level));
> > +				kvm_flush_remote_tlbs_sptep(vcpu->kvm, sptep);
> >  
> >  			if (!rmap_can_add(vcpu))
> >  				break;
> > -- 
> > 2.31.1
> >