Message-ID: <Z6riQMB9zsyCER65@yzhao56-desk.sh.intel.com>
Date: Tue, 11 Feb 2025 13:38:08 +0800
From: Yan Zhao <yan.y.zhao@...el.com>
To: Sean Christopherson <seanjc@...gle.com>
CC: <pbonzini@...hat.com>, <rick.p.edgecombe@...el.com>,
	<linux-kernel@...r.kernel.org>, <kvm@...r.kernel.org>
Subject: Re: [PATCH 4/4] KVM: x86/mmu: Free obsolete roots when pre-faulting
 SPTEs

On Mon, Feb 10, 2025 at 02:41:40PM -0800, Sean Christopherson wrote:
> On Sat, Feb 08, 2025, Yan Zhao wrote:
> > On Fri, Feb 07, 2025 at 07:12:04AM -0800, Sean Christopherson wrote:
> > > On Fri, Feb 07, 2025, Yan Zhao wrote:
> > > > Always free obsolete roots when pre-faulting SPTEs in case it's called
> > > > after a root is invalidated (e.g., by memslot removal) but before any
> > > > vcpu_enter_guest() processing of KVM_REQ_MMU_FREE_OBSOLETE_ROOTS.
> > > > 
> > > > Lack of kvm_mmu_free_obsolete_roots() in this scenario can lead to
> > > > kvm_mmu_reload() failing to load a new root if the current root hpa is an
> > > > obsolete root (which is not INVALID_PAGE). Consequently,
> > > > kvm_arch_vcpu_pre_fault_memory() will retry infinitely due to the checking
> > > > of is_page_fault_stale().
> > > > 
> > > > It's safe to call kvm_mmu_free_obsolete_roots() even if there are no
> > > > obsolete roots or if it's called a second time when vcpu_enter_guest()
> > > > later processes KVM_REQ_MMU_FREE_OBSOLETE_ROOTS. This is because
> > > > kvm_mmu_free_obsolete_roots() sets an obsolete root to INVALID_PAGE and
> > > > will do nothing to an INVALID_PAGE.
> > > 
> > > Why is userspace changing memslots while prefaulting?
> > It currently only exists in the KVM selftest (written by me...).
> > I'm not sure there's any real use case like this.
> 
> It's decidedly odd.  I asked, because maybe there's a way we can disallow the
> scenario.  Doing that without making things more complex than simply handling
> obsolete roots is probably a fool's errand though.
Hmm, maybe it's not wise for userspace to do it like this.
But it also doesn't look desirable to disallow slot changes between two
prefault ioctls.

Rather than prefaulting only after all memslots are finalized, I'm not sure
whether some userspace implementation might follow a pattern like "add a
memslot, then prefault it". If a memslot were removed between two such "add
and prefault" steps, the second prefault ioctl would get stuck.

> 
> > > > Signed-off-by: Yan Zhao <yan.y.zhao@...el.com>
> > > > ---
> > > >  arch/x86/kvm/mmu/mmu.c | 5 +++++
> > > >  1 file changed, 5 insertions(+)
> > > > 
> > > > diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> > > > index 47fd3712afe6..72f68458049a 100644
> > > > --- a/arch/x86/kvm/mmu/mmu.c
> > > > +++ b/arch/x86/kvm/mmu/mmu.c
> > > > @@ -4740,7 +4740,12 @@ long kvm_arch_vcpu_pre_fault_memory(struct kvm_vcpu *vcpu,
> > > >  	/*
> > > >  	 * reload is efficient when called repeatedly, so we can do it on
> > > >  	 * every iteration.
> > > > +	 * Before reload, free obsolete roots in case the prefault is called
> > > > +	 * after a root is invalidated (e.g., by memslot removal) but
> > > > +	 * before any vcpu_enter_guest() processing of
> > > > +	 * KVM_REQ_MMU_FREE_OBSOLETE_ROOTS.
> > > >  	 */
> > > > +	kvm_mmu_free_obsolete_roots(vcpu);
> > > >  	r = kvm_mmu_reload(vcpu);
> > > >  	if (r)
> > > >  		return r;
> > > 
> > > I would prefer to do check for obsolete roots in kvm_mmu_reload() itself, but
> > Yes, it's better!
> > I previously considered doing it that way, but I was afraid of introducing
> > overhead (the extra compare) to kvm_mmu_reload(), which is called quite
> > frequently.
> > 
> > But maybe we can remove the check in vcpu_enter_guest() to reduce the overhead?
> > 
> > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> > index b2d9a16fd4d3..6a1f2780a094 100644
> > --- a/arch/x86/kvm/x86.c
> > +++ b/arch/x86/kvm/x86.c
> > @@ -10731,8 +10731,6 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
> >                                 goto out;
> >                         }
> >                 }
> > -               if (kvm_check_request(KVM_REQ_MMU_FREE_OBSOLETE_ROOTS, vcpu))
> > -                       kvm_mmu_free_obsolete_roots(vcpu);
> >                 if (kvm_check_request(KVM_REQ_MIGRATE_TIMER, vcpu))
> >                         __kvm_migrate_timers(vcpu);
> >                 if (kvm_check_request(KVM_REQ_MASTERCLOCK_UPDATE, vcpu))
> > 
> > > keep the main kvm_check_request() so that the common case handles the resulting
> > > TLB flush without having to loop back around in vcpu_enter_guest().
> > Hmm, I'm a little confused.
> > What is the resulting TLB flush?
> 
> For the common case where KVM_REQ_MMU_FREE_OBSOLETE_ROOTS is pending before
> vcpu_enter_guest, kvm_mmu_free_obsolete_roots() may trigger KVM_REQ_TLB_FLUSH
> via kvm_mmu_commit_zap_page().  Processing KVM_REQ_MMU_FREE_OBSOLETE_ROOTS before
> KVM_REQ_TLB_FLUSH means vcpu_enter_guest() doesn't have to "abort" and redo the
> whole loop (the newly pending request won't be detected until kvm_vcpu_exit_request(),
> which isn't that late in the entry sequence, but there is a decent amount of work
> that needs to be undone).
Thanks a lot for such a detailed explanation!

> On the other hand, the cost of kvm_check_request(), especially a check that's
> guarded by kvm_request_pending(), is negligible.
> 
> That said, obsolete roots shouldn't actually require a TLB flush.  E.g. the TDP
> MMU hasn't flushed invalid roots since commit fcdffe97f80e ("KVM: x86/mmu: Don't
> do TLB flush when zappings SPTEs in invalid roots").  I'd have to think more about
> whether or not that's safe/correct for the shadow MMU though.
For the shadow MMU, I think it's safe to skip the remote TLB flush in
kvm_mmu_commit_zap_page() for an obsolete root, because kvm_mmu_zap_all_fast()
is the only place that toggles mmu_valid_gen. kvm_mmu_zap_all_fast() makes all
active SPs obsolete by that toggle and then kicks all vCPUs. So the
kvm_x86_call(flush_tlb_current)() done when kvm_mmu_load() is invoked can
carry out the TLB flush on each vCPU.

> For this case, I think it makes sense to just add the check in kvm_mmu_reload().
Got it!
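
To double check my understanding, something like below? (Just a sketch: the
added kvm_check_request() block is the new part, the rest is the existing
kvm_mmu_reload() in arch/x86/kvm/mmu.h, and the existing check in
vcpu_enter_guest() would be kept as you suggested.)

static inline int kvm_mmu_reload(struct kvm_vcpu *vcpu)
{
	/*
	 * New: drop obsolete roots before looking at the current root, so a
	 * stale-but-valid hpa (e.g. left behind by a memslot removal that the
	 * vCPU hasn't processed yet) doesn't short-circuit the reload.
	 */
	if (kvm_check_request(KVM_REQ_MMU_FREE_OBSOLETE_ROOTS, vcpu))
		kvm_mmu_free_obsolete_roots(vcpu);

	if (likely(vcpu->arch.mmu->root.hpa != INVALID_PAGE))
		return 0;

	return kvm_mmu_load(vcpu);
}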
