linux-kernel - Re: [PATCH v2] KVM: x86/mmu: Retry fault before acquiring mmu

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite for Android: free password hash cracker in your pocket

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <ZaFpdMNrTeo1uDJP@google.com>
Date: Fri, 12 Jan 2024 08:31:48 -0800
From: Sean Christopherson <seanjc@...gle.com>
To: Yuan Yao <yuan.yao@...ux.intel.com>
Cc: Paolo Bonzini <pbonzini@...hat.com>, kvm@...r.kernel.org, linux-kernel@...r.kernel.org, 
	Yan Zhao <yan.y.zhao@...el.com>, Kai Huang <kai.huang@...el.com>
Subject: Re: [PATCH v2] KVM: x86/mmu: Retry fault before acquiring mmu_lock if
 mapping is changing

On Fri, Jan 12, 2024, Yuan Yao wrote:
> > diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> > index 3c844e428684..92f51540c4a7 100644
> > --- a/arch/x86/kvm/mmu/mmu.c
> > +++ b/arch/x86/kvm/mmu/mmu.c
> > @@ -4415,6 +4415,22 @@ static int kvm_faultin_pfn(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault,
> >  	if (unlikely(!fault->slot))
> >  		return kvm_handle_noslot_fault(vcpu, fault, access);
> >
> > +	/*
> > +	 * Pre-check for a relevant mmu_notifier invalidation event prior to
> > +	 * acquiring mmu_lock.  If there is an in-progress invalidation and the
> > +	 * kernel allows preemption, the invalidation task may drop mmu_lock
> > +	 * and yield in response to mmu_lock being contended, which is *very*
> > +	 * counter-productive as this vCPU can't actually make forward progress
> > +	 * until the invalidation completes.  This "unsafe" check can get false
> > +	 * negatives, i.e. KVM needs to re-check after acquiring mmu_lock.  Do
> > +	 * the pre-check even for non-preemtible kernels, i.e. even if KVM will
> > +	 * never yield mmu_lock in response to contention, as this vCPU ob
> > +	 * *guaranteed* to need to retry, i.e. waiting until mmu_lock is held
> > +	 * to detect retry guarantees the worst case latency for the vCPU.
> > +	 */
> > +	if (mmu_invalidate_retry_gfn_unsafe(vcpu->kvm, fault->mmu_seq, fault->gfn))
> > +		return RET_PF_RETRY;
> 
> This breaks the contract of kvm_faultin_pfn(), i.e. the pfn's refcount
> increased after resolved from gfn, but its caller won't decrease it.

Oof, good catch.

> How about call kvm_release_pfn_clean() just before return RET_PF_RETRY here,
> so we don't need to duplicate it in 3 different places.

Hrm, yeah, that does seem to be the best option.  Thanks!

> > +
> >  	return RET_PF_CONTINUE;
> >  }
> >
> > diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> > index 7e7fd25b09b3..179df96b20f8 100644
> > --- a/include/linux/kvm_host.h
> > +++ b/include/linux/kvm_host.h
> > @@ -2031,6 +2031,32 @@ static inline int mmu_invalidate_retry_gfn(struct kvm *kvm,
> >  		return 1;
> >  	return 0;
> >  }
> > +
> > +/*
> > + * This lockless version of the range-based retry check *must* be paired with a
> 
> s/lockess/lockless

Heh, unless mine eyes deceive me, that's what I wrote :-)