Date: Fri, 19 Jan 2024 13:50:38 +0800
From: Xu Yilun <yilun.xu@...ux.intel.com>
To: Sean Christopherson <seanjc@...gle.com>
Cc: Paolo Bonzini <pbonzini@...hat.com>, kvm@...r.kernel.org,
	linux-kernel@...r.kernel.org, Yan Zhao <yan.y.zhao@...el.com>,
	Kai Huang <kai.huang@...el.com>
Subject: Re: [PATCH v2] KVM: x86/mmu: Retry fault before acquiring mmu_lock
 if mapping is changing

On Thu, Jan 18, 2024 at 09:22:37AM -0800, Sean Christopherson wrote:
> On Fri, Jan 19, 2024, Xu Yilun wrote:
> > On Tue, Jan 09, 2024 at 05:20:45PM -0800, Sean Christopherson wrote:
> > > Retry page faults without acquiring mmu_lock if the resolved gfn is covered
> > > by an active invalidation.  Contending for mmu_lock is especially
> > > problematic on preemptible kernels as the mmu_notifier invalidation task
> > > will yield mmu_lock (see rwlock_needbreak()), delay the in-progress
> > 
> > Is it possible for the fault-in task to avoid contending for mmu_lock by
> > using _trylock()?
> > Like:
> > 
> > 	/* Spin until the read lock is available, without queueing. */
> > 	while (!read_trylock(&vcpu->kvm->mmu_lock))
> > 		cpu_relax();
> > 
> > 	if (is_page_fault_stale(vcpu, fault))
> > 		goto out_unlock;
> > 
> > 	r = kvm_tdp_mmu_map(vcpu, fault);
> > 
> > out_unlock:
> > 	read_unlock(&vcpu->kvm->mmu_lock);
> 
> It's definitely possible, but the downsides far outweigh any potential benefits.
> 
> Doing trylock means the CPU is NOT put into the queue for acquiring the lock,
> which means that forward progress isn't guaranteed.  E.g. in a pathological
> scenario (and by "pathological", I mean if NUMA balancing or KSM is active ;-)),
> it's entirely possible for a near-endless stream of mmu_lock writers to be in
> the queue, thus preventing the vCPU from acquiring mmu_lock in a timely manner.

Ah yes, I forgot the main purpose of yielding is to let the vCPU make
forward progress when the faulted-in page is not covered by the invalidation.
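
A rough sketch of that retry-before-lock path, for illustration only (the
helper name below is approximate; the exact check in the posted patch may
differ):

	/*
	 * Before acquiring mmu_lock (and before faulting the page into
	 * the primary MMU), check whether the gfn is covered by an
	 * in-progress mmu_notifier invalidation.  If so, resume the
	 * guest so the vCPU re-faults and retries once the invalidation
	 * completes, instead of piling onto the contended lock.
	 */
	if (fault->slot &&
	    mmu_invalidate_retry_gfn(vcpu->kvm, fault->mmu_seq, fault->gfn))
		return RET_PF_RETRY;

	/* Otherwise, take mmu_lock and map the fault as usual. */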

Thanks,
Yilun

> 
> And hacking the page fault path to bypass KVM's lock contention detection would
> be a very willful, deliberate violation of the intent of the MMU's yielding logic
> for preemptible kernels.
> 
> That said, I would love to revisit KVM's use of rwlock_needbreak(), at least in
> the TDP MMU.  As evidenced by this rash of issues, it's not at all obvious that
> yielding on mmu_lock contention is *ever* a net positive for KVM, or even for the
> kernel.  The shadow MMU is probably a different story since it can't parallelize
> page faults with other operations, e.g. yielding in kvm_zap_obsolete_pages() to
> allow vCPUs to make forward progress is probably a net positive.
> 
> But AFAIK, no one has done any testing to prove that yielding on contention in
> the TDP MMU is actually a good thing.  I'm 99% certain the only reason the TDP
> MMU yields on contention is because the shadow MMU yields on contention, i.e.
> I'm confident that no one ever did performance testing to show that there is
> any benefit whatsoever to yielding mmu_lock in the TDP MMU.
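
For reference, the yielding logic in question lives in the TDP MMU's
iterator; a simplified sketch follows (paraphrased from
tdp_mmu_iter_cond_resched() with details elided, so treat the exact shape
as approximate):

	/*
	 * Drop mmu_lock, and reschedule if needed, when the iterator
	 * detects contention.  Returns true if the walk must restart
	 * because the lock was dropped.
	 */
	static bool tdp_mmu_iter_cond_resched(struct kvm *kvm,
					      struct tdp_iter *iter,
					      bool flush, bool shared)
	{
		/* Yield only on actual need_resched() or lock contention. */
		if (!need_resched() && !rwlock_needbreak(&kvm->mmu_lock))
			return false;

		if (flush)
			kvm_flush_remote_tlbs(kvm);

		if (shared)
			cond_resched_rwlock_read(&kvm->mmu_lock);
		else
			cond_resched_rwlock_write(&kvm->mmu_lock);

		/* ... restart the iterator at the current gfn ... */
		return true;
	}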