Message-ID: <6fsgci4fceoin7fp3ejeulbaybaitx3yo3nylzecanoba5gvhd@3ubrvlykgonn>
Date: Fri, 16 Aug 2024 07:31:19 +0800
From: Yao Yuan <yaoyuan0329os@...il.com>
To: Yuan Yao <yuan.yao@...ux.intel.com>
Cc: Sean Christopherson <seanjc@...gle.com>, 
	Paolo Bonzini <pbonzini@...hat.com>, kvm@...r.kernel.org, linux-kernel@...r.kernel.org, 
	Peter Gonda <pgonda@...gle.com>, Michael Roth <michael.roth@....com>, 
	Vishal Annapurve <vannapurve@...gle.com>, Ackerly Tng <ackerleytng@...gle.com>
Subject: Re: [PATCH 04/22] KVM: x86/mmu: Skip emulation on page fault iff 1+
 SPs were unprotected

On Wed, Aug 14, 2024 at 10:22:56PM GMT, Yuan Yao wrote:
> On Fri, Aug 09, 2024 at 12:03:01PM -0700, Sean Christopherson wrote:
> > When doing "fast unprotection" of nested TDP page tables, skip emulation
> > if and only if at least one gfn was unprotected, i.e. continue with
> > emulation if simply resuming is likely to hit the same fault and risk
> > putting the vCPU into an infinite loop.
> >
> > Note, it's entirely possible to get a false negative, e.g. if a different
> > vCPU faults on the same gfn and unprotects the gfn first, but that's a
> > relatively rare edge case, and emulating is still functionally ok, i.e.
> > the risk of putting the vCPU into an infinite loop isn't justified.
> >
> > Fixes: 147277540bbc ("kvm: svm: Add support for additional SVM NPF error codes")
> > Cc: stable@...r.kernel.org
> > Signed-off-by: Sean Christopherson <seanjc@...gle.com>
> > ---
> >  arch/x86/kvm/mmu/mmu.c | 28 ++++++++++++++++++++--------
> >  1 file changed, 20 insertions(+), 8 deletions(-)
> >
> > diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> > index e3aa04c498ea..95058ac4b78c 100644
> > --- a/arch/x86/kvm/mmu/mmu.c
> > +++ b/arch/x86/kvm/mmu/mmu.c
> > @@ -5967,17 +5967,29 @@ static int kvm_mmu_write_protect_fault(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa,
> >  	bool direct = vcpu->arch.mmu->root_role.direct;
> >
> >  	/*
> > -	 * Before emulating the instruction, check if the error code
> > -	 * was due to a RO violation while translating the guest page.
> > -	 * This can occur when using nested virtualization with nested
> > -	 * paging in both guests. If true, we simply unprotect the page
> > -	 * and resume the guest.
> > +	 * Before emulating the instruction, check to see if the access may be
> > +	 * due to L1 accessing nested NPT/EPT entries used for L2, i.e. if the
> > +	 * gfn being written is for gPTEs that KVM is shadowing and has write-
> > +	 * protected.  Because AMD CPUs walk nested page table using a write

Hi Sean,

I just want to ask how often this case actually occurs on EPT:

PFERR_GUEST_PAGE_MASK is set when the EPT violation happens in the
middle of walking the guest CR3 page tables, and the guest CR3
page-table page is write-protected in EPT01.  How often are those
guest CR3 page-table pages also EPT12 page-table pages?  I would
think that most of the time they are just data pages under the guest
CR3 tables, which L1 accesses through an L1 GVA; in that case
PFERR_GUEST_FINAL_MASK should be set rather than
PFERR_GUEST_PAGE_MASK.
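
For reference, my reading of the relevant definitions (paraphrased from
the KVM x86 headers, so please double-check against the actual tree; on
VMX, handle_ept_violation() picks between the two GUEST_* bits based on
whether the exit qualification says the GVA was translated):

	/* SVM #NPF error code bits, also synthesized for EPT violations. */
	#define PFERR_GUEST_FINAL_MASK	BIT_ULL(32)	/* fault on the final GPA translation */
	#define PFERR_GUEST_PAGE_MASK	BIT_ULL(33)	/* fault while walking guest page tables */

	/* Present write fault hit while walking guest page tables. */
	#define PFERR_NESTED_GUEST_PAGE	(PFERR_GUEST_PAGE_MASK | \
					 PFERR_WRITE_MASK |	 \
					 PFERR_PRESENT_MASK)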

> > +	 * operation, walking NPT entries in L1 can trigger write faults even
> > +	 * when L1 isn't modifying PTEs, and thus result in KVM emulating an
> > +	 * excessive number of L1 instructions without triggering KVM's write-
> > +	 * flooding detection, i.e. without unprotecting the gfn.
> > +	 *
> > +	 * If the error code was due to a RO violation while translating the
> > +	 * guest page, the current MMU is direct (L1 is active), and KVM has
> > +	 * shadow pages, then the above scenario is likely being hit.  Try to
> > +	 * unprotect the gfn, i.e. zap any shadow pages, so that L1 can walk
> > +	 * its NPT entries without triggering emulation.  If one or more shadow
> > +	 * pages was zapped, skip emulation and resume L1 to let it natively
> > +	 * execute the instruction.  If no shadow pages were zapped, then the
> > +	 * write-fault is due to something else entirely, i.e. KVM needs to
> > +	 * emulate, as resuming the guest will put it into an infinite loop.
> >  	 */
>
> Reviewed-by: Yuan Yao <yuan.yao@...el.com>
>
> >  	if (direct &&
> > -	    (error_code & PFERR_NESTED_GUEST_PAGE) == PFERR_NESTED_GUEST_PAGE) {
> > -		kvm_mmu_unprotect_page(vcpu->kvm, gpa_to_gfn(cr2_or_gpa));
> > +	    (error_code & PFERR_NESTED_GUEST_PAGE) == PFERR_NESTED_GUEST_PAGE &&
> > +	    kvm_mmu_unprotect_page(vcpu->kvm, gpa_to_gfn(cr2_or_gpa)))
> >  		return RET_PF_FIXED;
> > -	}
> >
> >  	/*
> >  	 * The gfn is write-protected, but if emulation fails we can still
> > --
> > 2.46.0.76.ge559c4bf1a-goog
> >
> >
>
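
For anyone following along, the reason the new early return is safe: my
paraphrase of kvm_mmu_unprotect_page() in arch/x86/kvm/mmu/mmu.c (not a
verbatim quote, the iterator name may differ slightly by tree) is:

	bool kvm_mmu_unprotect_page(struct kvm *kvm, gfn_t gfn)
	{
		struct kvm_mmu_page *sp;
		LIST_HEAD(invalid_list);
		bool r = false;

		write_lock(&kvm->mmu_lock);
		/* Zap every shadow page that shadows gPTEs for this gfn. */
		for_each_gfn_valid_sp_with_gptes(kvm, sp, gfn) {
			r = true;
			kvm_mmu_prepare_zap_page(kvm, sp, &invalid_list);
		}
		kvm_mmu_commit_zap_page(kvm, &invalid_list);
		write_unlock(&kvm->mmu_lock);

		return r;
	}

i.e. it returns true iff at least one shadow page was zapped, so with
this patch RET_PF_FIXED (and re-entering the guest) is taken only when
resuming can actually make forward progress; otherwise the fault falls
through to emulation as before.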
