linux-kernel - Re: [PATCH 15/22] KVM: x86/mmu: Move event re-injection unprotect+retry into common path

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <Zsz5Furdqs2ys1Ps@google.com>
Date: Mon, 26 Aug 2024 14:52:22 -0700
From: Sean Christopherson <seanjc@...gle.com>
To: Paolo Bonzini <pbonzini@...hat.com>
Cc: kvm@...r.kernel.org, linux-kernel@...r.kernel.org, 
	Peter Gonda <pgonda@...gle.com>, Michael Roth <michael.roth@....com>, 
	Vishal Annapurve <vannapurve@...gle.com>, Ackerly Tng <ackerleytng@...gle.com>
Subject: Re: [PATCH 15/22] KVM: x86/mmu: Move event re-injection
 unprotect+retry into common path

On Wed, Aug 14, 2024, Paolo Bonzini wrote:
> On 8/9/24 21:03, Sean Christopherson wrote:
> > @@ -6037,8 +6018,15 @@ static int kvm_mmu_write_protect_fault(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa,
> >   	 * execute the instruction.  If no shadow pages were zapped, then the
> >   	 * write-fault is due to something else entirely, i.e. KVM needs to
> >   	 * emulate, as resuming the guest will put it into an infinite loop.
> > +	 *
> > +	 * For indirect MMUs, i.e. if KVM is shadowing the current MMU, try to
> > +	 * unprotect the gfn and retry if an event is awaiting reinjection.  If
> > +	 * KVM emulates multiple instructions before completing even injection,
> > +	 * the event could be delayed beyond what is architecturally allowed,
> > +	 * e.g. KVM could inject an IRQ after the TPR has been raised.
> 
> This paragraph should go before the description of
> kvm_mmu_unprotect_gfn_and_retry:

Hmm, I disagree.  The comment ends up being disconnected from the code, especially
by the end of the series.  E.g. when reading kvm_mmu_write_protect_fault(), someone
would have to jump twice (to kvm_mmu_unprotect_gfn_and_retry and then
__kvm_mmu_unprotect_gfn_and_retry()) in order to understand the checks buried
in kvm_mmu_write_protect_fault().

And the comment also becomes stale when kvm_mmu_unprotect_gfn_and_retry() is
used by x86_emulate_instruction().  That's obviously solvable by extending the
function comment, but then we end up with a rather massive function comment that
documents its callers, not the function itself.

> 
> 	 * There are two cases in which we try to unprotect the page here
> 	 * preemptively, i.e. zap any shadow pages, before emulating the
> 	 * instruction.
> 	 *
> 	 * First, the access may be due to L1 accessing nested NPT/EPT entries
> 	 * used for L2, i.e. if the gfn being written is for gPTEs that KVM is
> 	 * shadowing and has write-protected.  In this case, because AMD CPUs
> 	 * walk nested page table using a write operation, walking NPT entries
> 	 * in L1 can trigger write faults even when L1 isn't modifying PTEs.
> 	 * KVM would then emulate an excessive number of L1 instructions
> 	 * without triggering KVM's write-flooding detection, i.e. without
> 	 * unprotecting the gfn.  This is detected as a RO violation while
> 	 * translating the guest page when the current MMU is direct.
> 	 *
> 	 * Second, for indirect MMUs, i.e. if KVM is shadowing the current MMU,
> 	 * unprotect the gfn and reenter the guest if an event is awaiting
> 	 * reinjection.  If KVM emulates multiple instructions before completing
> 	 * event injection, the event could be delayed beyond what is
> 	 * architecturally allowed, e.g. KVM could inject an IRQ after the
> 	 * TPR has been raised.
> 	 *
> 	 * In both cases, if one or more shadow pages were zapped, skip
> 	 * emulation and resume L1 to let it natively execute the instruction.
> 	 * If no shadow pages were zapped, then the write-fault is due to
> 	 * something else entirely and KVM needs to emulate, as resuming
> 	 * the guest will put it into an infinite loop.
> 
> Thanks,
> 
> Paolo
> 
> >   	 */
> > -	if (direct && (is_write_to_guest_page_table(error_code)) &&
> > +	if (((direct && is_write_to_guest_page_table(error_code)) ||
> > +	     (!direct && kvm_event_needs_reinjection(vcpu))) &&
> >   	    kvm_mmu_unprotect_gfn_and_retry(vcpu, cr2_or_gpa))
> >   		return RET_PF_FIXED;
>