Message-ID: <41d307b3-78d9-4be1-80f0-9a9652e7ee37@redhat.com>
Date: Wed, 14 Aug 2024 19:43:13 +0200
From: Paolo Bonzini <pbonzini@...hat.com>
To: Sean Christopherson <seanjc@...gle.com>
Cc: kvm@...r.kernel.org, linux-kernel@...r.kernel.org,
Peter Gonda <pgonda@...gle.com>, Michael Roth <michael.roth@....com>,
Vishal Annapurve <vannapurve@...gle.com>,
Ackerly Tng <ackerleytng@...gle.com>
Subject: Re: [PATCH 15/22] KVM: x86/mmu: Move event re-injection
unprotect+retry into common path
On 8/9/24 21:03, Sean Christopherson wrote:
> Move the event re-injection unprotect+retry logic into
> kvm_mmu_write_protect_fault(), i.e. unprotect and retry if and only if
> the #PF actually hit a write-protected gfn. Note, there is a small
> possibility that the gfn was unprotected by a different task between
> hitting the #PF and acquiring mmu_lock, but in that case, KVM will resume
> the guest immediately anyway because KVM will treat the fault as spurious.
>
> As a bonus, unprotecting _after_ handling the page fault also addresses the
> case where installing a SPTE to handle a fault encounters a shadowed PTE,
> i.e. *creates* a read-only SPTE.
>
> Opportunistically add a comment explaining what on earth the intent of the
> code is, as based on the changelog from commit 577bdc496614 ("KVM: Avoid
> instruction emulation when event delivery is pending").
>
> Signed-off-by: Sean Christopherson <seanjc@...gle.com>
> ---
> arch/x86/kvm/mmu/mmu.c | 28 ++++++++--------------------
> 1 file changed, 8 insertions(+), 20 deletions(-)
>
> diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> index f64ad36ca9e0..d3c0220ff7ee 100644
> --- a/arch/x86/kvm/mmu/mmu.c
> +++ b/arch/x86/kvm/mmu/mmu.c
> @@ -2753,23 +2753,6 @@ bool kvm_mmu_unprotect_gfn_and_retry(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa)
> return r;
> }
>
> -static int kvm_mmu_unprotect_page_virt(struct kvm_vcpu *vcpu, gva_t gva)
> -{
> - gpa_t gpa;
> - int r;
> -
> - if (vcpu->arch.mmu->root_role.direct)
> - return 0;
> -
> - gpa = kvm_mmu_gva_to_gpa_write(vcpu, gva, NULL);
> - if (gpa == INVALID_GPA)
> - return 0;
> -
> - r = kvm_mmu_unprotect_page(vcpu->kvm, gpa >> PAGE_SHIFT);
> -
> - return r;
> -}
> -
> static void kvm_unsync_page(struct kvm *kvm, struct kvm_mmu_page *sp)
> {
> trace_kvm_mmu_unsync_page(sp);
> @@ -4640,8 +4623,6 @@ int kvm_handle_page_fault(struct kvm_vcpu *vcpu, u64 error_code,
> if (!flags) {
> trace_kvm_page_fault(vcpu, fault_address, error_code);
>
> - if (kvm_event_needs_reinjection(vcpu))
> - kvm_mmu_unprotect_page_virt(vcpu, fault_address);
> r = kvm_mmu_page_fault(vcpu, fault_address, error_code, insn,
> insn_len);
> } else if (flags & KVM_PV_REASON_PAGE_NOT_PRESENT) {
> @@ -6037,8 +6018,15 @@ static int kvm_mmu_write_protect_fault(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa,
> * execute the instruction. If no shadow pages were zapped, then the
> * write-fault is due to something else entirely, i.e. KVM needs to
> * emulate, as resuming the guest will put it into an infinite loop.
> + *
> + * For indirect MMUs, i.e. if KVM is shadowing the current MMU, try to
> + * unprotect the gfn and retry if an event is awaiting reinjection. If
> * KVM emulates multiple instructions before completing event injection,
> + * the event could be delayed beyond what is architecturally allowed,
> + * e.g. KVM could inject an IRQ after the TPR has been raised.
This paragraph should go before the description of
kvm_mmu_unprotect_gfn_and_retry:
* There are two cases in which we try to unprotect the page here
* preemptively, i.e. zap any shadow pages, before emulating the
* instruction.
*
* First, the access may be due to L1 accessing nested NPT/EPT entries
* used for L2, i.e. if the gfn being written is for gPTEs that KVM is
* shadowing and has write-protected. In this case, because AMD CPUs
* walk nested page tables using a write operation, walking NPT entries
* in L1 can trigger write faults even when L1 isn't modifying PTEs.
* KVM would then emulate an excessive number of L1 instructions
* without triggering KVM's write-flooding detection, i.e. without
* unprotecting the gfn. This is detected as a RO violation while
* translating the guest page when the current MMU is direct.
*
* Second, for indirect MMUs, i.e. if KVM is shadowing the current MMU,
* unprotect the gfn and reenter the guest if an event is awaiting
* reinjection. If KVM emulates multiple instructions before completing
* event injection, the event could be delayed beyond what is
* architecturally allowed, e.g. KVM could inject an IRQ after the
* TPR has been raised.
*
* In both cases, if one or more shadow pages were zapped, skip
* emulation and resume L1 to let it natively execute the instruction.
* If no shadow pages were zapped, then the write-fault is due to
* something else entirely and KVM needs to emulate, as resuming
* the guest will put it into an infinite loop.
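
Concretely, just so the intended placement is unambiguous, I'd expect the
tail of kvm_mmu_write_protect_fault() to then read roughly like this
(comment abridged to the text above, the check itself unchanged from your
patch):

	/*
	 * There are two cases in which we try to unprotect the page here
	 * preemptively, i.e. zap any shadow pages, before emulating the
	 * instruction.
	 *
	 * [... the two cases and the closing paragraph as above ...]
	 */
	if (((direct && is_write_to_guest_page_table(error_code)) ||
	     (!direct && kvm_event_needs_reinjection(vcpu))) &&
	    kvm_mmu_unprotect_gfn_and_retry(vcpu, cr2_or_gpa))
		return RET_PF_FIXED;
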
Thanks,
Paolo
> */
> - if (direct && (is_write_to_guest_page_table(error_code)) &&
> + if (((direct && is_write_to_guest_page_table(error_code)) ||
> + (!direct && kvm_event_needs_reinjection(vcpu))) &&
> kvm_mmu_unprotect_gfn_and_retry(vcpu, cr2_or_gpa))
> return RET_PF_FIXED;
>
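P.S. for anyone reading the archive without the rest of the series: with
kvm_mmu_unprotect_page_virt() gone, the !direct case relies on
kvm_mmu_unprotect_gfn_and_retry() doing the GVA->GPA translation itself.
Restating the deleted logic above, the indirect path boils down to roughly
the following (illustrative sketch only; see the actual helper in mmu.c,
whose tail is visible in the first hunk above):

	gpa_t gpa = cr2_or_gpa;

	/* For indirect MMUs, cr2_or_gpa is a GVA and must be translated. */
	if (!vcpu->arch.mmu->root_role.direct)
		gpa = kvm_mmu_gva_to_gpa_write(vcpu, cr2_or_gpa, NULL);

	if (gpa == INVALID_GPA)
		return false;

	/* Zap any shadow pages for the gfn; nonzero if something was zapped. */
	return kvm_mmu_unprotect_page(vcpu->kvm, gpa >> PAGE_SHIFT);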