Message-ID: <Zry9Us0HVEDmhCB4@google.com>
Date: Wed, 14 Aug 2024 07:21:06 -0700
From: Sean Christopherson <seanjc@...gle.com>
To: Yuan Yao <yuan.yao@...ux.intel.com>
Cc: Paolo Bonzini <pbonzini@...hat.com>, kvm@...r.kernel.org, linux-kernel@...r.kernel.org,
Peter Gonda <pgonda@...gle.com>, Michael Roth <michael.roth@....com>,
Vishal Annapurve <vannapurve@...gle.com>, Ackerly Tng <ackerleytng@...gle.com>
Subject: Re: [PATCH 03/22] KVM: x86/mmu: Trigger unprotect logic only on
write-protection page faults

On Wed, Aug 14, 2024, Yuan Yao wrote:
> > @@ -5960,6 +5961,41 @@ void kvm_mmu_track_write(struct kvm_vcpu *vcpu, gpa_t gpa, const u8 *new,
> > write_unlock(&vcpu->kvm->mmu_lock);
> > }
> >
> > +static int kvm_mmu_write_protect_fault(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa,
> > + u64 error_code, int *emulation_type)
> > +{
> > + bool direct = vcpu->arch.mmu->root_role.direct;
> > +
> > + /*
> > + * Before emulating the instruction, check if the error code
> > + * was due to a RO violation while translating the guest page.
> > + * This can occur when using nested virtualization with nested
> > + * paging in both guests. If true, we simply unprotect the page
> > + * and resume the guest.
> > + */
> > + if (direct &&
> > + (error_code & PFERR_NESTED_GUEST_PAGE) == PFERR_NESTED_GUEST_PAGE) {
> > + kvm_mmu_unprotect_page(vcpu->kvm, gpa_to_gfn(cr2_or_gpa));
> > + return RET_PF_FIXED;
> > + }
> > +
> > + /*
> > + * The gfn is write-protected, but if emulation fails we can still
> > + * optimistically try to just unprotect the page and let the processor
> > + * re-execute the instruction that caused the page fault. Do not allow
> > + * retrying MMIO emulation, as it's not only pointless but could also
> > + * cause us to enter an infinite loop because the processor will keep
> > + * faulting on the non-existent MMIO address. Retrying an instruction
> > + * from a nested guest is also pointless and dangerous as we are only
> > + * explicitly shadowing L1's page tables, i.e. unprotecting something
> > + * for L1 isn't going to magically fix whatever issue caused L2 to fail.
> > + */
> > + if (!mmio_info_in_cache(vcpu, cr2_or_gpa, direct) && !is_guest_mode(vcpu))
>
> Looks like the mmio_info_in_cache() check can be removed;
> emulation should not come here now that RET_PF_WRITE_PROTECTED
> is introduced.  Maybe WARN_ON_ONCE() it.

Yeah, that was my instinct as well. I kept it mostly because I liked having the
comment, but also because I was thinking the cache could theoretically get a hit.
But that's not true. KVM should return RET_PF_WRITE_PROTECTED if and only if
there is a memslot, and creating a memslot is supposed to invalidate the MMIO
cache by virtue of changing the memslot generation.
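
FWIW, the invalidation relies on a bog-standard generation-tag pattern:
the MMIO cache records the memslot generation when it's filled, and a
lookup hits only if the recorded generation still matches.  Rough
standalone sketch of the idea (made-up names, not KVM's actual types or
code):

  #include <stdbool.h>
  #include <stdint.h>
  #include <stdio.h>

  /* Hypothetical stand-ins for the per-VM memslot generation and the
   * vCPU's one-entry MMIO info cache; names don't match the kernel. */
  struct vm {
          uint64_t memslot_generation;  /* bumped on every memslot change */
  };

  struct mmio_cache {
          uint64_t gpa;
          uint64_t generation;  /* generation when the entry was filled */
          bool valid;
  };

  static void mmio_cache_fill(struct mmio_cache *c, const struct vm *vm,
                              uint64_t gpa)
  {
          c->gpa = gpa;
          c->generation = vm->memslot_generation;
          c->valid = true;
  }

  /* A hit requires a GPA match *and* a current generation, so creating
   * (or deleting) a memslot implicitly invalidates stale entries. */
  static bool mmio_cache_hit(const struct mmio_cache *c, const struct vm *vm,
                             uint64_t gpa)
  {
          return c->valid && c->gpa == gpa &&
                 c->generation == vm->memslot_generation;
  }

  int main(void)
  {
          struct vm vm = { .memslot_generation = 1 };
          struct mmio_cache cache = { 0 };

          mmio_cache_fill(&cache, &vm, 0xfee00000);
          printf("hit before memslot change: %d\n",
                 mmio_cache_hit(&cache, &vm, 0xfee00000));

          vm.memslot_generation++;  /* e.g. userspace created a memslot */
          printf("hit after memslot change:  %d\n",
                 mmio_cache_hit(&cache, &vm, 0xfee00000));
          return 0;
  }

I.e. memslot updates never need to chase down cached MMIO info; bumping
the generation is sufficient to make every stale entry miss.
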
Unless someone feels strongly that the mmio_info_in_cache() call should be
deleted entirely, I'll tack on this patch. The cache lookup is cheap, and IMO
it's helpful to explicitly document that it should be impossible to reach this
point with what appears to be MMIO.
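
One more bit of context on the quoted hunk above: PFERR_NESTED_GUEST_PAGE
is a multi-bit mask, which is why the code tests
(error_code & X) == X rather than a plain bitwise-AND.  IIRC the
definition is along these lines (all three bits must be set):

  #define PFERR_NESTED_GUEST_PAGE (PFERR_GUEST_PAGE_MASK | \
                                   PFERR_WRITE_MASK |      \
                                   PFERR_PRESENT_MASK)

i.e. the unprotect-and-resume path triggers only for a write to a
present page that the CPU hit while walking the guest's page tables,
matching the "RO violation while translating the guest page" case
described in the comment.
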
---
arch/x86/kvm/mmu/mmu.c | 28 +++++++++++++++++++---------
1 file changed, 19 insertions(+), 9 deletions(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 50695eb2ee22..7f3f57237f23 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -5997,6 +5997,18 @@ static int kvm_mmu_write_protect_fault(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa,
vcpu->arch.last_retry_eip = 0;
vcpu->arch.last_retry_addr = 0;
+ /*
+ * It should be impossible to reach this point with an MMIO cache hit,
+ * as RET_PF_WRITE_PROTECTED is returned if and only if there's a valid,
+ * writable memslot, and creating a memslot should invalidate the MMIO
+ * cache by way of changing the memslot generation. WARN and disallow
+ * retry if MMIO is detected, as retrying MMIO emulation is pointless and
+ * could put the vCPU into an infinite loop because the processor will
+ * keep faulting on the non-existent MMIO address.
+ */
+ if (WARN_ON_ONCE(mmio_info_in_cache(vcpu, cr2_or_gpa, direct)))
+ return RET_PF_EMULATE;
+
/*
* Before emulating the instruction, check to see if the access may be
* due to L1 accessing nested NPT/EPT entries used for L2, i.e. if the
@@ -6029,17 +6041,15 @@ static int kvm_mmu_write_protect_fault(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa,
return RET_PF_FIXED;
/*
- * The gfn is write-protected, but if emulation fails we can still
- * optimistically try to just unprotect the page and let the processor
+ * The gfn is write-protected, but if KVM detects it's emulating an
+ * instruction that is unlikely to be used to modify page tables, or if
+ * emulation fails, KVM can try to unprotect the gfn and let the CPU
* re-execute the instruction that caused the page fault. Do not allow
- * retrying MMIO emulation, as it's not only pointless but could also
- * cause us to enter an infinite loop because the processor will keep
- * faulting on the non-existent MMIO address. Retrying an instruction
- * from a nested guest is also pointless and dangerous as we are only
- * explicitly shadowing L1's page tables, i.e. unprotecting something
- * for L1 isn't going to magically fix whatever issue caused L2 to fail.
+ * retrying an instruction from a nested guest as KVM is only explicitly
+ * shadowing L1's page tables, i.e. unprotecting something for L1 isn't
+ * going to magically fix whatever issue caused L2 to fail.
*/
- if (!mmio_info_in_cache(vcpu, cr2_or_gpa, direct) && !is_guest_mode(vcpu))
+ if (!is_guest_mode(vcpu))
*emulation_type |= EMULTYPE_ALLOW_RETRY_PF;
return RET_PF_EMULATE;

base-commit: 7d33880356496eb0640c6c825cc60898063c4902