[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <Z2Th31I/0O/HY/Mb@yzhao56-desk.sh.intel.com>
Date: Fri, 20 Dec 2024 11:17:51 +0800
From: Yan Zhao <yan.y.zhao@...el.com>
To: Sean Christopherson <seanjc@...gle.com>
CC: Paolo Bonzini <pbonzini@...hat.com>, <kvm@...r.kernel.org>,
<linux-kernel@...r.kernel.org>, <leiyang@...hat.com>,
<rick.p.edgecombe@...el.com>
Subject: Re: [PATCH] KVM: x86/mmu: Treat TDP MMU faults as spurious if access
is already allowed
This also fixed an Accessed bit related bug in TDX, where
"KVM_BUG_ON(was_present, kvm)" can be hit in below scenario:
CPU 0 CPU 1
1. TD accepts GPA A
2. EPT violation in KVM
3. Prefault GPA A
4. Install an SPTE with
Accessed bit unset
5. Install an SPTE with
Accessed bit set
6. KVM_BUG_ON() in set_external_spte_present()
Reviewed-by: Yan Zhao <yan.y.zhao@...el.com>
On Wed, Dec 18, 2024 at 01:36:11PM -0800, Sean Christopherson wrote:
> Treat slow-path TDP MMU faults as spurious if the access is allowed given
> the existing SPTE to fix a benign warning (other than the WARN itself)
> due to replacing a writable SPTE with a read-only SPTE, and to avoid the
> unnecessary LOCK CMPXCHG and subsequent TLB flush.
>
> If a read fault races with a write fault, fast GUP fails for any reason
> when trying to "promote" the read fault to a writable mapping, and KVM
> resolves the write fault first, then KVM will end up trying to install a
> read-only SPTE (for a !map_writable fault) overtop a writable SPTE.
>
> Note, it's not entirely clear why fast GUP fails, or if that's even how
> KVM ends up with a !map_writable fault with a writable SPTE. If something
> else is going awry, e.g. due to a bug in mmu_notifiers, then treating read
> faults as spurious in this scenario could effectively mask the underlying
> problem.
>
> However, retrying the faulting access instead of overwriting an existing
> SPTE is functionally correct and desirable irrespective of the WARN, and
> fast GUP _can_ legitimately fail with a writable VMA, e.g. if the Accessed
> bit in primary MMU's PTE is toggled and causes a PTE value mismatch. The
> WARN was also recently added, specifically to track down scenarios where
> KVM is unnecessarily overwrites SPTEs, i.e. treating the fault as spurious
> doesn't regress KVM's bug-finding capabilities in any way. In short,
> letting the WARN linger because there's a tiny chance it's due to a bug
> elsewhere would be excessively paranoid.
>
> Fixes: 1a175082b190 ("KVM: x86/mmu: WARN and flush if resolving a TDP MMU fault clears MMU-writable")
> Reported-by: leiyang@...hat.com
> Closes: https://bugzilla.kernel.org/show_bug.cgi?id=219588
> Signed-off-by: Sean Christopherson <seanjc@...gle.com>
> ---
> arch/x86/kvm/mmu/mmu.c | 12 ------------
> arch/x86/kvm/mmu/spte.h | 17 +++++++++++++++++
> arch/x86/kvm/mmu/tdp_mmu.c | 5 +++++
> 3 files changed, 22 insertions(+), 12 deletions(-)
>
> diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> index 22e7ad235123..2401606db260 100644
> --- a/arch/x86/kvm/mmu/mmu.c
> +++ b/arch/x86/kvm/mmu/mmu.c
> @@ -3364,18 +3364,6 @@ static bool fast_pf_fix_direct_spte(struct kvm_vcpu *vcpu,
> return true;
> }
>
> -static bool is_access_allowed(struct kvm_page_fault *fault, u64 spte)
> -{
> - if (fault->exec)
> - return is_executable_pte(spte);
> -
> - if (fault->write)
> - return is_writable_pte(spte);
> -
> - /* Fault was on Read access */
> - return spte & PT_PRESENT_MASK;
> -}
> -
> /*
> * Returns the last level spte pointer of the shadow page walk for the given
> * gpa, and sets *spte to the spte value. This spte may be non-preset. If no
> diff --git a/arch/x86/kvm/mmu/spte.h b/arch/x86/kvm/mmu/spte.h
> index f332b33bc817..af10bc0380a3 100644
> --- a/arch/x86/kvm/mmu/spte.h
> +++ b/arch/x86/kvm/mmu/spte.h
> @@ -461,6 +461,23 @@ static inline bool is_mmu_writable_spte(u64 spte)
> return spte & shadow_mmu_writable_mask;
> }
>
> +/*
> + * Returns true if the access indicated by @fault is allowed by the existing
> + * SPTE protections. Note, the caller is responsible for checking that the
> + * SPTE is a shadow-present, leaf SPTE (either before or after).
> + */
> +static inline bool is_access_allowed(struct kvm_page_fault *fault, u64 spte)
> +{
> + if (fault->exec)
> + return is_executable_pte(spte);
> +
> + if (fault->write)
> + return is_writable_pte(spte);
> +
> + /* Fault was on Read access */
> + return spte & PT_PRESENT_MASK;
> +}
> +
> /*
> * If the MMU-writable flag is cleared, i.e. the SPTE is write-protected for
> * write-tracking, remote TLBs must be flushed, even if the SPTE was read-only,
> diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
> index 4508d868f1cd..2f15e0e33903 100644
> --- a/arch/x86/kvm/mmu/tdp_mmu.c
> +++ b/arch/x86/kvm/mmu/tdp_mmu.c
> @@ -985,6 +985,11 @@ static int tdp_mmu_map_handle_target_level(struct kvm_vcpu *vcpu,
> if (fault->prefetch && is_shadow_present_pte(iter->old_spte))
> return RET_PF_SPURIOUS;
>
> + if (is_shadow_present_pte(iter->old_spte) &&
> + is_access_allowed(fault, iter->old_spte) &&
> + is_last_spte(iter->old_spte, iter->level))
One nit:
Do we need to warn on pfn_changed?
> + return RET_PF_SPURIOUS;
> +
> if (unlikely(!fault->slot))
> new_spte = make_mmio_spte(vcpu, iter->gfn, ACC_ALL);
> else
>
> base-commit: 3522c419758ee8dca5a0e8753ee0070a22157bc1
> --
> 2.47.1.613.gc27f4b7a9f-goog
>
>
Powered by blists - more mailing lists