Message-ID: <20240805075911.3cxfzewmqlkmvgfw@yy-desk-7060>
Date: Mon, 5 Aug 2024 15:59:11 +0800
From: Yuan Yao <yuan.yao@...ux.intel.com>
To: Sean Christopherson <seanjc@...gle.com>
Cc: Paolo Bonzini <pbonzini@...hat.com>, kvm@...r.kernel.org,
linux-kernel@...r.kernel.org
Subject: Re: [RFC PATCH 9/9] KVM: x86/mmu: Track SPTE accessed info across
mmu_notifier PROT changes
On Thu, Aug 01, 2024 at 11:34:53AM -0700, Sean Christopherson wrote:
> Preserve Accessed information when zapping SPTEs in response to an
> mmu_notifier protection change, e.g. if KVM is zapping SPTEs because
> NUMA balancing kicked in. KVM is not required to fully unmap the SPTE,
> and the core VMA information isn't changing, i.e. the information is
> still fresh and useful.
>
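Just to check that I'm reading the mechanics correctly: with this change a
PROT-change zap no longer writes a plain SHADOW_NONPRESENT_VALUE, it carries
the old A bit along, roughly (my own sketch of the before/after values, not
taken from the patch):

	/* old_spte: present leaf SPTE with shadow_accessed_mask set */
	new_spte = SHADOW_NONPRESENT_VALUE |
		   (old_spte & shadow_accessed_mask);
	/* result: non-present, but still reported as accessed to the aging path */
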
> Signed-off-by: Sean Christopherson <seanjc@...gle.com>
> ---
> arch/x86/kvm/mmu/tdp_mmu.c | 31 +++++++++++++++++++++++++------
> 1 file changed, 25 insertions(+), 6 deletions(-)
>
> diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
> index ac3200ce00f9..780f35a22c05 100644
> --- a/arch/x86/kvm/mmu/tdp_mmu.c
> +++ b/arch/x86/kvm/mmu/tdp_mmu.c
> @@ -838,7 +838,8 @@ bool kvm_tdp_mmu_zap_sp(struct kvm *kvm, struct kvm_mmu_page *sp)
> * operation can cause a soft lockup.
> */
> static bool tdp_mmu_zap_leafs(struct kvm *kvm, struct kvm_mmu_page *root,
> - gfn_t start, gfn_t end, bool can_yield, bool flush)
> + gfn_t start, gfn_t end, bool can_yield,
> + bool keep_accessed_bit, bool flush)
> {
> struct tdp_iter iter;
>
> @@ -849,17 +850,29 @@ static bool tdp_mmu_zap_leafs(struct kvm *kvm, struct kvm_mmu_page *root,
> rcu_read_lock();
>
> for_each_tdp_pte_min_level(iter, root, PG_LEVEL_4K, start, end) {
> + u64 new_spte = SHADOW_NONPRESENT_VALUE;
> +
> if (can_yield &&
> tdp_mmu_iter_cond_resched(kvm, &iter, flush, false)) {
> flush = false;
> continue;
> }
>
> + /*
> + * Note, this will fail to clear non-present, accessed SPTEs,
> + * but that isn't a functional problem, it can only result in
> + * a _potential_ false positive in the unlikely scenario that
> + * the primary MMU zaps an hva, reinstalls a new hva, and ages
> + * the new hva, all before KVM accesses the hva.
> + */
> if (!is_shadow_present_pte(iter.old_spte) ||
> !is_last_spte(iter.old_spte, iter.level))
> continue;
>
> - tdp_mmu_iter_set_spte(kvm, &iter, SHADOW_NONPRESENT_VALUE);
> + if (keep_accessed_bit)
> + new_spte |= iter.old_spte & shadow_accessed_mask;
> +
> + tdp_mmu_iter_set_spte(kvm, &iter, new_spte);
>
> /*
> * Zappings SPTEs in invalid roots doesn't require a TLB flush,
> @@ -889,7 +902,7 @@ bool kvm_tdp_mmu_zap_leafs(struct kvm *kvm, gfn_t start, gfn_t end, bool flush)
>
> lockdep_assert_held_write(&kvm->mmu_lock);
> for_each_valid_tdp_mmu_root_yield_safe(kvm, root, -1)
> - flush = tdp_mmu_zap_leafs(kvm, root, start, end, true, flush);
> + flush = tdp_mmu_zap_leafs(kvm, root, start, end, true, false, flush);
>
> return flush;
> }
> @@ -1180,11 +1193,13 @@ int kvm_tdp_mmu_map(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
> bool kvm_tdp_mmu_unmap_gfn_range(struct kvm *kvm, struct kvm_gfn_range *range,
> bool flush)
> {
> + bool keep_a_bit = range->arg.event == MMU_NOTIFY_PROTECTION_VMA ||
> + range->arg.event == MMU_NOTIFY_PROTECTION_PAGE;
> struct kvm_mmu_page *root;
>
> __for_each_tdp_mmu_root_yield_safe(kvm, root, range->slot->as_id, false)
> flush = tdp_mmu_zap_leafs(kvm, root, range->start, range->end,
> - range->may_block, flush);
> + range->may_block, keep_a_bit, flush);
>
> return flush;
> }
> @@ -1201,7 +1216,11 @@ static void kvm_tdp_mmu_age_spte(struct tdp_iter *iter)
> {
> u64 new_spte;
>
> - if (spte_ad_enabled(iter->old_spte)) {
> + if (spte_ad_enabled(iter->old_spte) ||
> + !is_shadow_present_pte(iter->old_spte)) {
> + KVM_MMU_WARN_ON(!is_shadow_present_pte(iter->old_spte) &&
> + iter->old_spte != (SHADOW_NONPRESENT_VALUE | shadow_accessed_mask));
Is it possible that some SPTEs are zapped by kvm_tdp_mmu_zap_leafs() with
keep_accessed_bit = false, e.g. from kvm_post_set_cr0(), and are then handled
by __kvm_tdp_mmu_age_gfn_range() for aging before the guest accesses them
again?

In that scenario the SPTE is non-present without the A bit set.
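The rough sequence I have in mind (sketched from my reading of the call
chain, I may be missing a filter somewhere):

	/*
	 * 1) kvm_post_set_cr0() -> kvm_zap_gfn_range() ->
	 *    kvm_tdp_mmu_zap_leafs(..., keep_accessed_bit = false)
	 *    leaves the SPTE as plain SHADOW_NONPRESENT_VALUE, A bit dropped.
	 *
	 * 2) Before the guest re-faults the page, aging runs:
	 *    __kvm_tdp_mmu_age_gfn_range() -> kvm_tdp_mmu_age_spte(),
	 *    where the SPTE is non-present but not equal to
	 *    (SHADOW_NONPRESENT_VALUE | shadow_accessed_mask).
	 */

If that sequence can happen, wouldn't the new KVM_MMU_WARN_ON() trip, or does
the !is_accessed_spte() check in the caller already filter this case out?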
> +
> iter->old_spte = tdp_mmu_clear_spte_bits(iter->sptep,
> iter->old_spte,
> shadow_accessed_mask,
> @@ -1235,7 +1254,7 @@ static bool __kvm_tdp_mmu_age_gfn_range(struct kvm *kvm,
> for_each_valid_tdp_mmu_root(kvm, root, range->slot->as_id) {
> rcu_read_lock();
>
> - tdp_root_for_each_leaf_pte(iter, root, range->start, range->end) {
> + tdp_root_for_each_pte(iter, root, range->start, range->end) {
This also clears the A bit of non-leaf entries when aging. As I remember,
KVM didn't care about those before; could you please explain the reason for
this change?
> if (!is_accessed_spte(iter.old_spte))
> continue;
>
> --
> 2.46.0.rc1.232.g9752f9e123-goog
>
>