linux-kernel - Re: [PATCH v2 4/7] KVM: x86/mmu: Zap only obsolete roots if a root shadow page is zapped

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <ee757515-4a0f-c5cb-cd57-04983f62f499@redhat.com>
Date:   Wed, 2 Mar 2022 19:04:44 +0100
From:   Paolo Bonzini <pbonzini@...hat.com>
To:     Sean Christopherson <seanjc@...gle.com>,
        Christian Borntraeger <borntraeger@...ux.ibm.com>,
        Janosch Frank <frankja@...ux.ibm.com>
Cc:     David Hildenbrand <david@...hat.com>,
        Claudio Imbrenda <imbrenda@...ux.ibm.com>,
        Vitaly Kuznetsov <vkuznets@...hat.com>,
        Wanpeng Li <wanpengli@...cent.com>,
        Jim Mattson <jmattson@...gle.com>,
        Joerg Roedel <joro@...tes.org>, kvm@...r.kernel.org,
        linux-kernel@...r.kernel.org, Ben Gardon <bgardon@...gle.com>,
        Lai Jiangshan <jiangshanlai@...il.com>
Subject: Re: [PATCH v2 4/7] KVM: x86/mmu: Zap only obsolete roots if a root
 shadow page is zapped

On 3/1/22 18:55, Paolo Bonzini wrote:
> On 2/25/22 19:22, Sean Christopherson wrote:
>> @@ -5656,7 +5707,7 @@ static void kvm_mmu_zap_all_fast(struct kvm *kvm)
>>        * Note: we need to do this under the protection of mmu_lock,
>>        * otherwise, vcpu would purge shadow page but miss tlb flush.
>>        */
>> -    kvm_make_all_cpus_request(kvm, KVM_REQ_MMU_RELOAD);
>> +    kvm_make_all_cpus_request(kvm, KVM_REQ_MMU_FREE_OBSOLETE_ROOTS);
> 
> I was going to squash in this:
> 
>        * invalidating TDP MMU roots must be done while holding mmu_lock for
> -     * write and in the same critical section as making the reload 
> request,
> +     * write and in the same critical section as making the free request,
>        * e.g. before kvm_zap_obsolete_pages() could drop mmu_lock and 
> yield.
> 
> But then I realized that this needs better comments and that my 
> knowledge of
> this has serious holes.  Regarding this comment, this is my proposal:
> 
>          /*
>           * Invalidated TDP MMU roots are zapped within MMU read_lock to be
>           * able to walk the list of roots, but with the expectation of no
>           * concurrent change to the pages themselves.  There cannot be
>           * any yield between kvm_tdp_mmu_invalidate_all_roots and the free
>           * request, otherwise somebody could grab a reference to the root
>       * and break that assumption.
>           */
>          if (is_tdp_mmu_enabled(kvm))
>                  kvm_tdp_mmu_invalidate_all_roots(kvm);
> 
> However, for the second comment (the one in the context above), there's 
> much
> more.  From easier to harder:
> 
> 1) I'm basically clueless about the TLB flush "note" above.
> 
> 2) It's not clear to me what needs to use for_each_tdp_mmu_root; for
> example, why would anything but the MMU notifiers use 
> for_each_tdp_mmu_root?
> It is used in kvm_tdp_mmu_write_protect_gfn, 
> kvm_tdp_mmu_try_split_huge_pages
> and kvm_tdp_mmu_clear_dirty_pt_masked.
> 
> 3) Does it make sense that yielding users of for_each_tdp_mmu_root must
> either look at valid roots only, or take MMU lock for write?  If so, can
> this be enforced in tdp_mmu_next_root?

Ok, I could understand this a little better now, but please correct me
if this is incorrect:

2) if I'm not wrong, kvm_tdp_mmu_try_split_huge_pages indeed does not
need to walk invalid  roots.  The others do because the TDP MMU does
not necessarily kick vCPUs after marking roots as invalid.  But
because TDP MMU roots are gone for good once their refcount hits 0,
I wonder if we could do something like

diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
index 7e3d1f985811..a4a6dfee27f9 100644
--- a/arch/x86/kvm/mmu/tdp_mmu.c
+++ b/arch/x86/kvm/mmu/tdp_mmu.c
@@ -164,6 +164,7 @@ void kvm_tdp_mmu_put_root(struct kvm *kvm, struct kvm_mmu_page *root,
  	 */
  	if (!kvm_tdp_root_mark_invalid(root)) {
  		refcount_set(&root->tdp_mmu_root_count, 1);
+		kvm_make_all_cpus_request(kvm, KVM_REQ_MMU_FREE_OBSOLETE_ROOTS);
  
  		/*
  		 * If the struct kvm is alive, we might as well zap the root
@@ -1099,12 +1100,16 @@ void kvm_tdp_mmu_zap_invalidated_roots(struct kvm *kvm)
  void kvm_tdp_mmu_invalidate_all_roots(struct kvm *kvm)
  {
  	struct kvm_mmu_page *root;
+	bool invalidated_root = false
  
  	lockdep_assert_held_write(&kvm->mmu_lock);
  	list_for_each_entry(root, &kvm->arch.tdp_mmu_roots, link) {
  		if (!WARN_ON_ONCE(!kvm_tdp_mmu_get_root(root)))
-			root->role.invalid = true;
+			invalidated_root |= !kvm_tdp_root_mark_invalid(root);
  	}
+
+	if (invalidated_root)
+		kvm_make_all_cpus_request(kvm, KVM_REQ_MMU_FREE_OBSOLETE_ROOTS);
  }
  
  /*

(based on my own version of Sean's patches) and stop walking invalid roots
in kvm_tdp_mmu_write_protect_gfn and kvm_tdp_mmu_clear_dirty_pt_masked.


3) Yes, it makes sense that yielding users of for_each_tdp_mmu_root must
either look at valid roots only, or take MMU lock for write.  The only
exception is kvm_tdp_mmu_try_split_huge_pages, which does not need to
walk invalid roots.  And kvm_tdp_mmu_zap_invalidated_pages(), but that
one is basically an asynchronous worker [and this is where I had the
inspiration to get rid of the function altogether]

Paolo