linux-kernel - Re: [PATCH 2/2] KVM: x86: zap invalid roots in kvm_tdp_mmu_zap


Message-ID: <Ybd2cEqUnxiy/JBd@google.com>
Date:   Mon, 13 Dec 2021 16:36:00 +0000
From:   Sean Christopherson <seanjc@...gle.com>
To:     Paolo Bonzini <pbonzini@...hat.com>
Cc:     linux-kernel@...r.kernel.org, kvm@...r.kernel.org,
        ignat@...udflare.com, bgardon@...gle.com, dmatlack@...gle.com,
        stevensd@...omium.org, kernel-team@...udflare.com,
        stable@...r.kernel.org
Subject: Re: [PATCH 2/2] KVM: x86: zap invalid roots in kvm_tdp_mmu_zap_all

On Mon, Dec 13, 2021, Paolo Bonzini wrote:
> kvm_tdp_mmu_zap_all is intended to visit all roots and zap their page
> tables, which flushes the accessed and dirty bits out to the Linux
> "struct page"s.  Missing some of the roots has catastrophic effects,
> because kvm_tdp_mmu_zap_all is called when the MMU notifier is being
> removed and any PTEs left behind might become dangling by the time
> kvm_arch_destroy_vm tears down the roots for good.
> 
> Unfortunately that is exactly what kvm_tdp_mmu_zap_all is doing: it
> visits all roots via for_each_tdp_mmu_root_yield_safe, which in turn
> uses kvm_tdp_mmu_get_root to skip invalid roots.  If the current root is
> invalid at the time of kvm_tdp_mmu_zap_all, its page tables will remain
> in place but will later be zapped during kvm_arch_destroy_vm.

As stated in the bug report thread[*], it should be impossible for the MMU
notifier to be unregistered while kvm_mmu_zap_all_fast() is running.

I do believe there's a race between set_nx_huge_pages() and kvm_mmu_notifier_release(),
but that would result in the use-after-free kvm_set_pfn_dirty() tracing back to
set_nx_huge_pages(), not kvm_destroy_vm().  And for that, I would much prefer we
elevate mm->users while changing the NX hugepage setting.

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 8f0035517450..985df4db8192 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -6092,10 +6092,15 @@ static int set_nx_huge_pages(const char *val, const struct kernel_param *kp)
                mutex_lock(&kvm_lock);

                list_for_each_entry(kvm, &vm_list, vm_list) {
+                       if (!mmget_not_zero(kvm->mm))
+                               continue;
+
                        mutex_lock(&kvm->slots_lock);
                        kvm_mmu_zap_all_fast(kvm);
                        mutex_unlock(&kvm->slots_lock);

+                       mmput_async(kvm->mm);
+
                        wake_up_process(kvm->arch.nx_lpage_recovery_thread);
                }
                mutex_unlock(&kvm_lock);
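
For illustration only (not part of the patch): a minimal userspace sketch of
the "take a reference only if the count is still nonzero" pattern that
mmget_not_zero() provides.  struct fake_mm, fake_mmget_not_zero() and
fake_mmput() are hypothetical stand-ins built on C11 atomics, not kernel code;
the only assumption carried over is that a count of zero means teardown has
already begun and the object must be skipped.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct mm_struct; only the user count is modeled. */
struct fake_mm {
	atomic_int users;
};

/*
 * Take a reference only if the count is still nonzero.  Returns false when
 * the last user is already gone, in which case the caller must skip the
 * object (the "continue" in the loop above).
 */
static bool fake_mmget_not_zero(struct fake_mm *mm)
{
	int old = atomic_load(&mm->users);

	while (old != 0) {
		/* On failure, 'old' is refreshed with the current value. */
		if (atomic_compare_exchange_weak(&mm->users, &old, old + 1))
			return true;
	}
	return false;
}

/* Drop a reference.  Returns true if this call released the last one. */
static bool fake_mmput(struct fake_mm *mm)
{
	return atomic_fetch_sub(&mm->users, 1) == 1;
}
```

While the extra reference is held, the count cannot reach zero, so whatever
teardown is tied to the final put cannot start in the middle of the zap.  The
sketch does not model mmput_async(), which hands the final teardown to a
workqueue instead of running it in the caller's context.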

[*] https://lore.kernel.org/all/Ybdxd7QcJI71UpHm@google.com/