Message-ID: <87r16qnkgl.fsf@redhat.com>
Date:   Fri, 25 Mar 2022 16:03:06 +0100
From:   Vitaly Kuznetsov <vkuznets@...hat.com>
To:     Sean Christopherson <seanjc@...gle.com>
Cc:     linux-kernel@...r.kernel.org, kvm@...r.kernel.org,
        Paolo Bonzini <pbonzini@...hat.com>
Subject: Re: [FYI PATCH] Revert "KVM: x86/mmu: Zap only TDP MMU leafs in
 kvm_zap_gfn_range()"

Sean Christopherson <seanjc@...gle.com> writes:

...

So I went back to "KVM: x86/mmu: Zap only TDP MMU leafs in
kvm_zap_gfn_range()" and confirmed that with the patch in place Hyper-V
always crashes, sooner or later. With the patch reverted (as well as
with current 'kvm/queue') it boots.

>
> Actually, since this is apparently specific to kvm_zap_gfn_range(), can you add
> printk "tracing" in update_mtrr(), kvm_post_set_cr0(), and __kvm_request_apicv_update()
> to see what is actually triggering zaps?  Capturing the start and end GFNs would be very
> helpful for the MTRR case.
>
> The APICv update seems unlikely to affect only Hyper-V guests, though there is the auto
> EOI crud.  And the other two only come into play with non-coherent DMA.  In other words,
> figuring out exactly what sequence leads to failure should be straightforward.
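
To capture the start/end GFNs regardless of which caller triggers the
zap, one catch-all option would be a trace_printk() in
kvm_zap_gfn_range() itself. A minimal sketch (body elided, assuming
the function's gfn_t start/end parameters):

void kvm_zap_gfn_range(struct kvm *kvm, gfn_t gfn_start, gfn_t gfn_end)
{
	/* gfn_t is u64, hence the %llx format */
	trace_printk("zap gfn range %llx-%llx\n", gfn_start, gfn_end);

	/* ... existing zap logic unchanged ... */
}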

The tricky part here is that Hyper-V doesn't crash immediately; the
crash is different every time (judging by the BSOD) and happens at a
different point. The crashes mention various problems, e.g. an attempt
to execute non-executable memory, ...

I've added the tracing you suggested:
- __kvm_request_apicv_update() happens only once, at the very beginning.

- update_mtrr() never actually reaches kvm_zap_gfn_range().

- kvm_post_set_cr0() happens in early boot, but the crash happens much,
  much later. E.g.:
...
 qemu-system-x86-117525  [019] .....  4738.682954: kvm_post_set_cr0: vCPU 12 10 11
 qemu-system-x86-117525  [019] .....  4738.682997: kvm_post_set_cr0: vCPU 12 11 80000011
 qemu-system-x86-117525  [019] .....  4738.683053: kvm_post_set_cr0: vCPU 12 80000011 c0000011
 qemu-system-x86-117525  [019] .....  4738.683059: kvm_post_set_cr0: vCPU 12 c0000011 80010031
 qemu-system-x86-117526  [005] .....  4738.812107: kvm_post_set_cr0: vCPU 13 10 11
 qemu-system-x86-117526  [005] .....  4738.812148: kvm_post_set_cr0: vCPU 13 11 80000011
 qemu-system-x86-117526  [005] .....  4738.812198: kvm_post_set_cr0: vCPU 13 80000011 c0000011
 qemu-system-x86-117526  [005] .....  4738.812205: kvm_post_set_cr0: vCPU 13 c0000011 80010031
 qemu-system-x86-117527  [003] .....  4738.941004: kvm_post_set_cr0: vCPU 14 10 11
 qemu-system-x86-117527  [003] .....  4738.941107: kvm_post_set_cr0: vCPU 14 11 80000011
 qemu-system-x86-117527  [003] .....  4738.941218: kvm_post_set_cr0: vCPU 14 80000011 c0000011
 qemu-system-x86-117527  [003] .....  4738.941235: kvm_post_set_cr0: vCPU 14 c0000011 80010031
 qemu-system-x86-117528  [035] .....  4739.070338: kvm_post_set_cr0: vCPU 15 10 11
 qemu-system-x86-117528  [035] .....  4739.070428: kvm_post_set_cr0: vCPU 15 11 80000011
 qemu-system-x86-117528  [035] .....  4739.070539: kvm_post_set_cr0: vCPU 15 80000011 c0000011
 qemu-system-x86-117528  [035] .....  4739.070557: kvm_post_set_cr0: vCPU 15 c0000011 80010031
##### CPU 8 buffer started ####
 qemu-system-x86-117528  [008] .....  4760.099532: kvm_hv_set_msr_pw: 15
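
For reference, decoding the CR0 values in the trace above with the
X86_CR0_* bits (my annotation, not part of the trace):

  0x10       = ET
  0x11       = ET|PE
  0x80000011 = PG|ET|PE
  0xc0000011 = CD|PG|ET|PE
  0x80010031 = PG|WP|NE|ET|PE

So the last two transitions of each vCPU toggle CR0.CD, which is
exactly the case where kvm_post_set_cr0() can end up calling
kvm_zap_gfn_range() (and, per the above, only with non-coherent DMA).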

The debug patch for kvm_post_set_cr0() is:

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 4fa4d8269e5b..db7c5a05e574 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -870,6 +870,8 @@ EXPORT_SYMBOL_GPL(load_pdptrs);
 
 void kvm_post_set_cr0(struct kvm_vcpu *vcpu, unsigned long old_cr0, unsigned long cr0)
 {
+       trace_printk("vCPU %d %lx %lx\n", vcpu->vcpu_id, old_cr0, cr0);
+
        if ((cr0 ^ old_cr0) & X86_CR0_PG) {
                kvm_clear_async_pf_completion_queue(vcpu);
                kvm_async_pf_hash_reset(vcpu);
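
For reference, trace_printk() writes to the ftrace ring buffer, so
output like the dump above can be read from
/sys/kernel/debug/tracing/trace.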

The kvm_hv_set_msr_pw() call happens when Hyper-V writes to
HV_X64_MSR_CRASH_CTL (the 'hv-crash' QEMU flag is needed to enable the
feature). The debug patch is:

diff --git a/arch/x86/kvm/hyperv.c b/arch/x86/kvm/hyperv.c
index a32f54ab84a2..59a72f6ced99 100644
--- a/arch/x86/kvm/hyperv.c
+++ b/arch/x86/kvm/hyperv.c
@@ -1391,6 +1391,7 @@ static int kvm_hv_set_msr_pw(struct kvm_vcpu *vcpu, u32 msr, u64 data,
 
                        /* Send notification about crash to user space */
                        kvm_make_request(KVM_REQ_HV_CRASH, vcpu);
+                       trace_printk("%d\n", vcpu->vcpu_id);
                }
                break;
        case HV_X64_MSR_RESET:
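
For reference, KVM_REQ_HV_CRASH makes the next vcpu_enter_guest() exit
to userspace with KVM_EXIT_SYSTEM_EVENT / KVM_SYSTEM_EVENT_CRASH, which
is how QEMU learns that the guest has crashed.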

So there are 20 seconds (!) between the last kvm_post_set_cr0() call
and the crash. My (disappointing) conclusion is that the problem can be
anywhere and Hyper-V only detects it much, much later.

-- 
Vitaly
