Message-ID: <338e88e6-bbed-f3bb-ead7-f15399f44285@redhat.com>
Date: Mon, 12 Jun 2023 09:11:16 +0200
From: David Hildenbrand <david@...hat.com>
To: Gavin Shan <gshan@...hat.com>, kvmarm@...ts.linux.dev
Cc: kvm@...r.kernel.org, linux-kernel@...r.kernel.org,
pbonzini@...hat.com, maz@...nel.org, seanjc@...gle.com,
oliver.upton@...ux.dev, aarcange@...hat.com, peterx@...hat.com,
hshuai@...hat.com, zhenyzha@...hat.com, shan.gavin@...il.com
Subject: Re: [PATCH v2] KVM: Avoid illegal stage2 mapping on invalid memory slot
On 09.06.23 12:04, Gavin Shan wrote:
> We run into a guest hang in the edk2 firmware when KSM is running on
> the host. The edk2 firmware is waiting for status 0x80 from QEMU's pflash
> device (TYPE_PFLASH_CFI01) during a sector erase or buffered write
> operation. The status is returned by reading the memory region of
> the pflash device, and the read request should have been forwarded to QEMU
> and emulated by it. Unfortunately, the read request is covered by an
> illegal stage2 mapping when the guest hang occurs. The read request
> is completed with QEMU bypassed and the wrong status is fetched. The edk2
> firmware runs into an infinite loop with the wrong status.
>
> The illegal stage2 mapping is populated due to same page sharing by KSM
> at (C), even though the associated memory slot has been marked as invalid
> at (B) when the memory slot is requested to be deleted. It's notable that
> the active and inactive memory slots can't be swapped when we're in the
> middle of kvm_mmu_notifier_change_pte() because
> kvm->mn_active_invalidate_count is elevated, and kvm_swap_active_memslots()
> will busy loop until it reaches zero again. Besides, the swapping from the
> active to the inactive memory slots is also avoided by holding &kvm->srcu
> in __kvm_handle_hva_range(), corresponding to synchronize_srcu_expedited()
> in kvm_swap_active_memslots().
>
>   CPU-A                    CPU-B
>   -----                    -----
>   ioctl(kvm_fd, KVM_SET_USER_MEMORY_REGION)
>   kvm_vm_ioctl_set_memory_region
>   kvm_set_memory_region
>   __kvm_set_memory_region
>   kvm_set_memslot(kvm, old, NULL, KVM_MR_DELETE)
>     kvm_invalidate_memslot
>       kvm_copy_memslot
>       kvm_replace_memslot
>       kvm_swap_active_memslots        (A)
>       kvm_arch_flush_shadow_memslot   (B)
>                            same page sharing by KSM
>                            kvm_mmu_notifier_invalidate_range_start
>                                  :
>                            kvm_mmu_notifier_change_pte
>                              kvm_handle_hva_range
>                              __kvm_handle_hva_range       (C)
>                                  :
>                            kvm_mmu_notifier_invalidate_range_end
>
> Fix the issue by skipping the invalid memory slot at (C) to avoid the
> illegal stage2 mapping, so that the read request for the pflash status
> is forwarded to QEMU and emulated by it. In this way, the correct pflash
> status can be returned from QEMU to break the infinite loop in the edk2
> firmware.
>
> Cc: stable@...r.kernel.org # v5.13+
> Fixes: 3039bcc74498 ("KVM: Move x86's MMU notifier memslot walkers to generic code")
> Reported-by: Shuai Hu <hshuai@...hat.com>
> Reported-by: Zhenyu Zhang <zhenyzha@...hat.com>
> Signed-off-by: Gavin Shan <gshan@...hat.com>
> ---
> v2: Improved changelog suggested by Marc
> ---
> virt/kvm/kvm_main.c | 3 +++
> 1 file changed, 3 insertions(+)
>
> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> index 479802a892d4..7f81a3a209b6 100644
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -598,6 +598,9 @@ static __always_inline int __kvm_handle_hva_range(struct kvm *kvm,
>  		unsigned long hva_start, hva_end;
>
>  		slot = container_of(node, struct kvm_memory_slot, hva_node[slots->node_idx]);
> +		if (slot->flags & KVM_MEMSLOT_INVALID)
> +			continue;
> +
>  		hva_start = max(range->start, slot->userspace_addr);
>  		hva_end = min(range->end, slot->userspace_addr +
>  				  (slot->npages << PAGE_SHIFT));
Nice debugging!
LGTM
Reviewed-by: David Hildenbrand <david@...hat.com>
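
For readers who want to see the effect of the check in isolation, here is
a minimal user-space sketch of the same idea. It is only an illustration
under stated assumptions: the demo_* names, the flag value and the flat
array of slots are hypothetical stand-ins (the kernel walks an interval
tree and invokes an arch handler), and the function merely counts the
slots a handler would have been called for.

```c
#include <assert.h>

/* Hypothetical stand-ins for illustration; not the kernel's definitions. */
#define DEMO_PAGE_SHIFT      12
#define DEMO_MEMSLOT_INVALID (1UL << 16)

struct demo_memslot {
	unsigned long userspace_addr;	/* start of the slot's HVA range */
	unsigned long npages;		/* slot size in pages */
	unsigned long flags;		/* DEMO_MEMSLOT_INVALID while being deleted */
};

static unsigned long demo_max(unsigned long a, unsigned long b)
{
	return a > b ? a : b;
}

static unsigned long demo_min(unsigned long a, unsigned long b)
{
	return a < b ? a : b;
}

/*
 * Walk all slots and count those a handler would run on for the HVA
 * range [start, end). Slots flagged invalid are skipped before any
 * range clamping, mirroring the check added by the patch.
 */
static int demo_walk_hva_range(const struct demo_memslot *slots, int nslots,
			       unsigned long start, unsigned long end)
{
	int handled = 0;
	int i;

	assert(start <= end);

	for (i = 0; i < nslots; i++) {
		const struct demo_memslot *slot = &slots[i];
		unsigned long hva_start, hva_end;

		/* A slot that is being deleted must not get new mappings. */
		if (slot->flags & DEMO_MEMSLOT_INVALID)
			continue;

		hva_start = demo_max(start, slot->userspace_addr);
		hva_end = demo_min(end, slot->userspace_addr +
				   (slot->npages << DEMO_PAGE_SHIFT));

		/* No overlap between the requested range and this slot. */
		if (hva_start >= hva_end)
			continue;

		handled++;	/* the real code would call the handler here */
	}

	return handled;
}
```

With three 4-page slots where the middle one carries
DEMO_MEMSLOT_INVALID, a range covering all of them is handled for the
two valid slots only, and a range that touches just the invalid slot is
not handled at all.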
--
Cheers,
David / dhildenb