Message-ID: <8b3f68dd-c61c-16a0-2077-0a5d3d57a357@redhat.com>
Date: Fri, 6 Nov 2020 12:28:17 +0100
From: Paolo Bonzini <pbonzini@...hat.com>
To: Peter Xu <peterx@...hat.com>, kvm@...r.kernel.org,
linux-kernel@...r.kernel.org
Cc: "Dr . David Alan Gilbert" <dgilbert@...hat.com>,
Andrew Jones <drjones@...hat.com>,
Sean Christopherson <sean.j.christopherson@...el.com>
Subject: Re: [PATCH v13 00/14] KVM: Dirty ring interface
On 01/10/20 03:20, Peter Xu wrote:
> KVM branch:
> https://github.com/xzpeter/linux/tree/kvm-dirty-ring
>
> QEMU branch for testing:
> https://github.com/xzpeter/qemu/tree/kvm-dirty-ring
>
> v13:
> - rebased to kvm/queue rather than 5.9-rc7. I think kvm/queue is broken: I
>   can only test the dirty ring after reverting 3eb900173c71 ("KVM: x86: VMX:
>   Prevent MSR passthrough when MSR access is denied", 2020-09-28); otherwise
>   the guest hangs during Linux boot, with vcpu0 looping forever.
> - added another trivial patch, "KVM: Documentation: Update entry for
>   KVM_X86_SET_MSR_FILTER", which should be squashed directly into 1a155254ff93
>   ("KVM: x86: Introduce MSR filtering", 2020-09-28).
>
> v12:
> - added r-bs from Sean
> - rebased
>
> v11:
> - rebased to kvm/queue (seems to be the newest)
> - removed kvm_dirty_ring_waitqueue() tracepoint since not used
> - set memslot->as_id in kvm_delete_memslot() [Sean]
> - let __copy_to_user() always return -EFAULT [Sean]
> - rename 'r' in alloc_apic_access_page into 'hva' [Sean]
>
> v10:
> - remove unused identity_map_pfn in init_rmode_identity_map [syzbot]
> - add "static" to kvm_dirty_ring_full [syzbot]
> - kvm_page_in_dirty_ring() use "#if" macros for KVM_DIRTY_LOG_PAGE_OFFSET to
>   quiesce syzbot [syzbot]
> - s/false/null/ in gfn_to_memslot_dirty_bitmap() [syzbot]
>
> v9:
> - patch 3: __x86_set_memory_region: squashed in another trivial change to
>   always return (0xdeadull << 48) for slot removal [Sean]
> - picked up r-bs from Drew
>
> For previous versions, please refer to:
>
> V1: https://lore.kernel.org/kvm/20191129213505.18472-1-peterx@redhat.com
> V2: https://lore.kernel.org/kvm/20191221014938.58831-1-peterx@redhat.com
> V3: https://lore.kernel.org/kvm/20200109145729.32898-1-peterx@redhat.com
> V4: https://lore.kernel.org/kvm/20200205025105.367213-1-peterx@redhat.com
> V5: https://lore.kernel.org/kvm/20200304174947.69595-1-peterx@redhat.com
> V6: https://lore.kernel.org/kvm/20200309214424.330363-1-peterx@redhat.com
> V7: https://lore.kernel.org/kvm/20200318163720.93929-1-peterx@redhat.com
> V8: https://lore.kernel.org/kvm/20200331190000.659614-1-peterx@redhat.com
> V9: https://lore.kernel.org/kvm/20200523225659.1027044-1-peterx@redhat.com
> V10: https://lore.kernel.org/kvm/20200601115957.1581250-1-peterx@redhat.com/
>
> Overview
> ============
>
> This is a continuation of the work from Lei Cao <lei.cao@...atus.com> and
> Paolo Bonzini on the KVM dirty ring interface.
>
> The new dirty ring interface is another way to collect dirty pages for
> virtual machines. It differs from the existing dirty logging
> interface in a few major ways:
>
> - Data format: The dirty data is kept in a ring rather than a
>   bitmap, so the amount of data to sync for dirty logging no longer
>   depends on the size of guest memory, but on the rate of
>   dirtying. Also, the dirty ring is per-vcpu, while the dirty
>   bitmap is per-vm.
>
> - Data copy: Syncing dirty pages no longer requires a data copy;
>   instead the ring is shared between userspace and the kernel via
>   page sharing (mmap() on the vcpu fd).
>
> - Interface: Instead of the old KVM_GET_DIRTY_LOG and
>   KVM_CLEAR_DIRTY_LOG ioctls, the new ring uses the new
>   KVM_RESET_DIRTY_RINGS ioctl to re-protect the collected dirty
>   pages (it works like KVM_CLEAR_DIRTY_LOG, but ring based). To
>   collect dirty bits, userspace only needs to read the ring data;
>   no ioctl is needed (see the sketch after this list).
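>
> For illustration only, a rough sketch of how userspace could enable the
> ring and map one vcpu's ring. This is not part of the patches:
> KVM_CAP_DIRTY_LOG_RING, the enable-cap argument, and the page-size
> constant below are assumptions that may differ from the final uapi; the
> selftest in this series and the QEMU branch above are the authoritative
> references.
>
> #include <stdint.h>
> #include <sys/ioctl.h>
> #include <sys/mman.h>
> #include <linux/kvm.h>
>
> /* Enable the dirty ring on the VM fd (assumed: before vcpus are created). */
> static int enable_dirty_ring(int vm_fd, uint32_t ring_bytes)
> {
>         struct kvm_enable_cap cap = {
>                 .cap = KVM_CAP_DIRTY_LOG_RING,  /* assumed capability name */
>                 .args = { ring_bytes },         /* ring size in bytes */
>         };
>
>         return ioctl(vm_fd, KVM_ENABLE_CAP, &cap);
> }
>
> /* Map one vcpu's ring; it lives at a fixed offset of the vcpu mmap area. */
> static struct kvm_dirty_gfn *map_dirty_ring(int vcpu_fd, uint32_t ring_bytes)
> {
>         void *ring = mmap(NULL, ring_bytes, PROT_READ | PROT_WRITE,
>                           MAP_SHARED, vcpu_fd,
>                           KVM_DIRTY_LOG_PAGE_OFFSET * 4096UL);
>
>         return ring == MAP_FAILED ? NULL : ring;
> }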
>
> Ring Layout
> ===========
>
> The KVM dirty ring is per-vcpu. Each ring is an array of struct
> kvm_dirty_gfn, defined as:
>
> struct kvm_dirty_gfn {
>         __u32 flags;
>         __u32 slot; /* as_id | slot_id */
>         __u64 offset;
> };
>
> Each GFN is itself a state machine. The state is embedded in the flags
> field, as defined in the uapi header:
>
> /*
>  * KVM dirty GFN flags, defined as:
>  *
>  * |---------------+---------------+--------------|
>  * | bit 1 (reset) | bit 0 (dirty) | Status       |
>  * |---------------+---------------+--------------|
>  * |       0       |       0       | Invalid GFN  |
>  * |       0       |       1       | Dirty GFN    |
>  * |       1       |       X       | GFN to reset |
>  * |---------------+---------------+--------------|
>  *
>  * Lifecycle of a dirty GFN goes like:
>  *
>  *      dirtied         collected        reset
>  * 00 -----------> 01 -------------> 1X -------+
>  *  ^                                          |
>  *  |                                          |
>  *  +------------------------------------------+
>  *
>  * The userspace program is only responsible for the 01->1X state
>  * conversion (to collect dirty bits). Also, it must not skip any
>  * dirty bits so that dirty bits are always collected in sequence.
>  */
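>
> To make the lifecycle above concrete, a minimal collection loop might
> look like the sketch below. Again this is not from the patches: the
> KVM_DIRTY_GFN_F_* macro names, the software fetch index, and the
> handle_dirty_page() helper are assumptions for illustration; the
> dirty_log_test selftest in this series shows the real handling.
>
> #include <stdint.h>
> #include <sys/ioctl.h>
> #include <linux/kvm.h>   /* struct kvm_dirty_gfn, KVM_RESET_DIRTY_RINGS */
>
> #ifndef KVM_DIRTY_GFN_F_DIRTY            /* assumed names for bits 0 and 1 */
> #define KVM_DIRTY_GFN_F_DIRTY   (1 << 0)
> #define KVM_DIRTY_GFN_F_RESET   (1 << 1)
> #endif
>
> void handle_dirty_page(uint32_t slot, uint64_t offset);  /* hypothetical */
>
> /* Walk the ring from a software fetch index until the first not-dirty entry. */
> static void collect_dirty_gfns(struct kvm_dirty_gfn *ring, uint32_t nents,
>                                uint32_t *fetch_index, int vm_fd)
> {
>         for (;;) {
>                 struct kvm_dirty_gfn *e = &ring[*fetch_index % nents];
>
>                 /* 00: nothing (more) to collect. */
>                 if (!(__atomic_load_n(&e->flags, __ATOMIC_ACQUIRE) &
>                       KVM_DIRTY_GFN_F_DIRTY))
>                         break;
>
>                 handle_dirty_page(e->slot, e->offset);
>
>                 /* 01 -> 1X: hand the entry back to KVM for reset. */
>                 __atomic_store_n(&e->flags, KVM_DIRTY_GFN_F_RESET,
>                                  __ATOMIC_RELEASE);
>                 (*fetch_index)++;
>         }
>
>         /* Write-protect the collected pages again (like KVM_CLEAR_DIRTY_LOG). */
>         ioctl(vm_fd, KVM_RESET_DIRTY_RINGS, 0);
> }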
>
> Testing
> =======
>
> This series provides both the implementation of the KVM dirty ring and
> a test case. I've also implemented the QEMU counterpart that can run
> with the new KVM; the link can be found at the top of this cover
> letter. However, that is still a very early version which is prone to
> change and future optimization.
>
> I did some measurements with the new method on a 24G guest running a
> dirty workload. I don't see any speedup so far; under heavy dirty
> load it is even slower (e.g., at an 800MB/s random dirty rate, the kvm
> dirty ring takes ~73s on average to complete migration while dirty
> logging only needs ~55s on average). However, that is understandable:
> a 24G guest means only a 1M dirty bitmap, which is still a good fit
> for dirty logging, while a heavier workload is the worst case for the
> dirty ring.
>
> More tests are welcome on bigger hosts/guests, especially with
> COLO-like workloads.
>
> Please review, thanks.
>
> Peter Xu (14):
> KVM: Documentation: Update entry for KVM_X86_SET_MSR_FILTER
> KVM: Cache as_id in kvm_memory_slot
> KVM: X86: Don't track dirty for KVM_SET_[TSS_ADDR|IDENTITY_MAP_ADDR]
> KVM: Pass in kvm pointer into mark_page_dirty_in_slot()
> KVM: X86: Implement ring-based dirty memory tracking
> KVM: Make dirty ring exclusive to dirty bitmap log
> KVM: Don't allocate dirty bitmap if dirty ring is enabled
> KVM: selftests: Always clear dirty bitmap after iteration
> KVM: selftests: Sync uapi/linux/kvm.h to tools/
> KVM: selftests: Use a single binary for dirty/clear log test
> KVM: selftests: Introduce after_vcpu_run hook for dirty log test
> KVM: selftests: Add dirty ring buffer test
> KVM: selftests: Let dirty_log_test async for dirty ring test
> KVM: selftests: Add "-c" parameter to dirty log test
>
> Documentation/virt/kvm/api.rst | 126 ++++-
> arch/x86/include/asm/kvm_host.h | 6 +-
> arch/x86/include/uapi/asm/kvm.h | 1 +
> arch/x86/kvm/Makefile | 3 +-
> arch/x86/kvm/mmu/mmu.c | 10 +-
> arch/x86/kvm/svm/avic.c | 9 +-
> arch/x86/kvm/vmx/vmx.c | 96 ++--
> arch/x86/kvm/x86.c | 46 +-
> include/linux/kvm_dirty_ring.h | 103 ++++
> include/linux/kvm_host.h | 19 +
> include/trace/events/kvm.h | 63 +++
> include/uapi/linux/kvm.h | 53 ++
> tools/include/uapi/linux/kvm.h | 77 ++-
> tools/testing/selftests/kvm/Makefile | 2 -
> .../selftests/kvm/clear_dirty_log_test.c | 6 -
> tools/testing/selftests/kvm/dirty_log_test.c | 505 ++++++++++++++++--
> .../testing/selftests/kvm/include/kvm_util.h | 4 +
> tools/testing/selftests/kvm/lib/kvm_util.c | 72 ++-
> .../selftests/kvm/lib/kvm_util_internal.h | 4 +
> virt/kvm/dirty_ring.c | 197 +++++++
> virt/kvm/kvm_main.c | 168 +++++-
> 21 files changed, 1432 insertions(+), 138 deletions(-)
> create mode 100644 include/linux/kvm_dirty_ring.h
> delete mode 100644 tools/testing/selftests/kvm/clear_dirty_log_test.c
> create mode 100644 virt/kvm/dirty_ring.c
>
Queued, thanks! Now on to QEMU...
Paolo