Message-Id: <20220805073544.555261-1-leobras@redhat.com>
Date: Fri, 5 Aug 2022 04:35:45 -0300
From: Leonardo Bras <leobras@...hat.com>
To: Paolo Bonzini <pbonzini@...hat.com>, Peter Xu <peterx@...hat.com>
Cc: Leonardo Bras <leobras@...hat.com>, kvm@...r.kernel.org,
linux-kernel@...r.kernel.org
Subject: [RFC PATCH 1/1] kvm: Use a new spinlock to avoid atomic operations in kvm_get_dirty_log_protect
<The objective is to get feedback on the idea before working on a proper
version. Please provide feedback!>
kvm_get_dirty_log_protect() is very slow: when 'manual_dirty_log_protect'
is not in use, it takes a long time to retrieve the dirty bitmap.
The current flow is roughly:
- Memset the 'second_dirty_bitmap' with zeros
- Take KVM_MMU_LOCK and scan the whole bitmap for dirty bits
- If a long has any dirty bit set, xchg() it into second_dirty_bitmap
- Reprotect the related memory pages,
- Flush remote tlbs,
- Copy the dirty-bitmap to the user, and return.
The proposal is to remove the memset and the atomics by using the
second_dirty_bitmap as a real dirty-bitmap instead of a simple buffer
(a condensed sketch of the new flow follows the list):
- When kvm_get_dirty_log_protect() runs, before copying anything, switch
to the other dirty-bitmap (primary->secondary or secondary->primary).
- (After that, mark_page_dirty_in_slot() starts writing dirty bits to
the newly selected bitmap, so nothing else is using the old one and it
is fine to modify it without atomics),
- Then copy that (now retired) bitmap to userspace,
- Then take KVM_MMU_LOCK, scan the copied bitmap, and both reprotect the
pages and clear the dirty bits in the same loop.
- Flush remote tlbs, and return.
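For reference, a condensed view of the new kvm_get_dirty_log_protect()
flow (non-manual_dirty_log_protect path only; error handling and the TLB
flush are omitted, the full diff below is what counts):

    /* Switch bitmaps; new dirty bits now land in the (already clean) other one */
    spin_lock(&memslot->bm_switch_lock);
    if (memslot->use_secondary)
            dirty_bitmap = kvm_second_dirty_bitmap(memslot);
    else
            dirty_bitmap = memslot->dirty_bitmap;
    memslot->use_secondary = !memslot->use_secondary;
    spin_unlock(&memslot->bm_switch_lock);

    /* Nobody writes to this bitmap anymore: copy it out, then scan it
     * without atomics, reprotecting pages and clearing bits as we go. */
    if (copy_to_user(log->dirty_bitmap, dirty_bitmap, n))
            return -EFAULT;

    KVM_MMU_LOCK(kvm);
    for (i = 0; i < n / sizeof(long); i++) {
            if (!dirty_bitmap[i])
                    continue;
            mask = dirty_bitmap[i];
            dirty_bitmap[i] = 0;    /* leaves it clean for the next switch */
            kvm_arch_mmu_enable_log_dirty_pt_masked(kvm, memslot,
                                                    i * BITS_PER_LONG, mask);
    }
    KVM_MMU_UNLOCK(kvm);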
The bitmap switch (primary <-> secondary) is protected by a spinlock,
which is also taken in mark_page_dirty_in_slot() to make sure it does
not write to the wrong bitmap. Since this spinlock protects just a few
instructions, it should not introduce much delay.
Results:
1 - Average clock count spent in kvm_get_dirty_log_protect() goes from
13608798 to 6734244, i.e. about 50% fewer clock cycles.
2 - Average clock count spent in mark_page_dirty_in_slot() goes from
462 to 471, about a 2% increase; since this is only a few cycles, the
measurement may be imprecise.
Known limitations:
0 - It's preliminary. It has been tested, but there is a lot of rough
stuff that needs to be fixed if the idea turns out to be interesting.
1 - Spinlocking in mark_page_dirty_in_slot():
I understand this function is called very often while the guest runs and
should probably be as fast as possible, so introducing a lock here seems
counter-productive; to be fair, though, I could not measure more than a
couple of cycles of slowdown in my current setup (x86_64 machine).
2 - Qemu will use 'manual_dirty_log_protect'
I understand that more recent qemu versions use
'manual_dirty_log_protect' when available, so this approach will not
benefit that use case, which is quite common.
A counter-argument: other hypervisors could still benefit from it, and
it also applies to older qemu versions.
3 - On top of that, the overhead in (1) is unnecessary in case (2)
It could be removed by testing for 'manual_dirty_log_protect' in
mark_page_dirty_in_slot() and skipping the spinlock, but this was left
out to keep the RFC simpler (a rough sketch of that check is below).
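For illustration only, the skip inside mark_page_dirty_in_slot() could
look roughly like this (not part of this patch; it assumes a plain read
of kvm->manual_dirty_log_protect is acceptable here, since in that mode
the bitmaps are never switched and only the primary one is used):

    if (kvm->dirty_ring_size) {
            kvm_dirty_ring_push(&vcpu->dirty_ring, slot, rel_gfn);
    } else if (kvm->manual_dirty_log_protect) {
            /* No bitmap switch can race with us: no need for the lock */
            set_bit_le(rel_gfn, memslot->dirty_bitmap);
    } else {
            spin_lock(&memslot->bm_switch_lock);
            if (memslot->use_secondary)
                    set_bit_le(rel_gfn, kvm_second_dirty_bitmap(memslot));
            else
                    set_bit_le(rel_gfn, memslot->dirty_bitmap);
            spin_unlock(&memslot->bm_switch_lock);
    }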
Performance tests were done using:
- x86_64 VM with 32 vcpus and 16GB RAM
- Systemtap probes in function enter and function return, measuring the
number of clock cycles spent between those two, and printing the average.
- Synthetic VM load that keeps dirtying memory at a fixed rate of 8Gbps.
- VM Migration between hosts.
Further info:
- I am also trying to think of improvements for the
'manual_dirty_log_protect' use case, which seems to be very hard to
improve. For starters, using the same approach to remove the atomics
does not seem to bring any relevant speedup.
Signed-off-by: Leonardo Bras <leobras@...hat.com>
---
include/linux/kvm_host.h | 3 ++
virt/kvm/kvm_main.c | 75 ++++++++++++++++++++++++++--------------
2 files changed, 52 insertions(+), 26 deletions(-)
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 90a45ef7203bd..bee3809e85e93 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -571,6 +571,9 @@ struct kvm_memory_slot {
gfn_t base_gfn;
unsigned long npages;
unsigned long *dirty_bitmap;
+ /* Protects use_secondary, which tells which bitmap to use */
+ spinlock_t bm_switch_lock;
+ bool use_secondary;
struct kvm_arch_memory_slot arch;
unsigned long userspace_addr;
u32 flags;
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index a49df8988cd6a..5ea543d8cfec2 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -1560,6 +1560,9 @@ static int kvm_prepare_memory_region(struct kvm *kvm,
r = kvm_arch_prepare_memory_region(kvm, old, new, change);
+ if (new)
+ spin_lock_init(&new->bm_switch_lock);
+
/* Free the bitmap on failure if it was allocated above. */
if (r && new && new->dirty_bitmap && (!old || !old->dirty_bitmap))
kvm_destroy_dirty_bitmap(new);
@@ -2053,7 +2056,6 @@ static int kvm_get_dirty_log_protect(struct kvm *kvm, struct kvm_dirty_log *log)
int i, as_id, id;
unsigned long n;
unsigned long *dirty_bitmap;
- unsigned long *dirty_bitmap_buffer;
bool flush;
/* Dirty ring tracking is exclusive to dirty log tracking */
@@ -2070,12 +2072,11 @@ static int kvm_get_dirty_log_protect(struct kvm *kvm, struct kvm_dirty_log *log)
if (!memslot || !memslot->dirty_bitmap)
return -ENOENT;
- dirty_bitmap = memslot->dirty_bitmap;
-
kvm_arch_sync_dirty_log(kvm, memslot);
n = kvm_dirty_bitmap_bytes(memslot);
flush = false;
+
if (kvm->manual_dirty_log_protect) {
/*
* Unlike kvm_get_dirty_log, we always return false in *flush,
@@ -2085,35 +2086,49 @@ static int kvm_get_dirty_log_protect(struct kvm *kvm, struct kvm_dirty_log *log)
* transition to kvm_get_dirty_log_protect and kvm_get_dirty_log
* can be eliminated.
*/
- dirty_bitmap_buffer = dirty_bitmap;
- } else {
- dirty_bitmap_buffer = kvm_second_dirty_bitmap(memslot);
- memset(dirty_bitmap_buffer, 0, n);
+ if (copy_to_user(log->dirty_bitmap, memslot->dirty_bitmap, n))
+ return -EFAULT;
- KVM_MMU_LOCK(kvm);
- for (i = 0; i < n / sizeof(long); i++) {
- unsigned long mask;
- gfn_t offset;
+ return 0;
+ }
- if (!dirty_bitmap[i])
- continue;
+ /*
+ * Switches between primary and secondary dirty_bitmap.
+ * After unlocking, all new dirty bits are written to an empty bitmap.
+ */
+ spin_lock(&memslot->bm_switch_lock);
+ if (memslot->use_secondary)
+ dirty_bitmap = kvm_second_dirty_bitmap(memslot);
+ else
+ dirty_bitmap = memslot->dirty_bitmap;
- flush = true;
- mask = xchg(&dirty_bitmap[i], 0);
- dirty_bitmap_buffer[i] = mask;
+ memslot->use_secondary = !memslot->use_secondary;
+ spin_unlock(&memslot->bm_switch_lock);
- offset = i * BITS_PER_LONG;
- kvm_arch_mmu_enable_log_dirty_pt_masked(kvm, memslot,
- offset, mask);
- }
- KVM_MMU_UNLOCK(kvm);
+ if (copy_to_user(log->dirty_bitmap, dirty_bitmap, n))
+ return -EFAULT;
+
+ KVM_MMU_LOCK(kvm);
+ for (i = 0; i < n / sizeof(long); i++) {
+ unsigned long mask;
+ gfn_t offset;
+
+ if (!dirty_bitmap[i])
+ continue;
+
+ flush = true;
+ mask = dirty_bitmap[i];
+ dirty_bitmap[i] = 0;
+
+ offset = i * BITS_PER_LONG;
+ kvm_arch_mmu_enable_log_dirty_pt_masked(kvm, memslot,
+ offset, mask);
}
+ KVM_MMU_UNLOCK(kvm);
if (flush)
kvm_arch_flush_remote_tlbs_memslot(kvm, memslot);
- if (copy_to_user(log->dirty_bitmap, dirty_bitmap_buffer, n))
- return -EFAULT;
return 0;
}
@@ -3203,11 +3218,19 @@ void mark_page_dirty_in_slot(struct kvm *kvm,
unsigned long rel_gfn = gfn - memslot->base_gfn;
u32 slot = (memslot->as_id << 16) | memslot->id;
- if (kvm->dirty_ring_size)
+ if (kvm->dirty_ring_size) {
kvm_dirty_ring_push(&vcpu->dirty_ring,
slot, rel_gfn);
- else
- set_bit_le(rel_gfn, memslot->dirty_bitmap);
+ } else {
+ spin_lock(&memslot->bm_switch_lock);
+ if (memslot->use_secondary)
+ set_bit_le(rel_gfn,
+ kvm_second_dirty_bitmap(memslot));
+ else
+ set_bit_le(rel_gfn, memslot->dirty_bitmap);
+
+ spin_unlock(&memslot->bm_switch_lock);
+ }
}
}
EXPORT_SYMBOL_GPL(mark_page_dirty_in_slot);
--
2.37.1