Message-Id: <20210204221959.232582-1-bgardon@google.com>
Date: Thu, 4 Feb 2021 14:19:59 -0800
From: Ben Gardon <bgardon@...gle.com>
To: linux-kernel@...r.kernel.org, kvm@...r.kernel.org
Cc: Paolo Bonzini <pbonzini@...hat.com>, Peter Xu <peterx@...hat.com>,
Sean Christopherson <seanjc@...gle.com>,
Peter Shier <pshier@...gle.com>,
Junaid Shahid <junaids@...gle.com>,
Jim Mattson <jmattson@...gle.com>,
Makarand Sonare <makarandsonare@...gle.com>,
Kai Huang <kai.huang@...el.com>,
Ben Gardon <bgardon@...gle.com>
Subject: [PATCH] KVM: VMX: Optimize flushing the PML buffer
vmx_flush_pml_buffer repeatedly calls kvm_vcpu_mark_page_dirty, which
SRCU-dereferences kvm->memslots on every call. To avoid the repeated
dereferences and give the compiler more freedom to optimize the
function, SRCU-dereference kvm->memslots only once and reuse the
resulting pointer for every PML entry.
Reviewed-by: Makarand Sonare <makarandsonare@...gle.com>
Signed-off-by: Ben Gardon <bgardon@...gle.com>
---
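For context (not part of the patch itself), here is a rough sketch of the
call chain the loop currently goes through for every PML entry. The helper
body below is paraphrased from virt/kvm/kvm_main.c rather than quoted
verbatim and may differ slightly between kernel versions:

	/*
	 * Paraphrased sketch, not the literal kernel source: each call
	 * re-derives the memslots pointer (srcu_dereference() via
	 * kvm_vcpu_gfn_to_memslot()) before marking the gfn dirty.
	 */
	void kvm_vcpu_mark_page_dirty(struct kvm_vcpu *vcpu, gfn_t gfn)
	{
		struct kvm_memory_slot *memslot;

		memslot = kvm_vcpu_gfn_to_memslot(vcpu, gfn);	/* SRCU deref */
		mark_page_dirty_in_slot(vcpu->kvm, memslot, gfn);
	}

Hoisting kvm_vcpu_memslots() out of the PML walk lets the loop body call
__gfn_to_memslot() and mark_page_dirty_in_slot() against a single
SRCU-dereferenced pointer.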
Tested by running the dirty_log_perf_test selftest on a dual socket Intel
Skylake machine:
./dirty_log_perf_test -v 4 -b 30G -i 5
The test was run 5 times with and without this patch and the dirty
memory time for iterations 2-5 was averaged across the 5 runs.
Iteration 1 was discarded for this analysis because it is still dominated
by the time spent populating memory.
The average time for each run demonstrated a strange bimodal distribution,
with clusters around 2 seconds and 2.5 seconds. This may have been a
result of vCPU migration between NUMA nodes.
In any case, the get dirty times with this patch averaged 2.07
seconds, about a 7% savings from the 2.22 second average without this
patch ((2.22 - 2.07) / 2.22 ~= 7%).
While these savings may be partly a result of the patched runs having
one more run in the 2 second cluster, the patched runs in the higher
cluster were also 7-8% shorter than those in the unpatched case.
Below is the raw data for anyone interested in visualizing the results
with a graph:
Iteration    Baseline       Patched
2            2.038562907    2.045226614
3            2.037363248    2.045033709
4            2.037176331    1.999783966
5            1.999891981    2.007849104
2            2.569526298    2.001252504
3            2.579110209    2.008541897
4            2.585883731    2.005317983
5            2.588692727    2.007100987
2            2.01191437     2.006953735
3            2.012972236    2.04540153
4            1.968836017    2.005035246
5            1.967915154    2.003859551
2            2.037533296    1.991275846
3            2.501480125    2.391886691
4            2.454382587    2.391904789
5            2.461046772    2.398767963
2            2.036991484    2.011331436
3            2.002954418    2.002635687
4            2.053342717    2.006769959
5            2.522539759    2.006470059
Average      2.223405818    2.069119963
arch/x86/kvm/vmx/vmx.c | 8 +++++++-
1 file changed, 7 insertions(+), 1 deletion(-)
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index cc60b1fc3ee7..46c54802dfdb 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -5692,6 +5692,7 @@ static void vmx_destroy_pml_buffer(struct vcpu_vmx *vmx)
 static void vmx_flush_pml_buffer(struct kvm_vcpu *vcpu)
 {
 	struct vcpu_vmx *vmx = to_vmx(vcpu);
+	struct kvm_memslots *memslots;
 	u64 *pml_buf;
 	u16 pml_idx;
 
@@ -5707,13 +5708,18 @@ static void vmx_flush_pml_buffer(struct kvm_vcpu *vcpu)
 	else
 		pml_idx++;
 
+	memslots = kvm_vcpu_memslots(vcpu);
+
 	pml_buf = page_address(vmx->pml_pg);
 	for (; pml_idx < PML_ENTITY_NUM; pml_idx++) {
+		struct kvm_memory_slot *memslot;
 		u64 gpa;
 
 		gpa = pml_buf[pml_idx];
 		WARN_ON(gpa & (PAGE_SIZE - 1));
-		kvm_vcpu_mark_page_dirty(vcpu, gpa >> PAGE_SHIFT);
+
+		memslot = __gfn_to_memslot(memslots, gpa >> PAGE_SHIFT);
+		mark_page_dirty_in_slot(vcpu->kvm, memslot, gpa >> PAGE_SHIFT);
 	}
 
 	/* reset PML index */
--
2.30.0.365.g02bc693789-goog