Message-ID: <5887de10-c615-175b-e491-86f94e542425@maciej.szmigiero.name>
Date: Sat, 22 May 2021 13:11:30 +0200
From: "Maciej S. Szmigiero" <mail@...iej.szmigiero.name>
To: Sean Christopherson <seanjc@...gle.com>
Cc: Paolo Bonzini <pbonzini@...hat.com>,
Vitaly Kuznetsov <vkuznets@...hat.com>,
Wanpeng Li <wanpengli@...cent.com>,
Jim Mattson <jmattson@...gle.com>,
Igor Mammedov <imammedo@...hat.com>,
Marc Zyngier <maz@...nel.org>,
James Morse <james.morse@....com>,
Julien Thierry <julien.thierry.kdev@...il.com>,
Suzuki K Poulose <suzuki.poulose@....com>,
Huacai Chen <chenhuacai@...nel.org>,
Aleksandar Markovic <aleksandar.qemu.devel@...il.com>,
Paul Mackerras <paulus@...abs.org>,
Christian Borntraeger <borntraeger@...ibm.com>,
Janosch Frank <frankja@...ux.ibm.com>,
David Hildenbrand <david@...hat.com>,
Cornelia Huck <cohuck@...hat.com>,
Claudio Imbrenda <imbrenda@...ux.ibm.com>,
Joerg Roedel <joro@...tes.org>, kvm@...r.kernel.org,
linux-kernel@...r.kernel.org
Subject: Re: [PATCH v3 3/8] KVM: Resolve memslot ID via a hash table instead
of via a static array
On 21.05.2021 09:05, Maciej S. Szmigiero wrote:
> On 20.05.2021 00:31, Sean Christopherson wrote:
>> On Sun, May 16, 2021, Maciej S. Szmigiero wrote:
(..)
>>> new_size = old_size;
>>> slots = kvzalloc(new_size, GFP_KERNEL_ACCOUNT);
>>> - if (likely(slots))
>>> - memcpy(slots, old, old_size);
>>> + if (unlikely(!slots))
>>> + return NULL;
>>> +
>>> + memcpy(slots, old, old_size);
>>> +
>>> + hash_init(slots->id_hash);
>>> + kvm_for_each_memslot(memslot, slots)
>>> + hash_add(slots->id_hash, &memslot->id_node, memslot->id);
>>
>> What's the perf penalty if the number of memslots gets large? I ask because the
>> lazy rmap allocation is adding multiple calls to kvm_dup_memslots().
>
> I would expect the "move inactive" benchmark to be the closest to
> measuring the performance of just a memslot array copy operation, but
> the results suggest that the performance stays within a ~10% window
> from 10 to 509 memslots on the old code (it then climbs 13x for the
> 32k case).
>
> That suggests that something else is dominating this benchmark for these
> memslot counts (probably zapping of shadow pages).
>
> At the same time, the tree-based memslots implementation is clearly
> faster in this benchmark, even for smaller memslot counts, so apparently
> copying of the memslot array has some performance impact, too.
>
> Measuring just kvm_dup_memslots() performance would probably best be
> done by benchmarking the KVM_MR_FLAGS_ONLY operation - I will try to
> add this operation to my set of benchmarks and see how it performs
> with different memslot counts.
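
(As a quick refresher on what patch 3/8 changes: memslot ID resolution
becomes a hash table walk. The lookup side that pairs with the
hash_add() rebuild in the quoted hunk looks roughly like the sketch
below - the id_hash / id_node / id names come from the quoted code,
the rest is an illustrative assumption, not necessarily the exact code
from the series.)

	#include <linux/hashtable.h>

	/*
	 * Sketch only: assumes struct kvm_memory_slot carries a
	 * "struct hlist_node id_node" member, as in the quoted hunk.
	 */
	static inline struct kvm_memory_slot *
	id_to_memslot(struct kvm_memslots *slots, int id)
	{
		struct kvm_memory_slot *slot;

		/* Walk only the bucket that "id" hashes into. */
		hash_for_each_possible(slots->id_hash, slot, id_node, id) {
			if (slot->id == id)
				return slot;
		}

		return NULL;
	}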
Update:
I've implemented a simple KVM_MR_FLAGS_ONLY benchmark that repeatedly
sets and unsets the KVM_MEM_LOG_DIRTY_PAGES flag on a memslot backed by
a single page of memory. [1]
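
At its core the benchmark just toggles the flag via the
KVM_SET_USER_MEMORY_REGION ioctl in a tight loop, roughly like the
simplified raw-ioctl sketch below (not the actual selftest code from
[1]; vm_fd, slot, gpa and backing are placeholders):

	#include <stdint.h>
	#include <sys/ioctl.h>
	#include <linux/kvm.h>

	/*
	 * Toggle KVM_MEM_LOG_DIRTY_PAGES on an already-registered
	 * single-page memslot "niters" times; each ioctl() is one
	 * "set flags" operation.
	 */
	static void toggle_dirty_log(int vm_fd, uint32_t slot,
				     uint64_t gpa, void *backing,
				     int niters)
	{
		struct kvm_userspace_memory_region region = {
			.slot = slot,
			.guest_phys_addr = gpa,
			.memory_size = 4096,
			.userspace_addr = (uintptr_t)backing,
		};
		int i;

		for (i = 0; i < niters; i++) {
			region.flags = KVM_MEM_LOG_DIRTY_PAGES;
			ioctl(vm_fd, KVM_SET_USER_MEMORY_REGION, &region);

			region.flags = 0;
			ioctl(vm_fd, KVM_SET_USER_MEMORY_REGION, &region);
		}
	}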
Since, with the current code and higher memslot counts, the "set flags"
operation spends a significant amount of time in
kvm_mmu_calculate_default_mmu_pages(), a second set of measurements was
done with patch [2] applied.
With patch [2] applied, the top functions in the perf trace are
memcpy() and clear_page(), called from kvm_set_memslot() (most likely
from the inlined kvm_dup_memslots()).
For reference, a set of measurements with the whole patch series
(patches 1 - 8) applied was also done, labeled "new code" below.
There, SRCU-related functions dominate the perf trace.
32k memslots:
Current code: 0.00130s
Current code + patch [2]: 0.00104s (13x the 4k result)
New code: 0.0000144s
4k memslots:
Current code: 0.0000899s
Current code + patch [2]: 0.0000799s (+78% vs. the 2k result)
New code: 0.0000144s
2k memslots:
Current code: 0.0000495s
Current code + patch [2]: 0.0000447s (+54% vs. the 509 result)
New code: 0.0000143s
509 memslots:
Current code: 0.0000305s
Current code + patch [2]: 0.0000290s (+5% vs. the 100 result)
New code: 0.0000141s
100 memslots:
Current code: 0.0000280s
Current code + patch [2]: 0.0000275s (same as for 10 slots)
New code: 0.0000142s
10 memslots:
Current code: 0.0000272s
Current code + patch [2]: 0.0000272s
New code: 0.0000141s
Thanks,
Maciej
[1]: The patch against memslot_perf_test.c is available here:
https://github.com/maciejsszmigiero/linux/commit/841e94898a55ff79af9d20a08205aa80808bd2a8
[2]: "[PATCH v3 1/8] KVM: x86: Cache total page count to avoid traversing the memslot array"