[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <e3769513-d2d8-e3fb-7887-3c8872b0f00c@maciej.szmigiero.name>
Date: Fri, 21 May 2021 09:03:23 +0200
From: "Maciej S. Szmigiero" <mail@...iej.szmigiero.name>
To: Sean Christopherson <seanjc@...gle.com>
Cc: Paolo Bonzini <pbonzini@...hat.com>,
Vitaly Kuznetsov <vkuznets@...hat.com>,
Wanpeng Li <wanpengli@...cent.com>,
Jim Mattson <jmattson@...gle.com>,
Igor Mammedov <imammedo@...hat.com>,
Marc Zyngier <maz@...nel.org>,
James Morse <james.morse@....com>,
Julien Thierry <julien.thierry.kdev@...il.com>,
Suzuki K Poulose <suzuki.poulose@....com>,
Huacai Chen <chenhuacai@...nel.org>,
Aleksandar Markovic <aleksandar.qemu.devel@...il.com>,
Paul Mackerras <paulus@...abs.org>,
Christian Borntraeger <borntraeger@...ibm.com>,
Janosch Frank <frankja@...ux.ibm.com>,
David Hildenbrand <david@...hat.com>,
Cornelia Huck <cohuck@...hat.com>,
Claudio Imbrenda <imbrenda@...ux.ibm.com>,
Joerg Roedel <joro@...tes.org>, kvm@...r.kernel.org,
linux-kernel@...r.kernel.org
Subject: Re: [PATCH v3 1/8] KVM: x86: Cache total page count to avoid
traversing the memslot array
On 19.05.2021 23:00, Sean Christopherson wrote:
> On Sun, May 16, 2021, Maciej S. Szmigiero wrote:
>> From: "Maciej S. Szmigiero" <maciej.szmigiero@...cle.com>
>>
>> There is no point in recalculating from scratch the total number of pages
>> in all memslots each time a memslot is created or deleted.
>>
>> Just cache the value and update it accordingly on each such operation so
>> the code doesn't need to traverse the whole memslot array each time.
>>
>> Signed-off-by: Maciej S. Szmigiero <maciej.szmigiero@...cle.com>
>> ---
>> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
>> index 5bd550eaf683..8c7738b75393 100644
>> --- a/arch/x86/kvm/x86.c
>> +++ b/arch/x86/kvm/x86.c
>> @@ -11112,9 +11112,21 @@ void kvm_arch_commit_memory_region(struct kvm *kvm,
>> const struct kvm_memory_slot *new,
>> enum kvm_mr_change change)
>> {
>> - if (!kvm->arch.n_requested_mmu_pages)
>> - kvm_mmu_change_mmu_pages(kvm,
>> - kvm_mmu_calculate_default_mmu_pages(kvm));
>> + if (change == KVM_MR_CREATE)
>> + kvm->arch.n_memslots_pages += new->npages;
>> + else if (change == KVM_MR_DELETE) {
>> + WARN_ON(kvm->arch.n_memslots_pages < old->npages);
>
> Heh, so I think this WARN can be triggered at will by userspace on 32-bit KVM by
> causing the running count to wrap. KVM artificially caps the size of a single
> memslot at ((1UL << 31) - 1), but userspace could create multiple gigantic slots
> to overflow arch.n_memslots_pages.
>
> I _think_ changing it to a u64 would fix the problem since KVM forbids overlapping
> memslots in the GPA space.
You are right, n_memslots_pages needs to be u64 so it does not overflow
on 32-bit KVM.
The memslot count is limited to 32k in each of 2 address spaces, so in
the worst case the variable should hold 15-bits + 1 bit + 31-bits = 47 bit number.
> Also, what about moving the check-and-WARN to prepare_memory_region() so that
> KVM can error out if the check fails? Doesn't really matter, but an explicit
> error for userspace is preferable to underflowing the number of pages and getting
> weird MMU errors/behavior down the line.
In principle this seems like a possibility, however, it is a more
regression-risky option, in case something has (perhaps unintentionally)
relied on the fact that kvm_mmu_zap_oldest_mmu_pages() call from
kvm_mmu_change_mmu_pages() was being done only in the memslot commit
function.
>> + kvm->arch.n_memslots_pages -= old->npages;
>> + }
>> +
>> + if (!kvm->arch.n_requested_mmu_pages) {
>
> If we're going to bother caching the number of pages then we should also skip
> the update when the number pages isn't changing, e.g.
>
> if (change == KVM_MR_CREATE || change == KVM_MR_DELETE) {
> if (change == KVM_MR_CREATE)
> kvm->arch.n_memslots_pages += new->npages;
> else
> kvm->arch.n_memslots_pages -= old->npages;
>
> if (!kvm->arch.n_requested_mmu_pages) {
> unsigned long nr_mmu_pages;
>
> nr_mmu_pages = kvm->arch.n_memslots_pages *
> KVM_PERMILLE_MMU_PAGES / 1000;
> nr_mmu_pages = max(nr_mmu_pages, KVM_MIN_ALLOC_MMU_PAGES);
> kvm_mmu_change_mmu_pages(kvm, nr_mmu_pages);
> }
> }
The old code did it that way (unconditionally) and, as in the case above,
I didn't want to risk an regression.
If we are going to change this fact then I think it should happen in a
separate patch.
Thanks,
Maciej
Powered by blists - more mailing lists