Message-ID: <fdb049d9-fd06-78a3-d38b-849fc1c23e27@maciej.szmigiero.name>
Date:   Wed, 3 Feb 2021 21:10:38 +0100
From:   "Maciej S. Szmigiero" <mail@...iej.szmigiero.name>
To:     Vitaly Kuznetsov <vkuznets@...hat.com>
Cc:     Paolo Bonzini <pbonzini@...hat.com>,
        Wanpeng Li <wanpengli@...cent.com>,
        Jim Mattson <jmattson@...gle.com>,
        Igor Mammedov <imammedo@...hat.com>,
        Marc Zyngier <maz@...nel.org>,
        James Morse <james.morse@....com>,
        Julien Thierry <julien.thierry.kdev@...il.com>,
        Suzuki K Poulose <suzuki.poulose@....com>,
        Huacai Chen <chenhuacai@...nel.org>,
        Aleksandar Markovic <aleksandar.qemu.devel@...il.com>,
        Paul Mackerras <paulus@...abs.org>,
        Christian Borntraeger <borntraeger@...ibm.com>,
        Janosch Frank <frankja@...ux.ibm.com>,
        David Hildenbrand <david@...hat.com>,
        Cornelia Huck <cohuck@...hat.com>,
        Claudio Imbrenda <imbrenda@...ux.ibm.com>,
        Joerg Roedel <joro@...tes.org>, kvm@...r.kernel.org,
        linux-kernel@...r.kernel.org,
        Sean Christopherson <seanjc@...gle.com>
Subject: Re: [PATCH 2/2] KVM: Scalable memslots implementation

On 03.02.2021 15:21, Vitaly Kuznetsov wrote:
> "Maciej S. Szmigiero" <mail@...iej.szmigiero.name> writes:
> 
>> On 03.02.2021 00:43, Sean Christopherson wrote:
>>> On Tue, Feb 02, 2021, Maciej S. Szmigiero wrote:
>>>> On 02.02.2021 02:33, Sean Christopherson wrote:
> 
> ...
> 
>>>>
>>>> I guess you mean to still turn id_to_index into a hash table, since
>>>> otherwise a VMM which uses just two memslots but numbered 0 and 508
>>>> will have a 509-entry id_to_index array allocated.
>>>
>>> That should be irrelevant for the purposes of optimizing hva lookups, and mostly
>>> irrelevant for optimizing memslot updates.  Using a hash table is almost purely
>>> a memory optimization; it really only matters when the max number of memslots
>>> skyrockets, which is a separate discussion from optimizing hva lookups.
>>
>> While I agree this is a separate thing from scalable hva lookups, it still
>> matters for the overall design.
>>
>> The current id_to_index array is fundamentally "pay the cost of the max
>> number of memslots possible, regardless of how many you use".
>>
>> And it's not only that it takes more memory; it also forces memslot
>> create / delete / move operations to be O(n), since the indices have to
>> be updated.
> 
> FWIW, I don't see a fundamental disagreement between you and Sean here,
> it's just that we may want to eat this elephant one bite at a time
> instead of trying to swallow it unchewed :-)
> 
> E.g. as a first step, we may want to introduce helper functions to not
> work with id_to_index directly and then suggest a better implementation
> (using an rbtree, a dynamically allocated array, ...) for these helpers. This
> is definitely more work but it's likely worth it.

That sounds like a good idea, will have a look at it - thanks.
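
Just to sketch the direction - a rough, self-contained userspace illustration;
all names and the struct layout here are made up for the example, not taken
from the actual KVM code:

#include <stddef.h>

#define NR_SLOT_IDS 509

struct memslot {
        int id;
        unsigned long base_gfn;
        unsigned long npages;
};

struct memslots {
        struct memslot slots[NR_SLOT_IDS];
        int id_to_index[NR_SLOT_IDS];   /* -1 when the ID is unused */
        int used_slots;
};

/*
 * Callers go through an accessor like this instead of reading
 * id_to_index directly, so the lookup could later be reimplemented
 * on top of a hash table or an rbtree without touching the callers.
 */
static struct memslot *slot_by_id(struct memslots *ms, int id)
{
        int index;

        if (id < 0 || id >= NR_SLOT_IDS)
                return NULL;

        index = ms->id_to_index[id];
        return index < 0 ? NULL : &ms->slots[index];
}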

>>
>> By the way, I think nobody here argues for a bazillion memslots.
>> It is enough to simply remove the current cap and allow the maximum
>> number permitted by the existing KVM API, that is 32k, as Vitaly's
>> patches recently did.
> 
> Yea, there's no immediate need even for 32k as KVM_MAX_VCPUS is '288';
> we can get away with e.g. 1000, but id_to_index is the only thing which
> may make us consider something lower than 32k: if only a few slots are
> used, there's no penalty (of course slot *modify* operations are O(n),
> so for 32k it'll take a lot, but these configurations are currently
> illegal and even 'slow' is better :-)
> 
Yes, id_to_index has this problem of depending on the max number of
memslots allowed (in the current KVM code) or on the ID of the highest
memslot currently present (a possible alternative, dynamically resized
implementation), rather than just on the total number of memslots
currently in use.
So it's a "pay all the cost upfront"-type implementation.
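
As a back-of-the-envelope illustration of that cost difference (a toy
standalone program, not KVM code; the per-node overhead figure is just a
placeholder assumption):

#include <stdio.h>

#define NR_SLOT_IDS 509         /* highest usable slot ID + 1, as in the example above */

int main(void)
{
        int slots_in_use = 2;   /* e.g. a VMM using only slot IDs 0 and 508 */

        /* Fixed index array: cost depends only on the maximum, not on usage. */
        printf("fixed id_to_index array: %zu bytes\n",
               NR_SLOT_IDS * sizeof(int));

        /* ID-keyed structure (hash table / rbtree): cost scales with use.    */
        /* 3 * sizeof(void *) is just a rough stand-in for per-node overhead. */
        printf("ID-keyed map:            %zu bytes\n",
               slots_in_use * (sizeof(int) + 3 * sizeof(void *)));

        return 0;
}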

Each slot modify operation is O(n) for the current code, due to copying
the memslots array and possibly updating the id_to_index indices
(and minor things like kvm_mmu_calculate_default_mmu_pages() for x86),
but it is O(log(n)) for the new implementation - that's one of its
main benefits.
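
To put rough numbers on that - a toy illustration of the asymptotics only,
not of either implementation:

#include <math.h>
#include <stdio.h>

int main(void)
{
        /* Approximate work for a single slot modify operation:             */
        /* copying the whole array touches all n entries, while a balanced  */
        /* tree keyed by the slot touches only ~log2(n) nodes on one path.  */
        for (int n = 512; n <= 32768; n *= 8)
                printf("n = %5d: array copy ~%5d entries, tree update ~%2d nodes\n",
                       n, n, (int)ceil(log2((double)n)));

        return 0;
}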

Thanks,
Maciej
