Message-ID: <22268ddb-5643-f35e-6c34-eb5c2b0ad4cb@amd.com>
Date: Mon, 21 Mar 2022 14:49:27 +0530
From: "Nikunj A. Dadhania" <nikunj@....com>
To: Mingwei Zhang <mizhang@...gle.com>
Cc: Paolo Bonzini <pbonzini@...hat.com>,
Sean Christopherson <seanjc@...gle.com>,
Vitaly Kuznetsov <vkuznets@...hat.com>,
Wanpeng Li <wanpengli@...cent.com>,
Jim Mattson <jmattson@...gle.com>,
Joerg Roedel <joro@...tes.org>,
Brijesh Singh <brijesh.singh@....com>,
Tom Lendacky <thomas.lendacky@....com>,
Peter Gonda <pgonda@...gle.com>,
Bharata B Rao <bharata@....com>,
"Maciej S . Szmigiero" <mail@...iej.szmigiero.name>,
David Hildenbrand <david@...hat.com>, kvm@...r.kernel.org,
linux-kernel@...r.kernel.org
Subject: Re: [PATCH RFC v1 5/9] KVM: SVM: Implement demand page pinning
On 3/21/2022 11:41 AM, Mingwei Zhang wrote:
> On Wed, Mar 09, 2022, Nikunj A. Dadhania wrote:
>> On 3/9/2022 3:23 AM, Mingwei Zhang wrote:
>>> On Tue, Mar 08, 2022, Nikunj A Dadhania wrote:
>>>> Use the memslot metadata to store the pinned data along with the pfns.
>>>> This improves the SEV guest startup time from O(n) to a constant by
>>>> deferring guest page pinning until the pages are used to satisfy
>>>> nested page faults. The page reference will be dropped in the memslot
>>>> free path or deallocation path.
>>>>
>>>> Reuse the enc_region structure definition as pinned_region to maintain
>>>> pages that are pinned outside of MMU demand pinning. Remove the rest of
>>>> the upfront-pinning code, as it is no longer needed with demand
>>>> pinning support.
>>>
>>> I don't quite understand why we still need the enc_region. I have
>>> several concerns. Details below.
>>
>> With patch 9 the enc_region is used only for memory that is pinned before
>> the vCPUs come online (i.e. the MMU is not yet usable).
>>
>>>>
>>>> Retain svm_register_enc_region() and svm_unregister_enc_region() with
>>>> required checks for resource limit.
>>>>
>>>> Guest boot time comparison
>>>> +---------------+----------------+-------------------+
>>>> | Guest Memory | baseline | Demand Pinning |
>>>> | Size (GB) | (secs) | (secs) |
>>>> +---------------+----------------+-------------------+
>>>> | 4 | 6.16 | 5.71 |
>>>> +---------------+----------------+-------------------+
>>>> | 16 | 7.38 | 5.91 |
>>>> +---------------+----------------+-------------------+
>>>> | 64 | 12.17 | 6.16 |
>>>> +---------------+----------------+-------------------+
>>>> | 128 | 18.20 | 6.50 |
>>>> +---------------+----------------+-------------------+
>>>> | 192 | 24.56 | 6.80 |
>>>> +---------------+----------------+-------------------+
>>>>
>>>> Signed-off-by: Nikunj A Dadhania <nikunj@....com>
>>>> ---
>>>> arch/x86/kvm/svm/sev.c | 304 ++++++++++++++++++++++++++---------------
>>>> arch/x86/kvm/svm/svm.c | 1 +
>>>> arch/x86/kvm/svm/svm.h | 6 +-
>>>> 3 files changed, 200 insertions(+), 111 deletions(-)
>>>>
<SNIP>
>>>> static struct page **sev_pin_memory(struct kvm *kvm, unsigned long uaddr,
>>>> unsigned long ulen, unsigned long *n,
>>>> int write)
>>>> {
>>>> struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
>>>> + struct pinned_region *region;
>>>> unsigned long npages, size;
>>>> int npinned;
>>>> - unsigned long locked, lock_limit;
>>>> struct page **pages;
>>>> - unsigned long first, last;
>>>> int ret;
>>>>
>>>> lockdep_assert_held(&kvm->lock);
>>>> @@ -395,15 +413,12 @@ static struct page **sev_pin_memory(struct kvm *kvm, unsigned long uaddr,
>>>> if (ulen == 0 || uaddr + ulen < uaddr)
>>>> return ERR_PTR(-EINVAL);
>>>>
>>>> - /* Calculate number of pages. */
>>>> - first = (uaddr & PAGE_MASK) >> PAGE_SHIFT;
>>>> - last = ((uaddr + ulen - 1) & PAGE_MASK) >> PAGE_SHIFT;
>>>> - npages = (last - first + 1);
>>>> + npages = get_npages(uaddr, ulen);
>>>>
>>>> - locked = sev->pages_locked + npages;
>>>> - lock_limit = rlimit(RLIMIT_MEMLOCK) >> PAGE_SHIFT;
>>>> - if (locked > lock_limit && !capable(CAP_IPC_LOCK)) {
>>>> - pr_err("SEV: %lu locked pages exceed the lock limit of %lu.\n", locked, lock_limit);
>>>> + if (rlimit_memlock_exceeds(sev->pages_to_lock, npages)) {
>>>> + pr_err("SEV: %lu locked pages exceed the lock limit of %lu.\n",
>>>> + sev->pages_to_lock + npages,
>>>> + (rlimit(RLIMIT_MEMLOCK) >> PAGE_SHIFT));
>>>> return ERR_PTR(-ENOMEM);
>>>> }
>>>>
>>>> @@ -429,7 +444,19 @@ static struct page **sev_pin_memory(struct kvm *kvm, unsigned long uaddr,
>>>> }
>>>>
>>>> *n = npages;
>>>> - sev->pages_locked = locked;
>>>> + sev->pages_to_lock += npages;
>>>> +
>>>> + /* Maintain region list that is pinned to be unpinned in vm destroy path */
>>>> + region = kzalloc(sizeof(*region), GFP_KERNEL_ACCOUNT);
>>>> + if (!region) {
>>>> + ret = -ENOMEM;
>>>> + goto err;
>>>> + }
>>>> + region->uaddr = uaddr;
>>>> + region->size = ulen;
>>>> + region->pages = pages;
>>>> + region->npages = npages;
>>>> + list_add_tail(&region->list, &sev->pinned_regions_list);
>>>
>>> Hmm. I see a duplication of the metadata. We already store the pfns in
>>> memslot. But now we also do it in regions. Is this one used for
>>> migration purpose?
>>
>> We are not duplicating; the enc_region list holds regions that are pinned by
>> paths other than svm_register_enc_region(). Later patches add infrastructure
>> to directly fault in those pages, which will use memslot->pfns.
>>
>>>
>>> I might miss some of the context here.
>>
>> More context here:
>> https://lore.kernel.org/kvm/CAMkAt6p1-82LTRNB3pkPRwYh=wGpreUN=jcUeBj_dZt8ss9w0Q@mail.gmail.com/
>
> Hmm. I think I might have got the point. However, logically, I still think
> we might not need two data structures for pinning. When the vcpu is not
> online, we could use the array in the memslot to contain the pinned
> pages, right?
Yes.
> Since user-level code is not allowed to pin arbitrary regions of HVA, we
> could check that and bail out early if the region goes out of a memslot.
>
> From that point, the only requirement is that we need a valid memslot
> before doing memory encryption and pinning. So enc_region is still not
> needed from this point.
>
> This should save some time to avoid double pinning and make the pinning
> information clear.
Agreed, I think that should be possible:
* Check for addr/end being part of a memslot
* Error out in case it is not part of any memslot
* Add __sev_pin_pfn(), which does not depend on the vcpu arg.
* Iterate over the pages and use __sev_pin_pfn() routine to pin.
	slots = kvm_memslots(kvm);
	kvm_for_each_memslot_in_hva_range(node, slots, addr, end) {
		slot = container_of(node, struct kvm_memory_slot,
				    hva_node[slots->node_idx]);
		slot_start = slot->userspace_addr;
		slot_end = slot_start + (slot->npages << PAGE_SHIFT);
		hva_start = max(addr, slot_start);
		hva_end = min(end, slot_end);
		for (uaddr = hva_start; uaddr < hva_end; uaddr += PAGE_SIZE)
			__sev_pin_pfn(slot, uaddr, PG_LEVEL_4K);
	}
This will make sure the memslot-based data structure is used and the enc_region can be removed.
Regards
Nikunj