Message-ID: <c4b33753-01d7-684e-23ac-1189bd217761@amd.com>
Date: Wed, 30 Mar 2022 10:12:24 +0530
From: "Nikunj A. Dadhania" <nikunj@....com>
To: Sean Christopherson <seanjc@...gle.com>
Cc: Paolo Bonzini <pbonzini@...hat.com>,
Vitaly Kuznetsov <vkuznets@...hat.com>,
Wanpeng Li <wanpengli@...cent.com>,
Jim Mattson <jmattson@...gle.com>,
Joerg Roedel <joro@...tes.org>,
Brijesh Singh <brijesh.singh@....com>,
Tom Lendacky <thomas.lendacky@....com>,
Peter Gonda <pgonda@...gle.com>,
Bharata B Rao <bharata@....com>,
"Maciej S . Szmigiero" <mail@...iej.szmigiero.name>,
Mingwei Zhang <mizhang@...gle.com>,
David Hildenbrand <david@...hat.com>, kvm@...r.kernel.org,
linux-kernel@...r.kernel.org
Subject: Re: [PATCH RFC v1 0/9] KVM: SVM: Defer page pinning for SEV guests
On 3/29/2022 2:30 AM, Sean Christopherson wrote:
> On Tue, Mar 08, 2022, Nikunj A Dadhania wrote:
>> This is a follow-up to the RFC implementation [1] that incorporates
>> review feedback and bug fixes. See the "RFC v1" section below for a
>> list of changes.
>
> Heh, for future reference, the initial posting of a series/patch/RFC is implicitly
> v1, i.e. this should be RFC v2.
Sure.
>
>> SEV guests require the guest's pages to be pinned in host physical
>> memory, as migration of encrypted pages is not supported. The memory
>> encryption scheme uses the physical address of the memory being
>> encrypted. If guest pages are moved by the host, content decrypted in
>> the guest would be incorrect, thereby corrupting the guest's memory.
>>
>> For SEV/SEV-ES guests, the hypervisor doesn't know which pages are
>> encrypted or when the guest is done using those pages. The hypervisor
>> should treat all guest pages as encrypted until they are deallocated
>> or the guest is destroyed.
>>
>> While provisioning a pfn, make KVM aware that guest pages need to be
>> pinned long-term and use the appropriate pin_user_pages API for these
>> special encrypted memory regions. KVM takes the first reference and
>> holds it until a mapping is done. Take an extra reference before KVM
>> releases the pfn.
>>
>> Actual pinning management is handled by vendor code via new
>> kvm_x86_ops hooks. The MMU calls into vendor code to pin the page on
>> demand. Pinning metadata is stored in the architecture-specific
>> memslot area. Guest pages are unpinned in the memslot freeing path
>> and the deallocation path.
>>
>> Guest boot time comparison:
>> +---------------+----------------+-------------------+
>> | Guest Memory | baseline | Demand Pinning + |
>> | Size (GB) | v5.17-rc6(secs)| v5.17-rc6(secs) |
>> +---------------+----------------+-------------------+
>> | 4 | 6.16 | 5.71 |
>> +---------------+----------------+-------------------+
>> | 16 | 7.38 | 5.91 |
>> +---------------+----------------+-------------------+
>> | 64 | 12.17 | 6.16 |
>> +---------------+----------------+-------------------+
>> | 128 | 18.20 | 6.50 |
>> +---------------+----------------+-------------------+
>> | 192 | 24.56 | 6.80 |
>> +---------------+----------------+-------------------+
>
> Let me preface this by saying I generally like the idea and especially the
> performance, but...
>
> I think we should abandon this approach in favor of committing all our resources
> to fd-based private memory[*], which (if done right) will provide on-demand pinning
> for "free".
I will give this a try for SEV; it was on my todo list.
> I would much rather get that support merged sooner than later, and use
> it as a carrot for legacy SEV to get users to move over to its new APIs, with a long
> term goal of deprecating and disallowing SEV/SEV-ES guests without fd-based private
> memory.
> That would require guest kernel support to communicate private vs. shared,
Could you explain this in more detail? Is this required for punching holes for shared pages?
> but SEV guests already "need" to do that to play nice with live migration, so it's
> not a big ask, just another carrot to entice guests/customers to update their kernel
> (and possibly users to update their guest firmware).
>
> This series isn't awful by any means, but it requires poking into core flows and
> further complicates paths that are already anything but simple. And things like
> conditionally grabbing vCPU0 to pin pages in its MMU make me flinch. And I think
> the situation would only get worse by the time all the bugs and corner cases are
> ironed out. E.g. this code is wrong:
>
> void kvm_release_pfn_clean(kvm_pfn_t pfn)
> {
>         if (!is_error_noslot_pfn(pfn) && !kvm_is_reserved_pfn(pfn)) {
>                 struct page *page = pfn_to_page(pfn);
>
>                 if (page_maybe_dma_pinned(page))
>                         unpin_user_page(page);
>                 else
>                         put_page(page);
>         }
> }
> EXPORT_SYMBOL_GPL(kvm_release_pfn_clean);
>
> Because (a) page_maybe_dma_pinned() is susceptible to false positives (clearly
> documented), and (b) even if it didn't get false positives, there's no guarantee
> that _KVM_ owns a pin of the page.
Right, the pinning could have been done by some other subsystem.
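To spell out (a) for anyone following along, this is roughly what the
check boils down to for order-0 pages around v5.17; a paraphrase of the
mm/ heuristic, not a verbatim copy (the helper name below is mine):

        /*
         * pin_user_pages() biases the refcount by GUP_PIN_COUNTING_BIAS
         * (1024) per pin, so the check only asks whether the refcount
         * has reached that bias.
         */
        static inline bool looks_dma_pinned(struct page *page)
        {
                return page_ref_count(compound_head(page)) >=
                        GUP_PIN_COUNTING_BIAS;
        }

So a page that happens to have ~1024 plain get_page() references reads
as "maybe pinned" without any pin at all, and even a genuine pin may
belong to another subsystem (e.g. VFIO), so kvm_release_pfn_clean()
can't use it to decide between put_page() and unpin_user_page().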
>
> It's not an impossible problem to solve, but I suspect any solution will require
> either touching a lot of code or will be fragile and difficult to maintain, e.g.
> by auditing all users to understand which need to pin and which don't. Even if
> we _always_ pin memory for SEV guests, we'd still need to plumb the "is SEV guest"
> info around.
>
> And FWIW, my years-old idea of using a software-available SPTE bit to track pinned
> pages is plagued by the same underlying issue: KVM's current management (or lack
> thereof) of SEV guest memory just isn't viable long term. In all honesty, it
> probably should never have been merged. We can't change the past, but we can,
> and IMO should, avoid piling on more code to an approach that is fundamentally flawed.
>
> [*] https://lore.kernel.org/all/20220310140911.50924-1-chao.p.peng@linux.intel.com
>
Thanks for the valuable feedback.
Regards
Nikunj