[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <YWYm/Gw8PbaAKBF0@google.com>
Date: Wed, 13 Oct 2021 00:23:24 +0000
From: Sean Christopherson <seanjc@...gle.com>
To: Brijesh Singh <brijesh.singh@....com>
Cc: x86@...nel.org, linux-kernel@...r.kernel.org, kvm@...r.kernel.org,
linux-coco@...ts.linux.dev, linux-mm@...ck.org,
linux-crypto@...r.kernel.org, Thomas Gleixner <tglx@...utronix.de>,
Ingo Molnar <mingo@...hat.com>, Joerg Roedel <jroedel@...e.de>,
Tom Lendacky <thomas.lendacky@....com>,
"H. Peter Anvin" <hpa@...or.com>, Ard Biesheuvel <ardb@...nel.org>,
Paolo Bonzini <pbonzini@...hat.com>,
Vitaly Kuznetsov <vkuznets@...hat.com>,
Wanpeng Li <wanpengli@...cent.com>,
Jim Mattson <jmattson@...gle.com>,
Andy Lutomirski <luto@...nel.org>,
Dave Hansen <dave.hansen@...ux.intel.com>,
Sergio Lopez <slp@...hat.com>, Peter Gonda <pgonda@...gle.com>,
Peter Zijlstra <peterz@...radead.org>,
Srinivas Pandruvada <srinivas.pandruvada@...ux.intel.com>,
David Rientjes <rientjes@...gle.com>,
Dov Murik <dovmurik@...ux.ibm.com>,
Tobin Feldman-Fitzthum <tobin@....com>,
Borislav Petkov <bp@...en8.de>,
Michael Roth <michael.roth@....com>,
Vlastimil Babka <vbabka@...e.cz>,
"Kirill A . Shutemov" <kirill@...temov.name>,
Andi Kleen <ak@...ux.intel.com>, tony.luck@...el.com,
marcorr@...gle.com, sathyanarayanan.kuppuswamy@...ux.intel.com
Subject: Re: [PATCH Part2 v5 39/45] KVM: SVM: Introduce ops for the post gfn
map and unmap
On Fri, Aug 20, 2021, Brijesh Singh wrote:
> When SEV-SNP is enabled in the guest VM, the guest memory pages can
> either be a private or shared. A write from the hypervisor goes through
> the RMP checks. If hardware sees that hypervisor is attempting to write
> to a guest private page, then it triggers an RMP violation #PF.
>
> To avoid the RMP violation, add post_{map,unmap}_gfn() ops that can be
> used to verify that its safe to map a given guest page. Use the SRCU to
> protect against the page state change for existing mapped pages.
SRCU isn't protecting anything. The synchronize_srcu_expedited() in the PSC code
forces it to wait for existing maps to go away, but it doesn't prevent new maps
from being created while the actual RMP updates are in-flight. Most telling is
that the RMP updates happen _after_ the synchronize_srcu_expedited() call.
This also doesn't handle kvm_{read,write}_guest_cached().
I can't help but note that the private memslots idea[*] would handle this gracefully,
e.g. the memslot lookup would fail, and any change in private memslots would
invalidate the cache due to a generation mismatch.
KSM is another mess that would Just Work.
I'm not saying that SNP should be blocked on support for unmapping guest private
memory, but I do think we should strongly consider focusing on that effort rather
than trying to fix things piecemeal throughout KVM. I don't think it's too absurd
to say that it might actually be faster overall. And I 100% think that having a
cohesive design and uABI for SNP and TDX would be hugely beneficial to KVM.
[*] https://lkml.kernel.org/r/20210824005248.200037-1-seanjc@google.com
> +int sev_post_map_gfn(struct kvm *kvm, gfn_t gfn, kvm_pfn_t pfn, int *token)
> +{
> + struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
> + int level;
> +
> + if (!sev_snp_guest(kvm))
> + return 0;
> +
> + *token = srcu_read_lock(&sev->psc_srcu);
> +
> + /* If pfn is not added as private then fail */
This comment and the pr_err() are backwards, and confused the heck out of me.
snp_lookup_rmpentry() returns '1' if the pfn is assigned, a.k.a. private. That
means this code throws an error if the page is private, i.e. requires the page
to be shared. Which makes sense given the use cases, it's just incredibly
confusing.
> + if (snp_lookup_rmpentry(pfn, &level) == 1) {
Any reason not to provide e.g. rmp_is_shared() and rmp_is_private() so that
callers don't have to care as much about the return values? The -errno/0/1
semantics are all but guarantee to bite us in the rear at some point.
Actually, peeking at other patches, I think it already has. This usage in
__unregister_enc_region_locked() is wrong:
/*
* The guest memory pages are assigned in the RMP table. Unassign it
* before releasing the memory.
*/
if (sev_snp_guest(kvm)) {
for (i = 0; i < region->npages; i++) {
pfn = page_to_pfn(region->pages[i]);
if (!snp_lookup_rmpentry(pfn, &level)) <-- attempts make_shared() on non-existent entry
continue;
cond_resched();
if (level > PG_LEVEL_4K)
pfn &= ~(KVM_PAGES_PER_HPAGE(PG_LEVEL_2M) - 1);
host_rmp_make_shared(pfn, level, true);
}
}
> + srcu_read_unlock(&sev->psc_srcu, *token);
> + pr_err_ratelimited("failed to map private gfn 0x%llx pfn 0x%llx\n", gfn, pfn);
> + return -EBUSY;
> + }
> +
> + return 0;
> +}
> static struct kvm_x86_init_ops svm_init_ops __initdata = {
> diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
> index d10f7166b39d..ff91184f9b4a 100644
> --- a/arch/x86/kvm/svm/svm.h
> +++ b/arch/x86/kvm/svm/svm.h
> @@ -76,16 +76,22 @@ struct kvm_sev_info {
> bool active; /* SEV enabled guest */
> bool es_active; /* SEV-ES enabled guest */
> bool snp_active; /* SEV-SNP enabled guest */
> +
> unsigned int asid; /* ASID used for this guest */
> unsigned int handle; /* SEV firmware handle */
> int fd; /* SEV device fd */
> +
> unsigned long pages_locked; /* Number of pages locked */
> struct list_head regions_list; /* List of registered regions */
> +
> u64 ap_jump_table; /* SEV-ES AP Jump Table address */
> +
> struct kvm *enc_context_owner; /* Owner of copied encryption context */
> struct misc_cg *misc_cg; /* For misc cgroup accounting */
> +
Unrelated whitespace changes.
> u64 snp_init_flags;
> void *snp_context; /* SNP guest context page */
> + struct srcu_struct psc_srcu;
> };
Powered by blists - more mailing lists