[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <daf5066b-e89b-d377-ed8a-9338f1a04c0d@amd.com>
Date: Mon, 22 Nov 2021 09:23:36 -0600
From: Brijesh Singh <brijesh.singh@....com>
To: Peter Gonda <pgonda@...gle.com>
Cc: brijesh.singh@....com, x86@...nel.org,
linux-kernel@...r.kernel.org, kvm@...r.kernel.org,
linux-coco@...ts.linux.dev, linux-mm@...ck.org,
linux-crypto@...r.kernel.org, Thomas Gleixner <tglx@...utronix.de>,
Ingo Molnar <mingo@...hat.com>, Joerg Roedel <jroedel@...e.de>,
Tom Lendacky <Thomas.Lendacky@....com>,
"H. Peter Anvin" <hpa@...or.com>, Ard Biesheuvel <ardb@...nel.org>,
Paolo Bonzini <pbonzini@...hat.com>,
Sean Christopherson <seanjc@...gle.com>,
Vitaly Kuznetsov <vkuznets@...hat.com>,
Wanpeng Li <wanpengli@...cent.com>,
Jim Mattson <jmattson@...gle.com>,
Andy Lutomirski <luto@...nel.org>,
Dave Hansen <dave.hansen@...ux.intel.com>,
Sergio Lopez <slp@...hat.com>,
Peter Zijlstra <peterz@...radead.org>,
Srinivas Pandruvada <srinivas.pandruvada@...ux.intel.com>,
David Rientjes <rientjes@...gle.com>,
Dov Murik <dovmurik@...ux.ibm.com>,
Tobin Feldman-Fitzthum <tobin@....com>,
Borislav Petkov <bp@...en8.de>,
Michael Roth <michael.roth@....com>,
Vlastimil Babka <vbabka@...e.cz>,
"Kirill A . Shutemov" <kirill@...temov.name>,
Andi Kleen <ak@...ux.intel.com>, tony.luck@...el.com,
marcorr@...gle.com, sathyanarayanan.kuppuswamy@...ux.intel.com
Subject: Re: [PATCH Part2 v5 00/45] Add AMD Secure Nested Paging (SEV-SNP)
Hypervisor Support
Hi Peter,
On 11/12/21 9:43 AM, Peter Gonda wrote:
> Hi Brijesh,,
>
> One high level discussion I'd like to have on these SNP KVM patches.
>
> In these patches (V5) if a host userspace process writes a guest
> private page a SIGBUS is issued to that process. If the kernel writes
> a guest private page then the kernel panics due to the unhandled RMP
> fault page fault. This is an issue because not all writes into guest
> memory may come from a bug in the host. For instance a malicious or
> even buggy guest could easily point the host to writing a private page
> during the emulation of many virtual devices (virtio, NVMe, etc). For
> example if a well behaved guests behavior is to: start up a driver,
> select some pages to share with the guest, ask the host to convert
> them to shared, then use those pages for virtual device DMA, if a
> buggy guest forget the step to request the pages be converted to
> shared its easy to see how the host could rightfully write to private
> memory. I think we can better guarantee host reliability when running
> SNP guests without changing SNP’s security properties.
>
> Here is an alternative to the current approach: On RMP violation (host
> or userspace) the page fault handler converts the page from private to
> shared to allow the write to continue. This pulls from s390’s error
> handling which does exactly this. See ‘arch_make_page_accessible()’.
> Additionally it adds less complexity to the SNP kernel patches, and
> requires no new ABI.
>
> In the current (V5) KVM implementation if a userspace process
> generates an RMP violation (writes to guest private memory) the
> process receives a SIGBUS. At first glance, it would appear that
> user-space shouldn’t write to private memory. However, guaranteeing
> this in a generic fashion requires locking the RMP entries (via locks
> external to the RMP). Otherwise, a user-space process emulating a
> guest device IO may be vulnerable to having the guest memory
> (maliciously or by guest bug) converted to private while user-space
> emulation is happening. This results in a well behaved userspace
> process receiving a SIGBUS.
>
> This proposal allows buggy and malicious guests to run under SNP
> without jeopardizing the reliability / safety of host processes. This
> is very important to a cloud service provider (CSP) since it’s common
> to have host wide daemons that write/read all guests, i.e. a single
> process could manage the networking for all VMs on the host. Crashing
> that singleton process kills networking for all VMs on the system.
>
Thank you for starting the thread; based on the discussion, I am keeping
the current implementation as-is and *not* going with the auto
conversion from private to shared. To summarize what we are doing in the
current SNP series:
- If userspace accesses guest private memory, it gets SIGBUS.
- If kernel accesses[*] guest private memory, it does panic.
[*] Kernel consults the RMP table for the page ownership before the
access. If the page is shared, then it uses the locking mechanism to
ensure that a guest will not be able to change the page ownership while
kernel has it mapped.
thanks
> This proposal also allows for minimal changes to the kexec flow and
> kdump. The new kexec kernel can simply update private pages to shared
> as it encounters them during their boot. This avoids needing to
> propagate the RMP state from kernel to kernel. Of course this doesn’t
> preserve any running VMs but is still useful for kdump crash dumps or
> quicker rekerneling for development with kexec.
>
> This proposal does cause guest memory corruption for some bugs but one
> of SEV-SNP’s goals extended from SEV-ES’s goals is for guest’s to be
> able to detect when its memory has been corrupted / replayed by the
> host. So SNP already has features for allowing guests to detect this
> kind of memory corruption. Additionally this is very similar to a page
> of memory generating a machine check because of 2-bit memory
> corruption. In other words SNP guests must be enlightened and ready
> for these kinds of errors.
>
> For an SNP guest running under this proposal the flow would look like this:
> * Host gets a #PF because its trying to write to a private page.
> * Host #PF handler updates the page to shared.
> * Write continues normally.
> * Guest accesses memory (r/w).
> * Guest gets a #VC error because the page is not PVALIDATED
> * Guest is now in control. Guest can terminate because its memory has
> been corrupted. Guest could try and continue to log the error to its
> owner.
>
> A similar approach was introduced in the SNP patches V1 and V2 for
> kernel page fault handling. The pushback around this convert to shared
> approach was largely focused around the idea that the kernel has all
> the information about which pages are shared vs private so it should
> be able to check shared status before write to pages. After V2 the
> patches were updated to not have a kernel page fault handler for RMP
> violations (other than dumping state during a panic). The current
> patches protect the host with new post_{map,unmap}_gfn() function that
> checks if a page is shared before mapping it, then locks the page
> shared until unmapped. Given the discussions on ‘[Part2,v5,39/45] KVM:
> SVM: Introduce ops for the post gfn map and unmap’ building a solution
> to do this is non trivial and adds new overheads to KVM. Additionally
> the current solution is local to the kernel. So a new ABI just now be
> created to allow the userspace VMM to access the kernel-side locks for
> this to work generically for the whole host. This is more complicated
> than this proposal and adding more lock holders seems like it could
> reduce performance further.
>
> There are a couple corner cases with this approach. Under SNP guests
> can request their memory be changed into a VMSA. This VMSA page cannot
> be changed to shared while the vCPU associated with it is running. So
> KVM + the #PF handler will need something to kick vCPUs from running.
> Joerg believes that a possible fix for this could be a new MMU
> notifier in the kernel, then on the #PF we can go through the rmp and
> execute this vCPU kick callback.
>
> Another corner case is the RMPUPDATE instruction is not guaranteed to
> succeed on first iteration. As noted above if the page is a VMSA it
> cannot be updated while the vCPU is running. Another issue is if the
> guest is running a RMPADJUST on a page it cannot be RMPUPDATED at that
> time. There is a lock for each RMP Entry so there is a race for these
> instructions. The vCPU kicking can solve this issue to be kicking all
> guest vCPUs which removes the chance for the race.
>
> Since this proposal probably results in SNP guests terminating due to
> a page unexpectedly needing PVALIDATE. The approach could be
> simplified to just the KVM killing the guest. I think it's nicer to
> users to instead of unilaterally killing the guest allowing the
> unvalidated #VC exception to allow users to collect some additional
> debug information and any additional clean up work they would like to
> perform.
>
> Thanks
> Peter
>
> On Fri, Aug 20, 2021 at 9:59 AM Brijesh Singh <brijesh.singh@....com> wrote:
>>
>> This part of the Secure Encrypted Paging (SEV-SNP) series focuses on the
>> changes required in a host OS for SEV-SNP support. The series builds upon
>> SEV-SNP Part-1.
>>
>> This series provides the basic building blocks to support booting the SEV-SNP
>> VMs, it does not cover all the security enhancement introduced by the SEV-SNP
>> such as interrupt protection.
>>
>> The CCP driver is enhanced to provide new APIs that use the SEV-SNP
>> specific commands defined in the SEV-SNP firmware specification. The KVM
>> driver uses those APIs to create and managed the SEV-SNP guests.
>>
>> The GHCB specification version 2 introduces new set of NAE's that is
>> used by the SEV-SNP guest to communicate with the hypervisor. The series
>> provides support to handle the following new NAE events:
>> - Register GHCB GPA
>> - Page State Change Request
>> - Hypevisor feature
>> - Guest message request
>>
>> The RMP check is enforced as soon as SEV-SNP is enabled. Not every memory
>> access requires an RMP check. In particular, the read accesses from the
>> hypervisor do not require RMP checks because the data confidentiality is
>> already protected via memory encryption. When hardware encounters an RMP
>> checks failure, it raises a page-fault exception. If RMP check failure
>> is due to the page-size mismatch, then split the large page to resolve
>> the fault.
>>
>> The series does not provide support for the interrupt security and migration
>> and those feature will be added after the base support.
>>
>> The series is based on the commit:
>> SNP part1 commit and
>> fa7a549d321a (kvm/next, next) KVM: x86: accept userspace interrupt only if no event is injected
>>
>> TODO:
>> * Add support for command to ratelimit the guest message request.
>>
>> Changes since v4:
>> * Move the RMP entry definition to x86 specific header file.
>> * Move the dump RMP entry function to SEV specific file.
>> * Use BIT_ULL while defining the #PF bit fields.
>> * Add helper function to check the IOMMU support for SEV-SNP feature.
>> * Add helper functions for the page state transition.
>> * Map and unmap the pages from the direct map after page is added or
>> removed in RMP table.
>> * Enforce the minimum SEV-SNP firmware version.
>> * Extend the LAUNCH_UPDATE to accept the base_gfn and remove the
>> logic to calculate the gfn from the hva.
>> * Add a check in LAUNCH_UPDATE to ensure that all the pages are
>> shared before calling the PSP.
>> * Mark the memory failure when failing to remove the page from the
>> RMP table or clearing the immutable bit.
>> * Exclude the encrypted hva range from the KSM.
>> * Remove the gfn tracking during the kvm_gfn_map() and use SRCU to
>> syncronize the PSC and gfn mapping.
>> * Allow PSC on the registered hva range only.
>> * Add support for the Preferred GPA VMGEXIT.
>> * Simplify the PSC handling routines.
>> * Use the static_call() for the newly added kvm_x86_ops.
>> * Remove the long-lived GHCB map.
>> * Move the snp enable module parameter to the end of the file.
>> * Remove the kvm_x86_op for the RMP fault handling. Call the
>> fault handler directly from the #NPF interception.
>>
>> Changes since v3:
>> * Add support for extended guest message request.
>> * Add ioctl to query the SNP Platform status.
>> * Add ioctl to get and set the SNP config.
>> * Add check to verify that memory reserved for the RMP covers the full system RAM.
>> * Start the SNP specific commands from 256 instead of 255.
>> * Multiple cleanup and fixes based on the review feedback.
>>
>> Changes since v2:
>> * Add AP creation support.
>> * Drop the patch to handle the RMP fault for the kernel address.
>> * Add functions to track the write access from the hypervisor.
>> * Do not enable the SNP feature when IOMMU is disabled or is in passthrough mode.
>> * Dump the RMP entry on RMP violation for the debug.
>> * Shorten the GHCB macro names.
>> * Start the SNP_INIT command id from 255 to give some gap for the legacy SEV.
>> * Sync the header with the latest 0.9 SNP spec.
>>
>> Changes since v1:
>> * Add AP reset MSR protocol VMGEXIT NAE.
>> * Add Hypervisor features VMGEXIT NAE.
>> * Move the RMP table initialization and RMPUPDATE/PSMASH helper in
>> arch/x86/kernel/sev.c.
>> * Add support to map/unmap SEV legacy command buffer to firmware state when
>> SNP is active.
>> * Enhance PSP driver to provide helper to allocate/free memory used for the
>> firmware context page.
>> * Add support to handle RMP fault for the kernel address.
>> * Add support to handle GUEST_REQUEST NAE event for attestation.
>> * Rename RMP table lookup helper.
>> * Drop typedef from rmpentry struct definition.
>> * Drop SNP static key and use cpu_feature_enabled() to check whether SEV-SNP
>> is active.
>> * Multiple cleanup/fixes to address Boris review feedback.
>>
>> Brijesh Singh (40):
>> x86/cpufeatures: Add SEV-SNP CPU feature
>> iommu/amd: Introduce function to check SEV-SNP support
>> x86/sev: Add the host SEV-SNP initialization support
>> x86/sev: Add RMP entry lookup helpers
>> x86/sev: Add helper functions for RMPUPDATE and PSMASH instruction
>> x86/sev: Invalid pages from direct map when adding it to RMP table
>> x86/traps: Define RMP violation #PF error code
>> x86/fault: Add support to handle the RMP fault for user address
>> x86/fault: Add support to dump RMP entry on fault
>> crypto: ccp: shutdown SEV firmware on kexec
>> crypto:ccp: Define the SEV-SNP commands
>> crypto: ccp: Add support to initialize the AMD-SP for SEV-SNP
>> crypto:ccp: Provide APIs to issue SEV-SNP commands
>> crypto: ccp: Handle the legacy TMR allocation when SNP is enabled
>> crypto: ccp: Handle the legacy SEV command when SNP is enabled
>> crypto: ccp: Add the SNP_PLATFORM_STATUS command
>> crypto: ccp: Add the SNP_{SET,GET}_EXT_CONFIG command
>> crypto: ccp: Provide APIs to query extended attestation report
>> KVM: SVM: Provide the Hypervisor Feature support VMGEXIT
>> KVM: SVM: Make AVIC backing, VMSA and VMCB memory allocation SNP safe
>> KVM: SVM: Add initial SEV-SNP support
>> KVM: SVM: Add KVM_SNP_INIT command
>> KVM: SVM: Add KVM_SEV_SNP_LAUNCH_START command
>> KVM: SVM: Add KVM_SEV_SNP_LAUNCH_UPDATE command
>> KVM: SVM: Mark the private vma unmerable for SEV-SNP guests
>> KVM: SVM: Add KVM_SEV_SNP_LAUNCH_FINISH command
>> KVM: X86: Keep the NPT and RMP page level in sync
>> KVM: x86: Introduce kvm_mmu_get_tdp_walk() for SEV-SNP use
>> KVM: x86: Define RMP page fault error bits for #NPF
>> KVM: x86: Update page-fault trace to log full 64-bit error code
>> KVM: SVM: Do not use long-lived GHCB map while setting scratch area
>> KVM: SVM: Remove the long-lived GHCB host map
>> KVM: SVM: Add support to handle GHCB GPA register VMGEXIT
>> KVM: SVM: Add support to handle MSR based Page State Change VMGEXIT
>> KVM: SVM: Add support to handle Page State Change VMGEXIT
>> KVM: SVM: Introduce ops for the post gfn map and unmap
>> KVM: x86: Export the kvm_zap_gfn_range() for the SNP use
>> KVM: SVM: Add support to handle the RMP nested page fault
>> KVM: SVM: Provide support for SNP_GUEST_REQUEST NAE event
>> KVM: SVM: Add module parameter to enable the SEV-SNP
>>
>> Sean Christopherson (2):
>> KVM: x86/mmu: Move 'pfn' variable to caller of direct_page_fault()
>> KVM: x86/mmu: Introduce kvm_mmu_map_tdp_page() for use by TDX and SNP
>>
>> Tom Lendacky (3):
>> KVM: SVM: Add support to handle AP reset MSR protocol
>> KVM: SVM: Use a VMSA physical address variable for populating VMCB
>> KVM: SVM: Support SEV-SNP AP Creation NAE event
>>
>> Documentation/virt/coco/sevguest.rst | 55 +
>> .../virt/kvm/amd-memory-encryption.rst | 102 +
>> arch/x86/include/asm/cpufeatures.h | 1 +
>> arch/x86/include/asm/disabled-features.h | 8 +-
>> arch/x86/include/asm/kvm-x86-ops.h | 5 +
>> arch/x86/include/asm/kvm_host.h | 20 +
>> arch/x86/include/asm/msr-index.h | 6 +
>> arch/x86/include/asm/sev-common.h | 28 +
>> arch/x86/include/asm/sev.h | 45 +
>> arch/x86/include/asm/svm.h | 7 +
>> arch/x86/include/asm/trap_pf.h | 18 +-
>> arch/x86/kernel/cpu/amd.c | 3 +-
>> arch/x86/kernel/sev.c | 361 ++++
>> arch/x86/kvm/lapic.c | 5 +-
>> arch/x86/kvm/mmu.h | 7 +-
>> arch/x86/kvm/mmu/mmu.c | 84 +-
>> arch/x86/kvm/svm/sev.c | 1676 ++++++++++++++++-
>> arch/x86/kvm/svm/svm.c | 62 +-
>> arch/x86/kvm/svm/svm.h | 74 +-
>> arch/x86/kvm/trace.h | 40 +-
>> arch/x86/kvm/x86.c | 92 +-
>> arch/x86/mm/fault.c | 84 +-
>> drivers/crypto/ccp/sev-dev.c | 924 ++++++++-
>> drivers/crypto/ccp/sev-dev.h | 17 +
>> drivers/crypto/ccp/sp-pci.c | 12 +
>> drivers/iommu/amd/init.c | 30 +
>> include/linux/iommu.h | 9 +
>> include/linux/mm.h | 6 +-
>> include/linux/psp-sev.h | 346 ++++
>> include/linux/sev.h | 32 +
>> include/uapi/linux/kvm.h | 56 +
>> include/uapi/linux/psp-sev.h | 60 +
>> mm/memory.c | 13 +
>> tools/arch/x86/include/asm/cpufeatures.h | 1 +
>> 34 files changed, 4088 insertions(+), 201 deletions(-)
>> create mode 100644 include/linux/sev.h
>>
>> --
>> 2.17.1
>>
Powered by blists - more mailing lists