lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <daf5066b-e89b-d377-ed8a-9338f1a04c0d@amd.com>
Date:   Mon, 22 Nov 2021 09:23:36 -0600
From:   Brijesh Singh <brijesh.singh@....com>
To:     Peter Gonda <pgonda@...gle.com>
Cc:     brijesh.singh@....com, x86@...nel.org,
        linux-kernel@...r.kernel.org, kvm@...r.kernel.org,
        linux-coco@...ts.linux.dev, linux-mm@...ck.org,
        linux-crypto@...r.kernel.org, Thomas Gleixner <tglx@...utronix.de>,
        Ingo Molnar <mingo@...hat.com>, Joerg Roedel <jroedel@...e.de>,
        Tom Lendacky <Thomas.Lendacky@....com>,
        "H. Peter Anvin" <hpa@...or.com>, Ard Biesheuvel <ardb@...nel.org>,
        Paolo Bonzini <pbonzini@...hat.com>,
        Sean Christopherson <seanjc@...gle.com>,
        Vitaly Kuznetsov <vkuznets@...hat.com>,
        Wanpeng Li <wanpengli@...cent.com>,
        Jim Mattson <jmattson@...gle.com>,
        Andy Lutomirski <luto@...nel.org>,
        Dave Hansen <dave.hansen@...ux.intel.com>,
        Sergio Lopez <slp@...hat.com>,
        Peter Zijlstra <peterz@...radead.org>,
        Srinivas Pandruvada <srinivas.pandruvada@...ux.intel.com>,
        David Rientjes <rientjes@...gle.com>,
        Dov Murik <dovmurik@...ux.ibm.com>,
        Tobin Feldman-Fitzthum <tobin@....com>,
        Borislav Petkov <bp@...en8.de>,
        Michael Roth <michael.roth@....com>,
        Vlastimil Babka <vbabka@...e.cz>,
        "Kirill A . Shutemov" <kirill@...temov.name>,
        Andi Kleen <ak@...ux.intel.com>, tony.luck@...el.com,
        marcorr@...gle.com, sathyanarayanan.kuppuswamy@...ux.intel.com
Subject: Re: [PATCH Part2 v5 00/45] Add AMD Secure Nested Paging (SEV-SNP)
 Hypervisor Support

Hi Peter,

On 11/12/21 9:43 AM, Peter Gonda wrote:
> Hi Brijesh,,
> 
> One high level discussion I'd like to have on these SNP KVM patches.
> 
> In these patches (V5) if a host userspace process writes a guest
> private page a SIGBUS is issued to that process. If the kernel writes
> a guest private page then the kernel panics due to the unhandled RMP
> fault page fault. This is an issue because not all writes into guest
> memory may come from a bug in the host. For instance a malicious or
> even buggy guest could easily point the host to writing a private page
> during the emulation of many virtual devices (virtio, NVMe, etc). For
> example if a well behaved guests behavior is to: start up a driver,
> select some pages to share with the guest, ask the host to convert
> them to shared, then use those pages for virtual device DMA, if a
> buggy guest forget the step to request the pages be converted to
> shared its easy to see how the host could rightfully write to private
> memory. I think we can better guarantee host reliability when running
> SNP guests without changing SNP’s security properties.
> 
> Here is an alternative to the current approach: On RMP violation (host
> or userspace) the page fault handler converts the page from private to
> shared to allow the write to continue. This pulls from s390’s error
> handling which does exactly this. See ‘arch_make_page_accessible()’.
> Additionally it adds less complexity to the SNP kernel patches, and
> requires no new ABI.
> 
> In the current (V5) KVM implementation if a userspace process
> generates an RMP violation (writes to guest private memory) the
> process receives a SIGBUS. At first glance, it would appear that
> user-space shouldn’t write to private memory. However, guaranteeing
> this in a generic fashion requires locking the RMP entries (via locks
> external to the RMP). Otherwise, a user-space process emulating a
> guest device IO may be vulnerable to having the guest memory
> (maliciously or by guest bug) converted to private while user-space
> emulation is happening. This results in a well behaved userspace
> process receiving a SIGBUS.
> 
> This proposal allows buggy and malicious guests to run under SNP
> without jeopardizing the reliability / safety of host processes. This
> is very important to a cloud service provider (CSP) since it’s common
> to have host wide daemons that write/read all guests, i.e. a single
> process could manage the networking for all VMs on the host. Crashing
> that singleton process kills networking for all VMs on the system.
> 
Thank you for starting the thread; based on the discussion, I am keeping 
the current implementation as-is and *not* going with the auto 
conversion from private to shared. To summarize what we are doing in the 
current SNP series:

- If userspace accesses guest private memory, it gets SIGBUS.
- If kernel accesses[*] guest private memory, it does panic.

[*] Kernel consults the RMP table for the page ownership before the 
access. If the page is shared, then it uses the locking mechanism to 
ensure that a guest will not be able to change the page ownership while 
kernel has it mapped.

thanks

> This proposal also allows for minimal changes to the kexec flow and
> kdump. The new kexec kernel can simply update private pages to shared
> as it encounters them during their boot. This avoids needing to
> propagate the RMP state from kernel to kernel. Of course this doesn’t
> preserve any running VMs but is still useful for kdump crash dumps or
> quicker rekerneling for development with kexec.
> 
> This proposal does cause guest memory corruption for some bugs but one
> of SEV-SNP’s goals extended from SEV-ES’s goals is for guest’s to be
> able to detect when its memory has been corrupted / replayed by the
> host. So SNP already has features for allowing guests to detect this
> kind of memory corruption. Additionally this is very similar to a page
> of memory generating a machine check because of 2-bit memory
> corruption. In other words SNP guests must be enlightened and ready
> for these kinds of errors.
> 
> For an SNP guest running under this proposal the flow would look like this:
> * Host gets a #PF because its trying to write to a private page.
> * Host #PF handler updates the page to shared.
> * Write continues normally.
> * Guest accesses memory (r/w).
> * Guest gets a #VC error because the page is not PVALIDATED
> * Guest is now in control. Guest can terminate because its memory has
> been corrupted. Guest could try and continue to log the error to its
> owner.
> 
> A similar approach was introduced in the SNP patches V1 and V2 for
> kernel page fault handling. The pushback around this convert to shared
> approach was largely focused around the idea that the kernel has all
> the information about which pages are shared vs private so it should
> be able to check shared status before write to pages. After V2 the
> patches were updated to not have a kernel page fault handler for RMP
> violations (other than dumping state during a panic). The current
> patches protect the host with new post_{map,unmap}_gfn() function that
> checks if a page is shared before mapping it, then locks the page
> shared until unmapped. Given the discussions on ‘[Part2,v5,39/45] KVM:
> SVM: Introduce ops for the post gfn map and unmap’ building a solution
> to do this is non trivial and adds new overheads to KVM. Additionally
> the current solution is local to the kernel. So a new ABI just now be
> created to allow the userspace VMM to access the kernel-side locks for
> this to work generically for the whole host. This is more complicated
> than this proposal and adding more lock holders seems like it could
> reduce performance further.
> 
> There are a couple corner cases with this approach. Under SNP guests
> can request their memory be changed into a VMSA. This VMSA page cannot
> be changed to shared while the vCPU associated with it is running. So
> KVM + the #PF handler will need something to kick vCPUs from running.
> Joerg believes that a possible fix for this could be a new MMU
> notifier in the kernel, then on the #PF we can go through the rmp and
> execute this vCPU kick callback.
> 
> Another corner case is the RMPUPDATE instruction is not guaranteed to
> succeed on first iteration. As noted above if the page is a VMSA it
> cannot be updated while the vCPU is running. Another issue is if the
> guest is running a RMPADJUST on a page it cannot be RMPUPDATED at that
> time. There is a lock for each RMP Entry so there is a race for these
> instructions. The vCPU kicking can solve this issue to be kicking all
> guest vCPUs which removes the chance for the race.
> 
> Since this proposal probably results in SNP guests terminating due to
> a page unexpectedly needing PVALIDATE. The approach could be
> simplified to just the KVM killing the guest. I think it's nicer to
> users to instead of unilaterally killing the guest allowing the
> unvalidated #VC exception to allow users to collect some additional
> debug information and any additional clean up work they would like to
> perform.
> 
> Thanks
> Peter
> 
> On Fri, Aug 20, 2021 at 9:59 AM Brijesh Singh <brijesh.singh@....com> wrote:
>>
>> This part of the Secure Encrypted Paging (SEV-SNP) series focuses on the
>> changes required in a host OS for SEV-SNP support. The series builds upon
>> SEV-SNP Part-1.
>>
>> This series provides the basic building blocks to support booting the SEV-SNP
>> VMs, it does not cover all the security enhancement introduced by the SEV-SNP
>> such as interrupt protection.
>>
>> The CCP driver is enhanced to provide new APIs that use the SEV-SNP
>> specific commands defined in the SEV-SNP firmware specification. The KVM
>> driver uses those APIs to create and managed the SEV-SNP guests.
>>
>> The GHCB specification version 2 introduces new set of NAE's that is
>> used by the SEV-SNP guest to communicate with the hypervisor. The series
>> provides support to handle the following new NAE events:
>> - Register GHCB GPA
>> - Page State Change Request
>> - Hypevisor feature
>> - Guest message request
>>
>> The RMP check is enforced as soon as SEV-SNP is enabled. Not every memory
>> access requires an RMP check. In particular, the read accesses from the
>> hypervisor do not require RMP checks because the data confidentiality is
>> already protected via memory encryption. When hardware encounters an RMP
>> checks failure, it raises a page-fault exception. If RMP check failure
>> is due to the page-size mismatch, then split the large page to resolve
>> the fault.
>>
>> The series does not provide support for the interrupt security and migration
>> and those feature will be added after the base support.
>>
>> The series is based on the commit:
>>   SNP part1 commit and
>>   fa7a549d321a (kvm/next, next) KVM: x86: accept userspace interrupt only if no event is injected
>>
>> TODO:
>>    * Add support for command to ratelimit the guest message request.
>>
>> Changes since v4:
>>   * Move the RMP entry definition to x86 specific header file.
>>   * Move the dump RMP entry function to SEV specific file.
>>   * Use BIT_ULL while defining the #PF bit fields.
>>   * Add helper function to check the IOMMU support for SEV-SNP feature.
>>   * Add helper functions for the page state transition.
>>   * Map and unmap the pages from the direct map after page is added or
>>     removed in RMP table.
>>   * Enforce the minimum SEV-SNP firmware version.
>>   * Extend the LAUNCH_UPDATE to accept the base_gfn and remove the
>>     logic to calculate the gfn from the hva.
>>   * Add a check in LAUNCH_UPDATE to ensure that all the pages are
>>     shared before calling the PSP.
>>   * Mark the memory failure when failing to remove the page from the
>>     RMP table or clearing the immutable bit.
>>   * Exclude the encrypted hva range from the KSM.
>>   * Remove the gfn tracking during the kvm_gfn_map() and use SRCU to
>>     syncronize the PSC and gfn mapping.
>>   * Allow PSC on the registered hva range only.
>>   * Add support for the Preferred GPA VMGEXIT.
>>   * Simplify the PSC handling routines.
>>   * Use the static_call() for the newly added kvm_x86_ops.
>>   * Remove the long-lived GHCB map.
>>   * Move the snp enable module parameter to the end of the file.
>>   * Remove the kvm_x86_op for the RMP fault handling. Call the
>>     fault handler directly from the #NPF interception.
>>
>> Changes since v3:
>>   * Add support for extended guest message request.
>>   * Add ioctl to query the SNP Platform status.
>>   * Add ioctl to get and set the SNP config.
>>   * Add check to verify that memory reserved for the RMP covers the full system RAM.
>>   * Start the SNP specific commands from 256 instead of 255.
>>   * Multiple cleanup and fixes based on the review feedback.
>>
>> Changes since v2:
>>   * Add AP creation support.
>>   * Drop the patch to handle the RMP fault for the kernel address.
>>   * Add functions to track the write access from the hypervisor.
>>   * Do not enable the SNP feature when IOMMU is disabled or is in passthrough mode.
>>   * Dump the RMP entry on RMP violation for the debug.
>>   * Shorten the GHCB macro names.
>>   * Start the SNP_INIT command id from 255 to give some gap for the legacy SEV.
>>   * Sync the header with the latest 0.9 SNP spec.
>>
>> Changes since v1:
>>   * Add AP reset MSR protocol VMGEXIT NAE.
>>   * Add Hypervisor features VMGEXIT NAE.
>>   * Move the RMP table initialization and RMPUPDATE/PSMASH helper in
>>     arch/x86/kernel/sev.c.
>>   * Add support to map/unmap SEV legacy command buffer to firmware state when
>>     SNP is active.
>>   * Enhance PSP driver to provide helper to allocate/free memory used for the
>>     firmware context page.
>>   * Add support to handle RMP fault for the kernel address.
>>   * Add support to handle GUEST_REQUEST NAE event for attestation.
>>   * Rename RMP table lookup helper.
>>   * Drop typedef from rmpentry struct definition.
>>   * Drop SNP static key and use cpu_feature_enabled() to check whether SEV-SNP
>>     is active.
>>   * Multiple cleanup/fixes to address Boris review feedback.
>>
>> Brijesh Singh (40):
>>    x86/cpufeatures: Add SEV-SNP CPU feature
>>    iommu/amd: Introduce function to check SEV-SNP support
>>    x86/sev: Add the host SEV-SNP initialization support
>>    x86/sev: Add RMP entry lookup helpers
>>    x86/sev: Add helper functions for RMPUPDATE and PSMASH instruction
>>    x86/sev: Invalid pages from direct map when adding it to RMP table
>>    x86/traps: Define RMP violation #PF error code
>>    x86/fault: Add support to handle the RMP fault for user address
>>    x86/fault: Add support to dump RMP entry on fault
>>    crypto: ccp: shutdown SEV firmware on kexec
>>    crypto:ccp: Define the SEV-SNP commands
>>    crypto: ccp: Add support to initialize the AMD-SP for SEV-SNP
>>    crypto:ccp: Provide APIs to issue SEV-SNP commands
>>    crypto: ccp: Handle the legacy TMR allocation when SNP is enabled
>>    crypto: ccp: Handle the legacy SEV command when SNP is enabled
>>    crypto: ccp: Add the SNP_PLATFORM_STATUS command
>>    crypto: ccp: Add the SNP_{SET,GET}_EXT_CONFIG command
>>    crypto: ccp: Provide APIs to query extended attestation report
>>    KVM: SVM: Provide the Hypervisor Feature support VMGEXIT
>>    KVM: SVM: Make AVIC backing, VMSA and VMCB memory allocation SNP safe
>>    KVM: SVM: Add initial SEV-SNP support
>>    KVM: SVM: Add KVM_SNP_INIT command
>>    KVM: SVM: Add KVM_SEV_SNP_LAUNCH_START command
>>    KVM: SVM: Add KVM_SEV_SNP_LAUNCH_UPDATE command
>>    KVM: SVM: Mark the private vma unmerable for SEV-SNP guests
>>    KVM: SVM: Add KVM_SEV_SNP_LAUNCH_FINISH command
>>    KVM: X86: Keep the NPT and RMP page level in sync
>>    KVM: x86: Introduce kvm_mmu_get_tdp_walk() for SEV-SNP use
>>    KVM: x86: Define RMP page fault error bits for #NPF
>>    KVM: x86: Update page-fault trace to log full 64-bit error code
>>    KVM: SVM: Do not use long-lived GHCB map while setting scratch area
>>    KVM: SVM: Remove the long-lived GHCB host map
>>    KVM: SVM: Add support to handle GHCB GPA register VMGEXIT
>>    KVM: SVM: Add support to handle MSR based Page State Change VMGEXIT
>>    KVM: SVM: Add support to handle Page State Change VMGEXIT
>>    KVM: SVM: Introduce ops for the post gfn map and unmap
>>    KVM: x86: Export the kvm_zap_gfn_range() for the SNP use
>>    KVM: SVM: Add support to handle the RMP nested page fault
>>    KVM: SVM: Provide support for SNP_GUEST_REQUEST NAE event
>>    KVM: SVM: Add module parameter to enable the SEV-SNP
>>
>> Sean Christopherson (2):
>>    KVM: x86/mmu: Move 'pfn' variable to caller of direct_page_fault()
>>    KVM: x86/mmu: Introduce kvm_mmu_map_tdp_page() for use by TDX and SNP
>>
>> Tom Lendacky (3):
>>    KVM: SVM: Add support to handle AP reset MSR protocol
>>    KVM: SVM: Use a VMSA physical address variable for populating VMCB
>>    KVM: SVM: Support SEV-SNP AP Creation NAE event
>>
>>   Documentation/virt/coco/sevguest.rst          |   55 +
>>   .../virt/kvm/amd-memory-encryption.rst        |  102 +
>>   arch/x86/include/asm/cpufeatures.h            |    1 +
>>   arch/x86/include/asm/disabled-features.h      |    8 +-
>>   arch/x86/include/asm/kvm-x86-ops.h            |    5 +
>>   arch/x86/include/asm/kvm_host.h               |   20 +
>>   arch/x86/include/asm/msr-index.h              |    6 +
>>   arch/x86/include/asm/sev-common.h             |   28 +
>>   arch/x86/include/asm/sev.h                    |   45 +
>>   arch/x86/include/asm/svm.h                    |    7 +
>>   arch/x86/include/asm/trap_pf.h                |   18 +-
>>   arch/x86/kernel/cpu/amd.c                     |    3 +-
>>   arch/x86/kernel/sev.c                         |  361 ++++
>>   arch/x86/kvm/lapic.c                          |    5 +-
>>   arch/x86/kvm/mmu.h                            |    7 +-
>>   arch/x86/kvm/mmu/mmu.c                        |   84 +-
>>   arch/x86/kvm/svm/sev.c                        | 1676 ++++++++++++++++-
>>   arch/x86/kvm/svm/svm.c                        |   62 +-
>>   arch/x86/kvm/svm/svm.h                        |   74 +-
>>   arch/x86/kvm/trace.h                          |   40 +-
>>   arch/x86/kvm/x86.c                            |   92 +-
>>   arch/x86/mm/fault.c                           |   84 +-
>>   drivers/crypto/ccp/sev-dev.c                  |  924 ++++++++-
>>   drivers/crypto/ccp/sev-dev.h                  |   17 +
>>   drivers/crypto/ccp/sp-pci.c                   |   12 +
>>   drivers/iommu/amd/init.c                      |   30 +
>>   include/linux/iommu.h                         |    9 +
>>   include/linux/mm.h                            |    6 +-
>>   include/linux/psp-sev.h                       |  346 ++++
>>   include/linux/sev.h                           |   32 +
>>   include/uapi/linux/kvm.h                      |   56 +
>>   include/uapi/linux/psp-sev.h                  |   60 +
>>   mm/memory.c                                   |   13 +
>>   tools/arch/x86/include/asm/cpufeatures.h      |    1 +
>>   34 files changed, 4088 insertions(+), 201 deletions(-)
>>   create mode 100644 include/linux/sev.h
>>
>> --
>> 2.17.1
>>

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ