lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Fri, 24 Feb 2023 13:37:48 +0100
From:   Alexander Graf <graf@...zon.com>
To:     Michael Roth <michael.roth@....com>, <kvm@...r.kernel.org>
CC:     <linux-coco@...ts.linux.dev>, <linux-mm@...ck.org>,
        <linux-crypto@...r.kernel.org>, <x86@...nel.org>,
        <linux-kernel@...r.kernel.org>, <tglx@...utronix.de>,
        <mingo@...hat.com>, <jroedel@...e.de>, <thomas.lendacky@....com>,
        <hpa@...or.com>, <ardb@...nel.org>, <pbonzini@...hat.com>,
        <seanjc@...gle.com>, <vkuznets@...hat.com>, <jmattson@...gle.com>,
        <luto@...nel.org>, <dave.hansen@...ux.intel.com>, <slp@...hat.com>,
        <pgonda@...gle.com>, <peterz@...radead.org>,
        <srinivas.pandruvada@...ux.intel.com>, <rientjes@...gle.com>,
        <dovmurik@...ux.ibm.com>, <tobin@....com>, <bp@...en8.de>,
        <vbabka@...e.cz>, <kirill@...temov.name>, <ak@...ux.intel.com>,
        <tony.luck@...el.com>, <marcorr@...gle.com>,
        <sathyanarayanan.kuppuswamy@...ux.intel.com>,
        <alpergun@...gle.com>, <dgilbert@...hat.com>, <jarkko@...nel.org>,
        <ashish.kalra@....com>, <nikunj.dadhania@....com>,
        Brijesh Singh <brijesh.singh@....com>
Subject: Re: [PATCH RFC v8 47/56] KVM: SVM: Support SEV-SNP AP Creation NAE
 event


On 20.02.23 19:38, Michael Roth wrote:
> From: Tom Lendacky <thomas.lendacky@....com>
>
> Add support for the SEV-SNP AP Creation NAE event. This allows SEV-SNP
> guests to alter the register state of the APs on their own. This allows
> the guest a way of simulating INIT-SIPI.
>
> A new event, KVM_REQ_UPDATE_PROTECTED_GUEST_STATE, is created and used
> so as to avoid updating the VMSA pointer while the vCPU is running.
>
> For CREATE
>    The guest supplies the GPA of the VMSA to be used for the vCPU with
>    the specified APIC ID. The GPA is saved in the svm struct of the
>    target vCPU, the KVM_REQ_UPDATE_PROTECTED_GUEST_STATE event is added
>    to the vCPU and then the vCPU is kicked.
>
> For CREATE_ON_INIT:
>    The guest supplies the GPA of the VMSA to be used for the vCPU with
>    the specified APIC ID the next time an INIT is performed. The GPA is
>    saved in the svm struct of the target vCPU.
>
> For DESTROY:
>    The guest indicates it wishes to stop the vCPU. The GPA is cleared
>    from the svm struct, the KVM_REQ_UPDATE_PROTECTED_GUEST_STATE event is
>    added to vCPU and then the vCPU is kicked.
>
> The KVM_REQ_UPDATE_PROTECTED_GUEST_STATE event handler will be invoked
> as a result of the event or as a result of an INIT. The handler sets the
> vCPU to the KVM_MP_STATE_UNINITIALIZED state, so that any errors will
> leave the vCPU as not runnable. Any previous VMSA pages that were
> installed as part of an SEV-SNP AP Creation NAE event are un-pinned. If
> a new VMSA is to be installed, the VMSA guest page is pinned and set as
> the VMSA in the vCPU VMCB and the vCPU state is set to
> KVM_MP_STATE_RUNNABLE. If a new VMSA is not to be installed, the VMSA is
> cleared in the vCPU VMCB and the vCPU state is left as
> KVM_MP_STATE_UNINITIALIZED to prevent it from being run.
>
> Signed-off-by: Tom Lendacky <thomas.lendacky@....com>
> Signed-off-by: Brijesh Singh <brijesh.singh@....com>
> Signed-off-by: Ashish Kalra <ashish.kalra@....com>
> [mdr: add handling for restrictedmem]
> Signed-off-by: Michael Roth <michael.roth@....com>


What is the intended boot sequence for SEV-SNP guests? FWIW with this 
interface in place, guests will typically use in-guest VMSA pages to 
hold secondary vcpu state. But that means we're now allocating 4kb of 
memory for every vcpu that we create that will be for most of the 
guest's lifetime superfluous.

Wouldn't it make more sense to have a model where we only allocate the 
VMSA for the boot CPU and leave secondary allocation to the guest? We 
already need firmware changes for SEV-SNP - may as well make this one more.

[...]

> +
> +static int sev_snp_ap_creation(struct vcpu_svm *svm)
> +{
> +       struct kvm_sev_info *sev = &to_kvm_svm(svm->vcpu.kvm)->sev_info;
> +       struct kvm_vcpu *vcpu = &svm->vcpu;
> +       struct kvm_vcpu *target_vcpu;
> +       struct vcpu_svm *target_svm;
> +       unsigned int request;
> +       unsigned int apic_id;
> +       bool kick;
> +       int ret;
> +
> +       request = lower_32_bits(svm->vmcb->control.exit_info_1);
> +       apic_id = upper_32_bits(svm->vmcb->control.exit_info_1);
> +
> +       /* Validate the APIC ID */
> +       target_vcpu = kvm_get_vcpu_by_id(vcpu->kvm, apic_id);


Out of curiosity: The target CPU can be my own vCPU, right?


> +       if (!target_vcpu) {
> +               vcpu_unimpl(vcpu, "vmgexit: invalid AP APIC ID [%#x] from guest\n",
> +                           apic_id);
> +               return -EINVAL;
> +       }
> +
> +       ret = 0;
> +
> +       target_svm = to_svm(target_vcpu);
> +
> +       /*
> +        * The target vCPU is valid, so the vCPU will be kicked unless the
> +        * request is for CREATE_ON_INIT. For any errors at this stage, the
> +        * kick will place the vCPU in an non-runnable state.
> +        */
> +       kick = true;
> +
> +       mutex_lock(&target_svm->sev_es.snp_vmsa_mutex);
> +
> +       target_svm->sev_es.snp_vmsa_gpa = INVALID_PAGE;
> +       target_svm->sev_es.snp_ap_create = true;
> +
> +       /* Interrupt injection mode shouldn't change for AP creation */
> +       if (request < SVM_VMGEXIT_AP_DESTROY) {
> +               u64 sev_features;
> +
> +               sev_features = vcpu->arch.regs[VCPU_REGS_RAX];
> +               sev_features ^= sev->sev_features;
> +               if (sev_features & SVM_SEV_FEAT_INT_INJ_MODES) {
> +                       vcpu_unimpl(vcpu, "vmgexit: invalid AP injection mode [%#lx] from guest\n",
> +                                   vcpu->arch.regs[VCPU_REGS_RAX]);
> +                       ret = -EINVAL;
> +                       goto out;
> +               }
> +       }
> +
> +       switch (request) {
> +       case SVM_VMGEXIT_AP_CREATE_ON_INIT:
> +               kick = false;
> +               fallthrough;
> +       case SVM_VMGEXIT_AP_CREATE:
> +               if (!page_address_valid(vcpu, svm->vmcb->control.exit_info_2)) {
> +                       vcpu_unimpl(vcpu, "vmgexit: invalid AP VMSA address [%#llx] from guest\n",
> +                                   svm->vmcb->control.exit_info_2);
> +                       ret = -EINVAL;
> +                       goto out;
> +               }
> +
> +               /*
> +                * Malicious guest can RMPADJUST a large page into VMSA which
> +                * will hit the SNP erratum where the CPU will incorrectly signal
> +                * an RMP violation #PF if a hugepage collides with the RMP entry
> +                * of VMSA page, reject the AP CREATE request if VMSA address from
> +                * guest is 2M aligned.


This will break genuine current Linux kernels that just happen to 
allocate a guest page, no? In fact, given enough vCPUs you're almost 
guaranteed to hit an aligned structure somewhere. What is the guest 
supposed to do in that situation?


> +                */
> +               if (IS_ALIGNED(svm->vmcb->control.exit_info_2, PMD_SIZE)) {
> +                       vcpu_unimpl(vcpu,
> +                                   "vmgexit: AP VMSA address [%llx] from guest is unsafe as it is 2M aligned\n",
> +                                   svm->vmcb->control.exit_info_2);
> +                       ret = -EINVAL;
> +                       goto out;
> +               }
> +
> +               target_svm->sev_es.snp_vmsa_gpa = svm->vmcb->control.exit_info_2;
> +               break;
> +       case SVM_VMGEXIT_AP_DESTROY:


I don't understand the destroy path. Why does this case destroy anything?


> +               break;
> +       default:
> +               vcpu_unimpl(vcpu, "vmgexit: invalid AP creation request [%#x] from guest\n",
> +                           request);
> +               ret = -EINVAL;
> +               break;
> +       }
> +
> +out:
> +       if (kick) {
> +               if (target_vcpu->arch.mp_state == KVM_MP_STATE_UNINITIALIZED)
> +                       target_vcpu->arch.mp_state = KVM_MP_STATE_RUNNABLE;


What if the guest AP goes through a create -> destroy -> create cycle? 
Will it stay runnable while destroyed?


Alex

> +
> +               kvm_make_request(KVM_REQ_UPDATE_PROTECTED_GUEST_STATE, target_vcpu);
> +               kvm_vcpu_kick(target_vcpu);
> +       }
> +
> +       mutex_unlock(&target_svm->sev_es.snp_vmsa_mutex);
> +
> +       return ret;
> +}
> +
>   static int sev_handle_vmgexit_msr_protocol(struct vcpu_svm *svm)
>   {
>          struct vmcb_control_area *control = &svm->vmcb->control;



Amazon Development Center Germany GmbH
Krausenstr. 38
10117 Berlin
Geschaeftsfuehrung: Christian Schlaeger, Jonathan Weiss
Eingetragen am Amtsgericht Charlottenburg unter HRB 149173 B
Sitz: Berlin
Ust-ID: DE 289 237 879


Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ