lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Fri, 20 Oct 2023 13:37:20 -0500
From:   Tom Lendacky <thomas.lendacky@....com>
To:     Alexey Kardashevskiy <aik@....com>,
        Sean Christopherson <seanjc@...gle.com>
Cc:     Dionna Amalie Glaze <dionnaglaze@...gle.com>,
        Michael Roth <michael.roth@....com>, kvm@...r.kernel.org,
        linux-coco@...ts.linux.dev, linux-mm@...ck.org,
        linux-crypto@...r.kernel.org, x86@...nel.org,
        linux-kernel@...r.kernel.org, tglx@...utronix.de, mingo@...hat.com,
        jroedel@...e.de, hpa@...or.com, ardb@...nel.org,
        pbonzini@...hat.com, vkuznets@...hat.com, jmattson@...gle.com,
        luto@...nel.org, dave.hansen@...ux.intel.com, slp@...hat.com,
        pgonda@...gle.com, peterz@...radead.org,
        srinivas.pandruvada@...ux.intel.com, rientjes@...gle.com,
        dovmurik@...ux.ibm.com, tobin@....com, bp@...en8.de,
        vbabka@...e.cz, kirill@...temov.name, ak@...ux.intel.com,
        tony.luck@...el.com, marcorr@...gle.com,
        sathyanarayanan.kuppuswamy@...ux.intel.com, alpergun@...gle.com,
        jarkko@...nel.org, ashish.kalra@....com, nikunj.dadhania@....com,
        pankaj.gupta@....com, liam.merwick@...cle.com,
        zhi.a.wang@...el.com, Brijesh Singh <brijesh.singh@....com>
Subject: Re: [PATCH v10 48/50] KVM: SEV: Provide support for SNP_GUEST_REQUEST
 NAE event

On 10/18/23 21:48, Alexey Kardashevskiy wrote:
> 
> On 19/10/23 00:48, Sean Christopherson wrote:
>> On Wed, Oct 18, 2023, Alexey Kardashevskiy wrote:
>>>
>>> On 18/10/23 03:27, Sean Christopherson wrote:
>>>> On Mon, Oct 16, 2023, Dionna Amalie Glaze wrote:
>>>>>> +
>>>>>> +       /*
>>>>>> +        * If a VMM-specific certificate blob hasn't been provided, 
>>>>>> grab the
>>>>>> +        * host-wide one.
>>>>>> +        */
>>>>>> +       snp_certs = sev_snp_certs_get(sev->snp_certs);
>>>>>> +       if (!snp_certs)
>>>>>> +               snp_certs = sev_snp_global_certs_get();
>>>>>> +
>>>>>
>>>>> This is where the generation I suggested adding would get checked. If
>>>>> the instance certs' generation is not the global generation, then I
>>>>> think we need a way to return to the VMM to make that right before
>>>>> continuing to provide outdated certificates.
>>>>> This might be an unreasonable request, but the fact that the certs and
>>>>> reported_tcb can be set while a VM is running makes this an issue.
>>>>
>>>> Before we get that far, the changelogs need to explain why the kernel 
>>>> is storing
>>>> userspace blobs in the first place.  The whole thing is a bit of a mess.
>>>>
>>>> sev_snp_global_certs_get() has data races that could lead to 
>>>> variations of TOCTOU
>>>> bugs: sev_ioctl_snp_set_config() can overwrite 
>>>> psp_master->sev_data->snp_certs
>>>> while sev_snp_global_certs_get() is running.  If the compiler reloads 
>>>> snp_certs
>>>> between bumping the refcount and grabbing the pointer, KVM will end up 
>>>> leaking a
>>>> refcount and consuming a pointer without a refcount.
>>>>
>>>>     if (!kref_get_unless_zero(&certs->kref))
>>>>         return NULL;
>>>>
>>>>     return certs;
>>>
>>> I'm missing something here. The @certs pointer is on the stack,
>>
>> No, nothing guarantees that @certs is on the stack and will never be 
>> reloaded.
>> sev_snp_certs_get() is in full view of sev_snp_global_certs_get(), so 
>> it's entirely
>> possible that it can be inlined.  Then you end up with:
>>
>>     struct sev_device *sev;
>>
>>     if (!psp_master || !psp_master->sev_data)
>>         return NULL;
>>
>>     sev = psp_master->sev_data;
>>     if (!sev->snp_initialized)
>>         return NULL;
>>
>>     if (!sev->snp_certs)
>>         return NULL;
>>
>>     if (!kref_get_unless_zero(&sev->snp_certs->kref))
>>         return NULL;
>>
>>     return sev->snp_certs;
>>
>> At which point the compiler could choose to omit a local variable 
>> entirely, it
>> could store @certs in a register and reload after 
>> kref_get_unless_zero(), etc.
>> If psp_master->sev_data->snp_certs is changed at any point, odd thing 
>> can happen.
>>
>> That atomic operation in kref_get_unless_zero() might prevent a reload 
>> between
>> getting the kref and the return, but it wouldn't prevent a reload 
>> between the
>> !NULL check and kref_get_unless_zero().
> 
> Oh. The function is exported so I thought gcc would not go that far but 
> yeah it is possible. So this needs an explicit READ_ONCE barrier.
> 
> 
>>>> If userspace wants to provide garbage to the guest, so be it, not 
>>>> KVM's problem.
>>>> That way, whether the VM gets the global cert or a per-VM cert is 
>>>> purely a userspace
>>>> concern.
>>>
>>> The global cert lives in CCP (/dev/sev), the per VM cert lives in 
>>> kvmvm_fd.
>>> "A la vcpu->run" is fine for the latter but for the former we need 
>>> something
>>> else.
>>
>> Why?  The cert ultimately comes from userspace, no?  Make userspace deal 
>> with it.
>>
>>> And there is scenario when one global certs blob is what is needed and
>>> copying it over multiple VMs seems suboptimal.
>>
>> That's a solvable problem.  I'm not sure I like the most obvious 
>> solution, but it
>> is a solution: let userspace define a KVM-wide blob pointer, either via 
>> .mmap()
>> or via an ioctl().
>>
>> FWIW, there's no need to do .mmap() shenanigans, e.g. an ioctl() to set the
>> userspace pointer would suffice.  The benefit of a kernel controlled 
>> pointer is
>> that it doesn't require copying to a kernel buffer (or special code to 
>> copy from
>> userspace into guest).
> 
> Just to clarify - like, a small userspace non-qemu program which just 
> holds a pointer with the certs blob, or embed it into libvirt or systemd?
> 
> 
>> Actually, looking at the flow again, AFAICT there's nothing special 
>> about the
>> target DATA_PAGE.  It must be SHARED *before* 
>> SVM_VMGEXIT_EXT_GUEST_REQUEST, i.e.
>> KVM doesn't need to do conversions, there's no kernel priveleges 
>> required, etc.
>> And the GHCB doesn't dictate ordering between storing the certificates 
>> and doing
>> the request.  That means the certificate stuff can be punted entirely to 
>> usersepace.
> 
> All true.
> 
>> Heh, typing up the below, there's another bug: KVM will incorrectly 
>> "return" '0'
>> for non-SNP guests:
>>
>>     unsigned long exitcode = 0;
>>     u64 data_gpa;
>>     int err, rc;
>>
>>     if (!sev_snp_guest(vcpu->kvm)) {
>>         rc = SEV_RET_INVALID_GUEST; <= sets "rc", not "exitcode"
>>         goto e_fail;
>>     }
>>
>> e_fail:
>>     ghcb_set_sw_exit_info_2(svm->sev_es.ghcb, exitcode);
>>
>> Which really highlights that we need to get test infrastructure up and 
>> running
>> for SEV-ES, SNP, and TDX.
>>
>> Anyways, back to punting to userspace.  Here's a rough sketch.  The only 
>> new uAPI
>> is the definition of KVM_HC_SNP_GET_CERTS and its arguments.
>>
>> static void snp_handle_guest_request(struct vcpu_svm *svm)
>> {
>>     struct vmcb_control_area *control = &svm->vmcb->control;
>>     struct sev_data_snp_guest_request data = {0};
>>     struct kvm_vcpu *vcpu = &svm->vcpu;
>>     struct kvm *kvm = vcpu->kvm;
>>     struct kvm_sev_info *sev;
>>     gpa_t req_gpa = control->exit_info_1;
>>     gpa_t resp_gpa = control->exit_info_2;
>>     unsigned long rc;
>>     int err;
>>
>>     if (!sev_snp_guest(vcpu->kvm)) {
>>         rc = SEV_RET_INVALID_GUEST;
>>         goto e_fail;
>>     }
>>
>>     sev = &to_kvm_svm(kvm)->sev_info;
>>
>>     mutex_lock(&sev->guest_req_lock);
>>
>>     rc = snp_setup_guest_buf(svm, &data, req_gpa, resp_gpa);
>>     if (rc)
>>         goto unlock;
>>
>>     rc = sev_issue_cmd(kvm, SEV_CMD_SNP_GUEST_REQUEST, &data, &err);
>>     if (rc)
>>         /* Ensure an error value is returned to guest. */
>>         rc = err ? err : SEV_RET_INVALID_ADDRESS;
>>
>>     snp_cleanup_guest_buf(&data, &rc);
>>
>> unlock:
>>     mutex_unlock(&sev->guest_req_lock);
>>
>> e_fail:
>>     ghcb_set_sw_exit_info_2(svm->sev_es.ghcb, rc);
>> }
>>
>> static int snp_complete_ext_guest_request(struct kvm_vcpu *vcpu)
>> {
>>     u64 certs_exitcode = vcpu->run->hypercall.args[2];
>>     struct vcpu_svm *svm = to_svm(vcpu);
>>
>>     if (certs_exitcode)
>>         ghcb_set_sw_exit_info_2(svm->sev_es.ghcb, certs_exitcode);
>>     else
>>         snp_handle_guest_request(svm);
>>     return 1;
>> }
>>
>> static int snp_handle_ext_guest_request(struct vcpu_svm *svm)
>> {
>>     struct kvm_vcpu *vcpu = &svm->vcpu;
>>     struct kvm *kvm = vcpu->kvm;
>>     struct kvm_sev_info *sev;
>>     unsigned long exitcode;
>>     u64 data_gpa;
>>
>>     if (!sev_snp_guest(vcpu->kvm)) {
>>         ghcb_set_sw_exit_info_2(svm->sev_es.ghcb, SEV_RET_INVALID_GUEST);
>>         return 1;
>>     }
>>
>>     data_gpa = vcpu->arch.regs[VCPU_REGS_RAX];
>>     if (!IS_ALIGNED(data_gpa, PAGE_SIZE)) {
>>         ghcb_set_sw_exit_info_2(svm->sev_es.ghcb, SEV_RET_INVALID_ADDRESS);
>>         return 1;
>>     }
>>
>>     vcpu->run->hypercall.nr         = KVM_HC_SNP_GET_CERTS;
>>     vcpu->run->hypercall.args[0]     = data_gpa;
>>     vcpu->run->hypercall.args[1]     = vcpu->arch.regs[VCPU_REGS_RBX];
>>     vcpu->run->hypercall.flags     = KVM_EXIT_HYPERCALL_LONG_MODE;
> 
> btw why is it _LONG_MODE and not just _64? :)
> 
>>     vcpu->arch.complete_userspace_io = snp_complete_ext_guest_request;
>>     return 0;
>> }
> 
> This should work the KVM stored certs nicely but not for the global certs. 
> Although I am not all convinced that global certs is all that valuable but 
> I do not know the history of that, happened before I joined so I let 
> others to comment on that. Thanks,

Global certs was the original implementation because it was intended to 
provide the VCEK, ASK, and ARK. These will be the same for all SNP guests 
that are launched. The original intention was also to not make the kernel 
have to manage multiple certificates and instead just treat the data as a 
blob provided from user-space.

The per-VM change was added to allow a per-VM certificates. If a provider 
has no need to use this, then only the global certs blob is needed which 
reduces the amount of memory needed for the VM.

Thanks,
Tom

> 
> 

Powered by blists - more mailing lists