linux-kernel - Re: [PATCH Part2 RFC v4 38/40] KVM: SVM: Provide support for SNP_GUEST

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <YPYBmlCuERUIO5+M@google.com>
Date:   Mon, 19 Jul 2021 22:50:02 +0000
From:   Sean Christopherson <seanjc@...gle.com>
To:     Brijesh Singh <brijesh.singh@....com>
Cc:     x86@...nel.org, linux-kernel@...r.kernel.org, kvm@...r.kernel.org,
        linux-efi@...r.kernel.org, platform-driver-x86@...r.kernel.org,
        linux-coco@...ts.linux.dev, linux-mm@...ck.org,
        linux-crypto@...r.kernel.org, Thomas Gleixner <tglx@...utronix.de>,
        Ingo Molnar <mingo@...hat.com>, Joerg Roedel <jroedel@...e.de>,
        Tom Lendacky <thomas.lendacky@....com>,
        "H. Peter Anvin" <hpa@...or.com>, Ard Biesheuvel <ardb@...nel.org>,
        Paolo Bonzini <pbonzini@...hat.com>,
        Vitaly Kuznetsov <vkuznets@...hat.com>,
        Wanpeng Li <wanpengli@...cent.com>,
        Jim Mattson <jmattson@...gle.com>,
        Andy Lutomirski <luto@...nel.org>,
        Dave Hansen <dave.hansen@...ux.intel.com>,
        Sergio Lopez <slp@...hat.com>, Peter Gonda <pgonda@...gle.com>,
        Peter Zijlstra <peterz@...radead.org>,
        Srinivas Pandruvada <srinivas.pandruvada@...ux.intel.com>,
        David Rientjes <rientjes@...gle.com>,
        Dov Murik <dovmurik@...ux.ibm.com>,
        Tobin Feldman-Fitzthum <tobin@....com>,
        Borislav Petkov <bp@...en8.de>,
        Michael Roth <michael.roth@....com>,
        Vlastimil Babka <vbabka@...e.cz>, tony.luck@...el.com,
        npmccallum@...hat.com, brijesh.ksingh@...il.com
Subject: Re: [PATCH Part2 RFC v4 38/40] KVM: SVM: Provide support for
 SNP_GUEST_REQUEST NAE event

On Wed, Jul 07, 2021, Brijesh Singh wrote:
> Version 2 of GHCB specification added the support two SNP Guest Request
> Message NAE event. The events allows for an SEV-SNP guest to make request
> to the SEV-SNP firmware through hypervisor using the SNP_GUEST_REQUEST
> API define in the SEV-SNP firmware specification.

IIUC, this snippet in the spec means KVM can't restrict what requests are made
by the guests.  If so, that makes it difficult to detect/ratelimit a misbehaving
guest, and also limits our options if there are firmware issues (hopefully there
aren't).  E.g. ratelimiting a guest after KVM has explicitly requested it to
migrate is not exactly desirable.

  The hypervisor cannot alter the messages without detection nor read the
  plaintext of the messages.

> The SNP_GUEST_REQUEST requires two unique pages, one page for the request
> and one page for the response. The response page need to be in the firmware
> state. The GHCB specification says that both the pages need to be in the
> hypervisor state but before executing the SEV-SNP command the response page
> need to be in the firmware state.
 
...

> Now that KVM supports all the VMGEXIT NAEs required for the base SEV-SNP
> feature, set the hypervisor feature to advertise it.

It would helpful if this changelog listed the Guest Requests that are required
for "base" SNP, e.g. to provide some insight as to why we care about guest
requests.

>  static int snp_bind_asid(struct kvm *kvm, int *error)
> @@ -1618,6 +1631,12 @@ static int snp_launch_start(struct kvm *kvm, struct kvm_sev_cmd *argp)
>  	if (rc)
>  		goto e_free_context;
>  
> +	/* Used for rate limiting SNP guest message request, use the default settings */
> +	ratelimit_default_init(&sev->snp_guest_msg_rs);

Is this exposed to userspace in any way?  This feels very much like a knob that
needs to be configurable per-VM.

Also, what are the estimated latencies of a guest request?  If the worst case
latency is >200ms, a default ratelimit frequency of 5hz isn't going to do a whole
lot.

> +static void snp_handle_guest_request(struct vcpu_svm *svm, struct ghcb *ghcb,
> +				     gpa_t req_gpa, gpa_t resp_gpa)
> +{
> +	struct sev_data_snp_guest_request data = {};
> +	struct kvm_vcpu *vcpu = &svm->vcpu;
> +	struct kvm *kvm = vcpu->kvm;
> +	struct kvm_sev_info *sev;
> +	int rc, err = 0;
> +
> +	if (!sev_snp_guest(vcpu->kvm)) {
> +		rc = -ENODEV;
> +		goto e_fail;
> +	}
> +
> +	sev = &to_kvm_svm(kvm)->sev_info;
> +
> +	if (!__ratelimit(&sev->snp_guest_msg_rs)) {
> +		pr_info_ratelimited("svm: too many guest message requests\n");
> +		rc = -EAGAIN;

What guarantee do we have that the guest actually understands -EAGAIN?  Ditto
for -EINVAL returned by snp_build_guest_buf().  AFAICT, our options are to return
one of the error codes defined in "Table 95. Status Codes for SNP_GUEST_REQUEST"
of the firmware ABI, kill the guest, or ratelimit the guest without returning
control to the guest.

> +		goto e_fail;
> +	}
> +
> +	rc = snp_build_guest_buf(svm, &data, req_gpa, resp_gpa);
> +	if (rc)
> +		goto e_fail;
> +
> +	sev = &to_kvm_svm(kvm)->sev_info;
> +
> +	mutex_lock(&kvm->lock);

Question on the VMPCK sequences.  The firmware ABI says:

   Each guest has four VMPCKs ... Each message contains a sequence number per
   VMPCK. The sequence number is incremented with each message sent. Messages
   sent by the guest to the firmware and by the firmware to the guest must be
   delivered in order. If not, the firmware will reject subsequent messages ...

Does that mean there are four independent sequences, i.e. four streams the guest
can use "concurrently", or does it mean the overall freshess/integrity check is
composed from four VMPCK sequences, all of which must be correct for the message
to be valid?

If it's the latter, then a traditional mutex isn't really necessary because the
guest must implement its own serialization, e.g. it's own mutex or whatever, to
ensure there is at most one request in-flight at any given time.  And on the KVM
side it means KVM can simpy reject requests if there is already an in-flight
request.  It might also give us more/better options for ratelimiting?

> +	rc = sev_issue_cmd(kvm, SEV_CMD_SNP_GUEST_REQUEST, &data, &err);
> +	if (rc) {
> +		mutex_unlock(&kvm->lock);

I suspect you reused this pattern from other, more complex code, but here it's
overkill.  E.g.

	if (!rc)
		rc = kvm_write_guest(kvm, resp_gpa, sev->snp_resp_page, PAGE_SIZE);
	else if (err)
		rc = err;

	mutex_unlock(&kvm->lock);

	ghcb_set_sw_exit_info_2(ghcb, rc);

> +		/* If we have a firmware error code then use it. */
> +		if (err)
> +			rc = err;
> +
> +		goto e_fail;
> +	}
> +
> +	/* Copy the response after the firmware returns success. */
> +	rc = kvm_write_guest(kvm, resp_gpa, sev->snp_resp_page, PAGE_SIZE);
> +
> +	mutex_unlock(&kvm->lock);
> +
> +e_fail:
> +	ghcb_set_sw_exit_info_2(ghcb, rc);
> +}
> +
> +static void snp_handle_ext_guest_request(struct vcpu_svm *svm, struct ghcb *ghcb,
> +					 gpa_t req_gpa, gpa_t resp_gpa)
> +{
> +	struct sev_data_snp_guest_request req = {};
> +	struct kvm_vcpu *vcpu = &svm->vcpu;
> +	struct kvm *kvm = vcpu->kvm;
> +	unsigned long data_npages;
> +	struct kvm_sev_info *sev;
> +	unsigned long err;
> +	u64 data_gpa;
> +	int rc;
> +
> +	if (!sev_snp_guest(vcpu->kvm)) {
> +		rc = -ENODEV;
> +		goto e_fail;
> +	}
> +
> +	sev = &to_kvm_svm(kvm)->sev_info;
> +
> +	if (!__ratelimit(&sev->snp_guest_msg_rs)) {
> +		pr_info_ratelimited("svm: too many guest message requests\n");
> +		rc = -EAGAIN;
> +		goto e_fail;
> +	}
> +
> +	if (!sev->snp_certs_data) {
> +		pr_err("svm: certs data memory is not allocated\n");
> +		rc = -EFAULT;

Another instance where the kernel's error numbers will not suffice.

> +		goto e_fail;
> +	}
> +
> +	data_gpa = ghcb_get_rax(ghcb);
> +	data_npages = ghcb_get_rbx(ghcb);