linux-kernel - Re: [PATCH v1 1/5] KVM: SEV: Provide support for SNP_GUEST

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite for Android: free password hash cracker in your pocket
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <6sczq2nmoefcociyffssdtoav2zjtuenzmhybgdtqyyvk5zps6@nnkw2u74j7pu>
Date: Wed, 26 Jun 2024 10:45:31 -0500
From: Michael Roth <michael.roth@....com>
To: Sean Christopherson <seanjc@...gle.com>
CC: <kvm@...r.kernel.org>, <linux-coco@...ts.linux.dev>,
	<linux-kernel@...r.kernel.org>, <x86@...nel.org>, <pbonzini@...hat.com>,
	<jroedel@...e.de>, <thomas.lendacky@....com>, <pgonda@...gle.com>,
	<ashish.kalra@....com>, <bp@...en8.de>, <pankaj.gupta@....com>,
	<liam.merwick@...cle.com>, Brijesh Singh <brijesh.singh@....com>, "Alexey
 Kardashevskiy" <aik@....com>
Subject: Re: [PATCH v1 1/5] KVM: SEV: Provide support for SNP_GUEST_REQUEST
 NAE event

On Wed, Jun 26, 2024 at 06:58:09AM -0700, Sean Christopherson wrote:
> On Fri, Jun 21, 2024, Michael Roth wrote:
> > @@ -3939,6 +3944,67 @@ static int sev_snp_ap_creation(struct vcpu_svm *svm)
> >  	return ret;
> >  }
> >  
> > +static int snp_handle_guest_req(struct vcpu_svm *svm, gpa_t req_gpa, gpa_t resp_gpa)
> > +{
> > +	struct sev_data_snp_guest_request data = {0};
> > +	struct kvm *kvm = svm->vcpu.kvm;
> > +	kvm_pfn_t req_pfn, resp_pfn;
> > +	sev_ret_code fw_err = 0;
> > +	int ret;
> > +
> > +	if (!sev_snp_guest(kvm) || !PAGE_ALIGNED(req_gpa) || !PAGE_ALIGNED(resp_gpa))
> > +		return -EINVAL;
> > +
> > +	req_pfn = gfn_to_pfn(kvm, gpa_to_gfn(req_gpa));
> 
> This is going to sound odd, but I think we should use gfn_to_page(), i.e. require
> that the page be a refcounted page so that it can be pinned.  Long story short,
> one of my learnings from the whole non-refcounted pages saga is that doing DMA
> to unpinned pages outside of mmu_lock is wildly unsafe, and for all intents and
> purposes the ASP is a device doing DMA.  I'm also pretty sure KVM should actually
> *pin* pages, as in FOLL_PIN, but I'm ok tackling that later.
> 
> For now, using gfn_to_pages() would avoid creating ABI (DMA to VM_PFNMAP and/or
> VM_MIXEDMAP memory) that KVM probably doesn't want to support in the long term.

That seems reasonable. Will switch this to gfn_to_page().

> 
> [*] https://lore.kernel.org/all/20240229025759.1187910-1-stevensd@google.com
> 
> > +	if (is_error_noslot_pfn(req_pfn))
> > +		return -EINVAL;
> > +
> > +	resp_pfn = gfn_to_pfn(kvm, gpa_to_gfn(resp_gpa));
> > +	if (is_error_noslot_pfn(resp_pfn)) {
> > +		ret = EINVAL;
> > +		goto release_req;
> > +	}
> > +
> > +	if (rmp_make_private(resp_pfn, 0, PG_LEVEL_4K, 0, true)) {
> > +		ret = -EINVAL;
> > +		kvm_release_pfn_clean(resp_pfn);
> > +		goto release_req;
> > +	}
> 
> I don't see how this is safe.  KVM holds no locks, i.e. can't guarantee that the
> resp_pfn stays private for the duration of the operation.  And on the opposite

When the page is set to private with asid=0,immutable=true arguments,
this puts the page in a special 'firmware-owned' state that specifically
to avoid any changes to the page state happening from under the ASPs feet.
The only way to switch the page to any other state at this point is to
issue the SEV_CMD_SNP_PAGE_RECLAIM request to the ASP via
snp_page_reclaim().

I could see the guest shooting itself in the foot by issuing 2 guest
requests with the same req_pfn/resp_pfn, but on the KVM side whichever
request issues rmp_make_private() first would succeed, and then the
2nd request would generate an EINVAL to userspace.

In that sense, rmp_make_private()/snp_page_reclaim() sort of pair to
lock/unlock a page that's being handed to the ASP. But this should be
better documented either way.

> resp_pfn stays private for the duration of the operation.  And on the opposite
> side, KVM can't guarantee that resp_pfn isn't being actively used by something
> in the kernel, e.g. KVM might induce an unexpected #PF(RMP).
> 
> Why can't KVM require that the response/destination page already be private?  I'm

Hmm. This is a bit tricky. The GHCB spec states:

  The Guest Request NAE event requires two unique pages, one page for the
  request and one page for the response. Both pages must be assigned to the
  hypervisor (shared). The guest must supply the guest physical address of
  the pages (i.e., page aligned) as input.

  The hypervisor must translate the guest physical address (GPA) of each
  page into a system physical address (SPA). The SPA is used to verify that
  the request and response pages are assigned to the hypervisor.

At least for req_pfn, this makes sense because the header of the message
is actually plain text, and KVM does need to parse it to read the message
type from the header. It's just the req/resp payload that are encrypted
by the guest/firmware, i.e. it's not relying on hardware-based memory
encryption in this case.

Because of that though, I think we could potential address this by
allocating the req/resp page as separate pages outside of guest memory,
and simply copy the contents to/from the GPAs provided by the guest.
I'll look more into this approach.

If we go that route I think some of the concerns above go away as well,
but we might still need to a lock or something to serialize access to
these separate/intermediate pages to avoid needed to allocate them every
time or per-request.

> also somewhat confused by the reclaim below.  If KVM converts the response page
> back to shared, doesn't that clobber the data?  If so, how does the guest actually
> get the response?  I feel like I'm missing something.

In this case we're just setting it immutable/firmware-owned, which just
happens to be equivalent (in terms of the RMP table) to a guest-owned page,
but with rmp_entry.ASID=0/rmp_entry.immutable=true instead of
rmp_entry.ASID=<guest asid>/rmp_entry.immutable=false. So the contents remain
intact/readable after the reclaim.

> 
> Regardless of whether or not I'm missing something, this needs comments, and the
> changelog needs to be rewritten with --verbose to explain what's going on.  

Sure, will add more comments and details in the commit msg for v2.

> 
> > +	data.gctx_paddr = __psp_pa(to_kvm_sev_info(kvm)->snp_context);
> > +	data.req_paddr = __sme_set(req_pfn << PAGE_SHIFT);
> > +	data.res_paddr = __sme_set(resp_pfn << PAGE_SHIFT);
> > +
> > +	ret = sev_issue_cmd(kvm, SEV_CMD_SNP_GUEST_REQUEST, &data, &fw_err);
> > +	if (ret)
> > +		return ret;
> 
> This leaks both req_pfn and resp_pfn.

Yes, this one should be fixed up by the v1-revised version of the patch.

Thanks!

-Mike

> 
> > +
> > +	/*
> > +	 * If reclaim fails then there's a good chance the guest will no longer
> > +	 * be runnable so just let userspace terminate the guest.
> > +	 */
> > +	if (snp_page_reclaim(kvm, resp_pfn)) {
> > +		return -EIO;
> > +		goto release_req;
> > +	}
> > +
> > +	/*
> > +	 * As per GHCB spec, firmware failures should be communicated back to
> > +	 * the guest via SW_EXITINFO2 rather than be treated as immediately
> > +	 * fatal.
> > +	 */
> > +	ghcb_set_sw_exit_info_2(svm->sev_es.ghcb,
> > +				SNP_GUEST_ERR(ret ? SNP_GUEST_VMM_ERR_GENERIC : 0,
> > +					      fw_err));
> > +
> > +	ret = 1; /* resume guest */
> > +	kvm_release_pfn_dirty(resp_pfn);
> > +
> > +release_req:
> > +	kvm_release_pfn_clean(req_pfn);
> > +	return ret;
> > +}