linux-kernel - Re: [PATCH] KVM: SEV: Fix guest memory leak when handling guest requests

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <ZkvddEe3lnAlYQbQ@google.com>
Date: Mon, 20 May 2024 16:32:04 -0700
From: Sean Christopherson <seanjc@...gle.com>
To: Michael Roth <michael.roth@....com>
Cc: Michael Roth <mdroth@...xas.edu>, pbonzini@...hat.com, kvm@...r.kernel.org, 
	linux-kernel@...r.kernel.org, ashish.kalra@....com, thomas.lendacky@....com, 
	rick.p.edgecombe@...el.com
Subject: Re: [PATCH] KVM: SEV: Fix guest memory leak when handling guest requests

On Mon, May 20, 2024, Michael Roth wrote:
> On Mon, May 20, 2024 at 07:17:13AM -0700, Sean Christopherson wrote:
> > On Sat, May 18, 2024, Michael Roth wrote:
> > > Before forwarding guest requests to firmware, KVM takes a reference on
> > > the 2 pages the guest uses for its request/response buffers. Make sure
> > > to release these when cleaning up after the request is completed.
> > > 
> > > Signed-off-by: Michael Roth <michael.roth@....com>
> > > ---
> > 
> > ...
> > 
> > > @@ -3970,14 +3980,11 @@ static int __snp_handle_guest_req(struct kvm *kvm, gpa_t req_gpa, gpa_t resp_gpa
> > >  		return ret;
> > >  
> > >  	ret = sev_issue_cmd(kvm, SEV_CMD_SNP_GUEST_REQUEST, &data, fw_err);
> > > -	if (ret)
> > > -		return ret;
> > >  
> > > -	ret = snp_cleanup_guest_buf(&data);
> > > -	if (ret)
> > > -		return ret;
> > > +	if (snp_cleanup_guest_buf(&data))
> > > +		return -EINVAL;
> > 
> > EINVAL feels wrong.  The input was completely valid.  Also, forwarding the error
> 
> Yah, EIO seems more suitable here.
> 
> > to the guest doesn't seem like the right thing to do if KVM can't reclaim the
> > response PFN.  Shouldn't that be fatal to the VM?
> 
> The thinking here is that pretty much all guest request failures will be
> fatal to the guest being able to continue. At least, that's definitely
> true for attestation. So reporting the error to the guest would allow that
> failure to be propagated along by handling in the guest where it would
> presumably be reported a little more clearly to the guest owner, at
> which point the guest would most likely terminate itself anyway.

But failure to convert a pfn back to shared is a _host_ issue, not a guest issue.
E.g. it most likely indicates a bug in the host software stack, or perhaps a bad
CPU or firmware bug.



> But there is a possibility that the guest will attempt access the response
> PFN before/during that reporting and spin on an #NPF instead though. So
> maybe the safer more repeatable approach is to handle the error directly
> from KVM and propagate it to userspace.

I was thinking more along the lines of KVM marking the VM as dead/bugged.  

> But the GHCB spec does require that the firmware response code for
> SNP_GUEST_REQUEST be passed directly to the guest via lower 32-bits of
> SW_EXITINFO2, so we'd still want handling to pass that error on to the
> guest, so I made some changes to retain that behavior.

If and only the hypervisor completes the event.

  The hypervisor must save the SNP_GUEST_REQUEST return code in the lower 32-bits
  of the SW_EXITINFO2 field before completing the Guest Request NAE event.

If KVM terminates the VM, there's obviously no need to fill SW_EXITINFO2.

Side topic, is there a plan to ratelimit Guest Requests?

  To avoid the possibility of a guest creating a denial of service attack against
  the SNP firmware, it is recommended that some form of rate limiting be implemented
  should it be detected that a high number of Guest Request NAE events are being
  issued.