linux-kernel - Re: [PATCH] KVM: SEV: Fix guest memory leak when handling guest requests


Message-ID: <qgzgdh7fqynpvu6gh6kox5rnixswtu2ewl3hiutohpndmbdo6x@kfwegt625uqh>
Date: Tue, 21 May 2024 10:34:13 -0500
From: Michael Roth <michael.roth@....com>
To: Sean Christopherson <seanjc@...gle.com>
CC: Michael Roth <mdroth@...xas.edu>, <pbonzini@...hat.com>,
	<kvm@...r.kernel.org>, <linux-kernel@...r.kernel.org>,
	<ashish.kalra@....com>, <thomas.lendacky@....com>,
	<rick.p.edgecombe@...el.com>
Subject: Re: [PATCH] KVM: SEV: Fix guest memory leak when handling guest
 requests

On Tue, May 21, 2024 at 07:09:04AM -0700, Sean Christopherson wrote:
> On Mon, May 20, 2024, Michael Roth wrote:
> > On Mon, May 20, 2024 at 04:32:04PM -0700, Sean Christopherson wrote:
> > > On Mon, May 20, 2024, Michael Roth wrote:
> > > > But there is a possibility that the guest will attempt to access the response
> > > > PFN before/during that reporting and spin on an #NPF instead. So
> > > > maybe the safer more repeatable approach is to handle the error directly
> > > > from KVM and propagate it to userspace.
> > > 
> > > I was thinking more along the lines of KVM marking the VM as dead/bugged.  
> > 
> > In practice userspace will get an unhandled exit and kill the vcpu/guest,
> > but we could additionally flag the guest as dead.
> 
> Honest question, does it make sense for KVM to make the VM unusable?  E.g. is
> it feasible for userspace to keep running the VM?  Does the page that's in a bad
> state present any danger to the host?

If the reclaim fails (which it shouldn't), then KVM has a unique situation
where a non-gmem guest page is left in a bad state. In theory, if the
guest/userspace could somehow induce a reclaim failure, then they could
potentially trick the host into trying to access that same page as a shared
page and induce a host RMP #PF.

So it does seem like a good idea to force the guest to stop executing. Then,
once the guest is fully destroyed, the bad page will stay leaked, so it
won't affect subsequent activities.

> 
> > Is there an existing mechanism for this?
> 
> kvm_vm_dead()

Nice, that would do the trick. I'll modify the logic to also call that
after a reclaim failure.
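
As a rough illustration of the flow I have in mind, here is a userspace
mock, not the actual patch. Only kvm_vm_dead() is a real KVM helper, and it
is stubbed here (the in-kernel version, as I understand it, also makes a
KVM_REQ_VM_DEAD request to all vCPUs). struct kvm is reduced to a single
field, and mock_reclaim_page(), handle_guest_request_done() and the
inject_failure parameter are hypothetical names made up for this sketch.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Userspace stand-in for the kernel's struct kvm: only the flag we need. */
struct kvm {
	bool vm_dead;
};

/* Stub of the real kvm_vm_dead() helper; here it only sets the flag. */
static void kvm_vm_dead(struct kvm *kvm)
{
	kvm->vm_dead = true;
}

/*
 * Hypothetical reclaim step. The caller injects the failure so the error
 * path can be exercised; the real reclaim is not expected to fail.
 */
static int mock_reclaim_page(unsigned long pfn, bool inject_failure)
{
	(void)pfn;
	return inject_failure ? -EIO : 0;
}

/*
 * Hypothetical completion path for a guest request: reclaim the response
 * page and, if that fails, leave the page leaked, mark the VM dead so the
 * guest stops executing, and propagate the error to the caller.
 */
static int handle_guest_request_done(struct kvm *kvm, unsigned long resp_pfn,
				     bool inject_failure)
{
	int ret;

	assert(kvm != NULL);

	ret = mock_reclaim_page(resp_pfn, inject_failure);
	if (ret) {
		/* The page is deliberately never handed back to the host. */
		kvm_vm_dead(kvm);
		return ret;
	}

	return 0;
}
```

With this shape the success path leaves vm_dead clear and returns 0, while
an injected reclaim failure returns -EIO with vm_dead set, so the error
still reaches userspace and the guest is prevented from running further.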

Thanks,

Mike