linux-kernel - Re: [PATCH 02/16] KVM: x86/mmu: Introduce a slot flag to zap only slot leafs on slot deletion

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite for Android: free password hash cracker in your pocket

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <ZkU7dl3BDXpwYwza@google.com>
Date: Wed, 15 May 2024 15:47:18 -0700
From: Sean Christopherson <seanjc@...gle.com>
To: Rick P Edgecombe <rick.p.edgecombe@...el.com>
Cc: "dmatlack@...gle.com" <dmatlack@...gle.com>, Kai Huang <kai.huang@...el.com>, 
	"sagis@...gle.com" <sagis@...gle.com>, 
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>, Yan Y Zhao <yan.y.zhao@...el.com>, 
	Erdem Aktas <erdemaktas@...gle.com>, "kvm@...r.kernel.org" <kvm@...r.kernel.org>, 
	"pbonzini@...hat.com" <pbonzini@...hat.com>, "isaku.yamahata@...il.com" <isaku.yamahata@...il.com>
Subject: Re: [PATCH 02/16] KVM: x86/mmu: Introduce a slot flag to zap only
 slot leafs on slot deletion

On Wed, May 15, 2024, Rick P Edgecombe wrote:
> On Wed, 2024-05-15 at 13:05 -0700, Sean Christopherson wrote:
> > On Wed, May 15, 2024, Rick P Edgecombe wrote:
> > > So rather then try to optimize zapping more someday and hit similar
> > > issues, let userspace decide how it wants it to be done. I'm not sure of
> > > the actual performance tradeoffs here, to be clear.
> > 
> > ...unless someone is able to root cause the VFIO regression, we don't have
> > the luxury of letting userspace give KVM a hint as to whether it might be
> > better to do a precise zap versus a nuke-and-pave.
> 
> Pedantry... I think it's not a regression if something requires a new flag. It
> is still a bug though.

Heh, pedantry denied.  I was speaking in the past tense about the VFIO failure,
which was a regression as I changed KVM behavior without adding a flag.

> The thing I worry about on the bug is whether it might have been due to a guest
> having access to page it shouldn't have. In which case we can't give the user
> the opportunity to create it.
> 
> I didn't gather there was any proof of this. Did you have any hunch either way?

I doubt the guest was able to access memory it shouldn't have been able to access.
But that's a moot point, as the bigger problem is that, because we have no idea
what's at fault, KVM can't make any guarantees about the safety of such a flag.

TDX is a special case where we don't have a better option (we do have other options,
they're just horrible).  In other words, the choice is essentially to either:

 (a) cross our fingers and hope that the problem is limited to shared memory
     with QEMU+VFIO, i.e. and doesn't affect TDX private memory.

or 

 (b) don't merge TDX until the original regression is fully resolved.

FWIW, I would love to root cause and fix the failure, but I don't know how feasible
that is at this point.

> > And more importantly, it would be a _hint_, not the hard requirement that TDX
> > needs.
> > 
> > > That said, a per-vm know is easier for TDX purposes.
> 
> If we don't want it to be a mandate from userspace, then we need to do some per-
> vm checking in TDX's case anyway. In which case we might as well go with the
> per-vm option for TDX.
> 
> You had said up the thread, why not opt all non-normal VMs into the new
> behavior. It will work great for TDX. But why do SEV and others want this
> automatically?

Because I want flexibility in KVM, i.e. I want to take the opportunity to try and
break away from KVM's godawful ABI.  It might be a pipe dream, as keying off the
VM type obviously has similar risks to giving userspace a memslot flag.  The one
sliver of hope is that the VM types really are quite new (though less so for SEV
and SEV-ES), whereas a memslot flag would be easily applied to existing VMs.