linux-kernel - Re: [PATCH 02/16] KVM: x86/mmu: Introduce a slot flag to zap only slot leafs on slot deletion

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <ZkWD1ZrUbK90+3cM@yzhao56-desk.sh.intel.com>
Date: Thu, 16 May 2024 11:56:05 +0800
From: Yan Zhao <yan.y.zhao@...el.com>
To: "Edgecombe, Rick P" <rick.p.edgecombe@...el.com>
CC: "seanjc@...gle.com" <seanjc@...gle.com>, "isaku.yamahata@...il.com"
	<isaku.yamahata@...il.com>, "Huang, Kai" <kai.huang@...el.com>,
	"sagis@...gle.com" <sagis@...gle.com>, "linux-kernel@...r.kernel.org"
	<linux-kernel@...r.kernel.org>, "Aktas, Erdem" <erdemaktas@...gle.com>,
	"pbonzini@...hat.com" <pbonzini@...hat.com>, "kvm@...r.kernel.org"
	<kvm@...r.kernel.org>, "dmatlack@...gle.com" <dmatlack@...gle.com>
Subject: Re: [PATCH 02/16] KVM: x86/mmu: Introduce a slot flag to zap only
 slot leafs on slot deletion

On Thu, May 16, 2024 at 07:56:18AM +0800, Edgecombe, Rick P wrote:
> On Wed, 2024-05-15 at 15:47 -0700, Sean Christopherson wrote:
> > > I didn't gather there was any proof of this. Did you have any hunch either
> > > way?
> > 
> > I doubt the guest was able to access memory it shouldn't have been able to
> > access.
> > But that's a moot point, as the bigger problem is that, because we have no
> > idea
> > what's at fault, KVM can't make any guarantees about the safety of such a
> > flag.
> > 
> > TDX is a special case where we don't have a better option (we do have other
> > options,
> > they're just horrible).  In other words, the choice is essentially to either:
> > 
> >  (a) cross our fingers and hope that the problem is limited to shared memory
> >      with QEMU+VFIO, i.e. and doesn't affect TDX private memory.
> > 
> > or 
> > 
> >  (b) don't merge TDX until the original regression is fully resolved.
> > 
> > FWIW, I would love to root cause and fix the failure, but I don't know how
> > feasible
> > that is at this point.
Me too. So curious about what's exactly broken.

> 
> If we think it is not a security issue, and we don't even know if it can be hit
> for TDX, then I'd be included to go with (a). Especially since we are just
> aiming for the most basic support, and don't have to worry about regressions in
> the classical sense.
> 
> I'm not sure how easy it will be to root cause it at this point. Hopefully Yan
> will be coming online soon. She mentioned some previous Intel effort to
> investigate it. Presumably we would have to start with the old kernel that
> exhibited the issue. If it can still be found...
I tried to reproduce it under the direction from Weijiang, though my NVIDIA card
was of a little difference as the one used by Weijiang.
However, I failed. I'm not sure whether it was because I did it remotely or
whether it was because I didn't spend enough time (since it's not an official
tasks assigned to me and I just did it out of curiosity).

If you think it's worthwhile, I would like to try again locally to see if I will
be lucky enough to reproduce and root-cause it.

But is it possible not to have TDX be pending on this bug/regression?