Message-ID: <ZkWD1ZrUbK90+3cM@yzhao56-desk.sh.intel.com>
Date: Thu, 16 May 2024 11:56:05 +0800
From: Yan Zhao <yan.y.zhao@...el.com>
To: "Edgecombe, Rick P" <rick.p.edgecombe@...el.com>
CC: "seanjc@...gle.com" <seanjc@...gle.com>, "isaku.yamahata@...il.com"
<isaku.yamahata@...il.com>, "Huang, Kai" <kai.huang@...el.com>,
"sagis@...gle.com" <sagis@...gle.com>, "linux-kernel@...r.kernel.org"
<linux-kernel@...r.kernel.org>, "Aktas, Erdem" <erdemaktas@...gle.com>,
"pbonzini@...hat.com" <pbonzini@...hat.com>, "kvm@...r.kernel.org"
<kvm@...r.kernel.org>, "dmatlack@...gle.com" <dmatlack@...gle.com>
Subject: Re: [PATCH 02/16] KVM: x86/mmu: Introduce a slot flag to zap only
slot leafs on slot deletion
On Thu, May 16, 2024 at 07:56:18AM +0800, Edgecombe, Rick P wrote:
> On Wed, 2024-05-15 at 15:47 -0700, Sean Christopherson wrote:
> > > I didn't gather there was any proof of this. Did you have any hunch either
> > > way?
> >
> > I doubt the guest was able to access memory it shouldn't have been able to
> > access.  But that's a moot point, as the bigger problem is that, because we
> > have no idea what's at fault, KVM can't make any guarantees about the safety
> > of such a flag.
> >
> > TDX is a special case where we don't have a better option (we do have other
> > options, they're just horrible).  In other words, the choice is essentially
> > to either:
> >
> > (a) cross our fingers and hope that the problem is limited to shared memory
> > with QEMU+VFIO, i.e. and doesn't affect TDX private memory.
> >
> > or
> >
> > (b) don't merge TDX until the original regression is fully resolved.
> >
> > FWIW, I would love to root cause and fix the failure, but I don't know how
> > feasible that is at this point.
Me too. I'm curious about what exactly is broken.
>
> If we think it is not a security issue, and we don't even know if it can be hit
> for TDX, then I'd be included to go with (a). Especially since we are just
> aiming for the most basic support, and don't have to worry about regressions in
> the classical sense.
>
> I'm not sure how easy it will be to root cause it at this point. Hopefully Yan
> will be coming online soon. She mentioned some previous Intel effort to
> investigate it. Presumably we would have to start with the old kernel that
> exhibited the issue. If it can still be found...
I tried to reproduce it with guidance from Weijiang, though my NVIDIA card
was slightly different from the one he used.
However, I failed. I'm not sure whether that was because I did it remotely or
because I didn't spend enough time on it (it's not an official task assigned
to me; I just did it out of curiosity).
If you think it's worthwhile, I would like to try again locally to see if I am
lucky enough to reproduce and root-cause it.
But is it possible for TDX not to be blocked on this bug/regression?