[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <b10851221f8f9c2a60de5d302c0f6d15f802d77a.camel@intel.com>
Date: Tue, 9 Apr 2024 14:01:05 +0000
From: "Edgecombe, Rick P" <rick.p.edgecombe@...el.com>
To: "seanjc@...gle.com" <seanjc@...gle.com>
CC: "kvm@...r.kernel.org" <kvm@...r.kernel.org>, "pbonzini@...hat.com"
<pbonzini@...hat.com>, "linux-kernel@...r.kernel.org"
<linux-kernel@...r.kernel.org>, "michael.roth@....com"
<michael.roth@....com>, "Yamahata, Isaku" <isaku.yamahata@...el.com>
Subject: Re: [PATCH v5 07/17] KVM: x86: add fields to struct kvm_arch for CoCo
features
On Mon, 2024-04-08 at 18:21 -0700, Sean Christopherson wrote:
>
> Whatever you do that is TDX specific and an internal KVM thing is likely the
> wrong
> thing :-)
>
> The main reason KVM doesn't do a targeted zap on memslot removal is because of
> ABI
> baggage that we _think_ is limited to interaction with VFIO. Since KVM
> doesn't
> have any ABI for TDX *or* SNP, I want to at least entertain the option of
> doing
> a target zap for SNP as well as TDX, even though it's only truly "necessary"
> for
> TDX, in quotes because it's not strictly necessary, e.g. KVM could BLOCK the
> S-EPT
> entries without fully removing the mappings.
>
> Whether or not targeted zapping is optimal for SNP (or any VM type) is very
> much
> TBD, and likely highly dependent on use case, but at the same time it would be
> nice to not rule it out completely.
>
> E.g. ChromeOS currently has a use case where they frequently delete and
> recreate
> a 2GiB (give or take) memslot. For that use case, zapping _just_ that memslot
> is
> likely far superious than blasting and rebuilding the entire VM. But if
> userspace
> deletes a 1TiB for some reason, e.g. for memory unplug?, then the fast zap is
> probably better, even though it requires rebuilding all SPTEs.
Interesting, thanks for the history.
>
> > > There seems to be an attempt to abstract away the existence of Secure-
> > > EPT in mmu.c, that is not fully successful. In this case the code
> > > checks kvm_gfn_shared_mask() to see if it needs to handle the zapping
> > > in a way specific needed by S-EPT. It ends up being a little confusing
> > > because the actual check is about whether there is a shared bit. It
> > > only works because only S-EPT is the only thing that has a
> > > kvm_gfn_shared_mask().
> > >
> > > Doing something like (kvm->arch.vm_type == KVM_X86_TDX_VM) looks wrong,
> > > but is more honest about what we are getting up to here. I'm not sure
> > > though, what do you think?
> >
> > Right, I attempted and failed in zapping case. This is due to the
> > restriction
> > that the Secure-EPT pages must be removed from the leaves. the VMX case
> > (also
> > NPT, even SNP) heavily depends on zapping root entry as optimization.
>
> As above, it's more nuanced than that. KVM has come to depend on the fast
> zap,
> but it got that way *because* KVM has historical zapped everything, and
> userspace
> has (unknowingly) relied on that behavior.
>
> > I can think of
> > - add TDX check. Looks wrong
> > - Use kvm_gfn_shared_mask(kvm). confusing
>
> Ya, even if we end up making it a hardcoded TDX thing, dress it up a bit.
> E.g.
> even if KVM checks for a shared mask under the hood, add a helper to capture
> the
> logic, e.g. kvm_zap_all_sptes_on_memslot_deletion(kvm).
>
> > - Give other name for this check like zap_from_leafs (or better name?)
> > The implementation is same to kvm_gfn_shared_mask() with comment.
> > - Or we can add a boolean variable to struct kvm
>
> If we _don't_ hardcode the behavior, a per-memslot flag or a per-VM capability
> (and thus boolean) is likely the way to go. My off-the-cuff vote is probably
> for
> a per-memslot flag.
The per-memslot flag is interesting. If we had a per-memslot flag it might be
nice for that 2GB memslot. For TDX, making userspace have to know about zapping
requirements is not ideal. If TDX somehow loses the restriction someday, then
userspace would have to manage that as well. I think the decision belongs inside
KVM, for TDX at least.
We'll have to take a look at how they would come together in the code.
Powered by blists - more mailing lists