[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <ZkUIMKxhhYbrvS8I@google.com>
Date: Wed, 15 May 2024 12:09:13 -0700
From: Sean Christopherson <seanjc@...gle.com>
To: Kai Huang <kai.huang@...el.com>
Cc: "kvm@...r.kernel.org" <kvm@...r.kernel.org>, "pbonzini@...hat.com" <pbonzini@...hat.com>,
Rick Edgecombe <rick.p.edgecombe@...el.com>, "sagis@...gle.com" <sagis@...gle.com>,
"dmatlack@...gle.com" <dmatlack@...gle.com>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
"isaku.yamahata@...il.com" <isaku.yamahata@...il.com>, Yan Y Zhao <yan.y.zhao@...el.com>,
Erdem Aktas <erdemaktas@...gle.com>
Subject: Re: [PATCH 02/16] KVM: x86/mmu: Introduce a slot flag to zap only
slot leafs on slot deletion
On Wed, May 15, 2024, Kai Huang wrote:
> On Tue, 2024-05-14 at 17:59 -0700, Rick Edgecombe wrote:
> > So, Sean re-considered about introducing a per-VM flag or per-memslot flag
> > again for VMs like TDX. [4]
Not really. I tried to be as clear as I could that my preference was that _if_
we exposing _something_ to userspace, then that something should probably be a
memslot flag.
: There's no concrete motiviation, it's more that _if_ we're going to expose a knob
: to userspace, then I'd prefer to make it as precise as possible to minimize the
: changes of KVM ending up back in ABI hell again.
As I stressed in that thread, I hadn't thought about this deeply enough to have
an opinion one way or the other.
: You're _really_ reading too much into my suggestion. As above, my suggestion
: was very spur of the momemnt. I haven't put much thought into the tradeoffs and
: side effects.
> > This patch is an implementation of per-memslot flag.
> > Compared to per-VM flag approach,
> > Pros:
> > (1) By allowing userspace to control the zapping behavior in fine-grained
> > granularity, optimizations for specific use cases can be developed
> > without future kernel changes.
> > (2) Allows developing new zapping behaviors without risking regressions by
> > changing KVM behavior, as seen previously.
> >
> > Cons:
> > (1) Users need to ensure all necessary memslots are with flag
> > KVM_MEM_ZAP_LEAFS_ONLY set.e.g. QEMU needs to ensure all GUEST_MEMFD
> > memslot is with ZAP_LEAFS_ONLY flag for TDX VM.
> > (2) Opens up the possibility that userspace could configure memslots for
> > normal VM in such a way that the bug [1] is seen.
>
> I don't quite follow the logic why userspace should be involved.
>
> TDX cannot use "page table fast zap", and need to use a different way to
> zap, a.k.a, zap-leaf-only while holding MMU write lock, but this doesn't
> necessarily mean such thing should be exposed to userspace?
>
> It's weird that userspace needs to control how does KVM zap page table for
> memslot delete/move.
Yeah, this isn't quite what I had in mind. Granted, what I had in mind may not
be much any better, but I definitely don't want to let userspace dictate exactly
how KVM manages SPTEs.
My thinking for a memslot flag was more of a "deleting this memslot doesn't have
side effects", i.e. a way for userspace to give KVM the green light to deviate
from KVM's historical behavior of rebuilding the entire page tables. Under the
hood, KVM would be allowed to do whatever it wants, e.g. for the initial
implementation, KVM would zap only leafs. But critically, KVM wouldn't be
_required_ to zap only leafs.
> So to me looks it's overkill to expose this "zap-leaf-only" to userspace.
> We can just set this flag for a TDX guest when memslot is created in KVM.
100% agreed from a functionality perspective. My thoughts/concerns are more about
KVM's ABI.
Hmm, actually, we already have new uAPI/ABI in the form of VM types. What if
we squeeze a documentation update into 6.10 (which adds the SEV VM flavors) to
state that KVM's historical behavior of blasting all SPTEs is only _guaranteed_
for KVM_X86_DEFAULT_VM?
Anyone know if QEMU deletes shared-only, i.e. non-guest_memfd, memslots during
SEV-* boot? If so, and assuming any such memslots are smallish, we could even
start enforcing the new ABI by doing a precise zap for small (arbitrary limit TBD)
shared-only memslots for !KVM_X86_DEFAULT_VM VMs.
Powered by blists - more mailing lists