[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <Z2L0CPLKg9dh6imZ@google.com>
Date: Wed, 18 Dec 2024 08:10:48 -0800
From: Sean Christopherson <seanjc@...gle.com>
To: Yan Zhao <yan.y.zhao@...el.com>
Cc: pbonzini@...hat.com, kvm@...r.kernel.org, dave.hansen@...ux.intel.com,
rick.p.edgecombe@...el.com, kai.huang@...el.com, adrian.hunter@...el.com,
reinette.chatre@...el.com, xiaoyao.li@...el.com, tony.lindgren@...el.com,
binbin.wu@...ux.intel.com, dmatlack@...gle.com, isaku.yamahata@...el.com,
isaku.yamahata@...il.com, nik.borisov@...e.com, linux-kernel@...r.kernel.org,
x86@...nel.org
Subject: Re: [RFC PATCH 2/2] KVM: TDX: Kick off vCPUs when SEAMCALL is busy
during TD page removal
On Wed, Dec 18, 2024, Yan Zhao wrote:
> On Tue, Dec 17, 2024 at 03:29:03PM -0800, Sean Christopherson wrote:
> > On Thu, Nov 21, 2024, Yan Zhao wrote:
> > > For tdh_mem_range_block(), tdh_mem_track(), tdh_mem_page_remove(),
> > >
> > > - Upon detection of TDX_OPERAND_BUSY, retry each SEAMCALL only once.
> > > - During the retry, kick off all vCPUs and prevent any vCPU from entering
> > > to avoid potential contentions.
> > >
> > > Signed-off-by: Yan Zhao <yan.y.zhao@...el.com>
> > > ---
> > > arch/x86/include/asm/kvm_host.h | 2 ++
> > > arch/x86/kvm/vmx/tdx.c | 49 +++++++++++++++++++++++++--------
> > > 2 files changed, 40 insertions(+), 11 deletions(-)
> > >
> > > diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> > > index 521c7cf725bc..bb7592110337 100644
> > > --- a/arch/x86/include/asm/kvm_host.h
> > > +++ b/arch/x86/include/asm/kvm_host.h
> > > @@ -123,6 +123,8 @@
> > > #define KVM_REQ_HV_TLB_FLUSH \
> > > KVM_ARCH_REQ_FLAGS(32, KVM_REQUEST_WAIT | KVM_REQUEST_NO_WAKEUP)
> > > #define KVM_REQ_UPDATE_PROTECTED_GUEST_STATE KVM_ARCH_REQ(34)
> > > +#define KVM_REQ_NO_VCPU_ENTER_INPROGRESS \
> > > + KVM_ARCH_REQ_FLAGS(33, KVM_REQUEST_WAIT | KVM_REQUEST_NO_WAKEUP)
> > >
> > > #define CR0_RESERVED_BITS \
> > > (~(unsigned long)(X86_CR0_PE | X86_CR0_MP | X86_CR0_EM | X86_CR0_TS \
> > > diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c
> > > index 60d9e9d050ad..ed6b41bbcec6 100644
> > > --- a/arch/x86/kvm/vmx/tdx.c
> > > +++ b/arch/x86/kvm/vmx/tdx.c
> > > @@ -311,6 +311,20 @@ static void tdx_clear_page(unsigned long page_pa)
> > > __mb();
> > > }
> > >
> > > +static void tdx_no_vcpus_enter_start(struct kvm *kvm)
> > > +{
> > > + kvm_make_all_cpus_request(kvm, KVM_REQ_NO_VCPU_ENTER_INPROGRESS);
> >
> > I vote for making this a common request with a more succient name, e.g. KVM_REQ_PAUSE.
> KVM_REQ_PAUSE looks good to me. But will the "pause" cause any confusion with
> the guest's pause state?
Maybe?
> > And with appropriate helpers in common code. I could have sworn I floated this
> > idea in the past for something else, but apparently not. The only thing I can
> Yes, you suggested me to implement it via a request, similar to
> KVM_REQ_MCLOCK_INPROGRESS. [1].
> (I didn't add your suggested-by tag in this patch because it's just an RFC).
>
> [1] https://lore.kernel.org/kvm/ZuR09EqzU1WbQYGd@google.com/
>
> > find is an old arm64 version for pausing vCPUs to emulated. Hmm, maybe I was
> > thinking of KVM_REQ_OUTSIDE_GUEST_MODE?
> KVM_REQ_OUTSIDE_GUEST_MODE just kicks vCPUs outside guest mode, it does not set
> a bit in vcpu->requests to prevent later vCPUs entering.
Yeah, I was mostly just talking to myself. :-)
> > Anyways, I don't see any reason to make this an arch specific request.
> After making it non-arch specific, probably we need an atomic counter for the
> start/stop requests in the common helpers. So I just made it TDX-specific to
> keep it simple in the RFC.
Oh, right. I didn't consider the complications with multiple users. Hrm.
Actually, this doesn't need to be a request. KVM_REQ_OUTSIDE_GUEST_MODE will
forces vCPUs to exit, at which point tdx_vcpu_run() can return immediately with
EXIT_FASTPATH_EXIT_HANDLED, which is all that kvm_vcpu_exit_request() does. E.g.
have the zap side set wait_for_sept_zap before blasting the request to all vCPU,
and then in tdx_vcpu_run():
diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c
index b49dcf32206b..508ad6462e6d 100644
--- a/arch/x86/kvm/vmx/tdx.c
+++ b/arch/x86/kvm/vmx/tdx.c
@@ -921,6 +921,9 @@ fastpath_t tdx_vcpu_run(struct kvm_vcpu *vcpu, bool force_immediate_exit)
return EXIT_FASTPATH_NONE;
}
+ if (unlikely(READ_ONCE(to_kvm_tdx(vcpu->kvm)->wait_for_sept_zap)))
+ return EXIT_FASTPATH_EXIT_HANDLED;
+
trace_kvm_entry(vcpu, force_immediate_exit);
if (pi_test_on(&tdx->pi_desc)) {
Ooh, but there's a subtle flaw with that approach. Unlike kvm_vcpu_exit_request(),
the above check would obviously happen before entry to the guest, which means that,
in theory, KVM needs to goto cancel_injection to re-request req_immediate_exit and
cancel injection:
if (req_immediate_exit)
kvm_make_request(KVM_REQ_EVENT, vcpu);
kvm_x86_call(cancel_injection)(vcpu);
But! This actually an opportunity to harden KVM. Because the TDX module doesn't
guarantee entry, it's already possible for KVM to _think_ it completely entry to
the guest without actually having done so. It just happens to work because KVM
never needs to force an immediate exit for TDX, and can't do direct injection,
i.e. can "safely" skip the cancel_injection path.
So, I think can and should go with the above suggestion, but also add a WARN on
req_immediate_exit being set, because TDX ignores it entirely.
diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c
index b49dcf32206b..e23cd8231144 100644
--- a/arch/x86/kvm/vmx/tdx.c
+++ b/arch/x86/kvm/vmx/tdx.c
@@ -914,6 +914,9 @@ fastpath_t tdx_vcpu_run(struct kvm_vcpu *vcpu, bool force_immediate_exit)
struct vcpu_tdx *tdx = to_tdx(vcpu);
struct vcpu_vt *vt = to_tdx(vcpu);
+ /* <comment goes here> */
+ WARN_ON_ONCE(force_immediate_exit);
+
/* TDX exit handle takes care of this error case. */
if (unlikely(tdx->state != VCPU_TD_STATE_INITIALIZED)) {
tdx->vp_enter_ret = TDX_SW_ERROR;
@@ -921,6 +924,9 @@ fastpath_t tdx_vcpu_run(struct kvm_vcpu *vcpu, bool force_immediate_exit)
return EXIT_FASTPATH_NONE;
}
+ if (unlikely(to_kvm_tdx(vcpu->kvm)->wait_for_sept_zap))
+ return EXIT_FASTPATH_EXIT_HANDLED;
+
trace_kvm_entry(vcpu, force_immediate_exit);
if (pi_test_on(&tdx->pi_desc)) {
Powered by blists - more mailing lists