Message-ID: <ZyuDgLycfadLDg3A@google.com>
Date: Wed, 6 Nov 2024 07:01:07 -0800
From: Sean Christopherson <seanjc@...gle.com>
To: Kai Huang <kai.huang@...el.com>
Cc: Rick P Edgecombe <rick.p.edgecombe@...el.com>, Dave Hansen <dave.hansen@...el.com>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
"binbin.wu@...ux.intel.com" <binbin.wu@...ux.intel.com>, Xiaoyao Li <xiaoyao.li@...el.com>,
Reinette Chatre <reinette.chatre@...el.com>, Dan J Williams <dan.j.williams@...el.com>,
Yan Y Zhao <yan.y.zhao@...el.com>, "kvm@...r.kernel.org" <kvm@...r.kernel.org>,
Adrian Hunter <adrian.hunter@...el.com>, Tony Lindgren <tony.lindgren@...el.com>,
"pbonzini@...hat.com" <pbonzini@...hat.com>, "kristen@...ux.intel.com" <kristen@...ux.intel.com>,
Isaku Yamahata <isaku.yamahata@...el.com>
Subject: Re: [PATCH 3/3] KVM: VMX: Initialize TDX during KVM module load

On Wed, Nov 06, 2024, Kai Huang wrote:
> On Thu, 2024-10-31 at 13:22 -0700, Sean Christopherson wrote:
> > On Thu, Oct 31, 2024, Kai Huang wrote:
> > > On Wed, 2024-10-30 at 08:19 -0700, Sean Christopherson wrote:
> > > > > +void __init tdx_bringup(void)
> > > > > +{
> > > > > + enable_tdx = enable_tdx && !__tdx_bringup();
> > > >
> > > > Ah. I don't love this approach because it mixes "failure" due to an unsupported
> > > > configuration, with failure due to unexpected issues. E.g. if enabling virtualization
> > > > fails, loading KVM-the-module absolutely should fail too, not simply disable TDX.
> > >
> > > Thanks for the comments.
> > >
> > > I see your point. However, kvm_init() will also try to enable
> > > virtualization (default behaviour), so if that fails it will result in module
> > > loading failure eventually. So while I guess it would be slightly better to
> > > make module loading fail if "enabling virtualization fails" in TDX, it is a nit
> > > issue to me.
> > >
> > > I think "enabling virtualization failure" is the only "unexpected issue" that
> > > should result in module loading failure. For any other TDX-specific
> > > initialization failure (e.g., any memory allocation in future patches) it's
> > > better to only disable TDX.
> >
> > I disagree. The platform owner wants TDX to be enabled, KVM shouldn't silently
> > disable TDX because of a transient, unrelated failure.
> >
> > If TDX _can't_ be supported, e.g. because EPT or MMIO SPTE caching was explicitly
> > disabled, then that's different. And that's the general pattern throughout KVM.
> > If a requested feature isn't supported, then KVM continues on and updates the
> > module param accordingly. But if something outright fails during setup, KVM
> > aborts the entire sequence.
> >
> > > So I can change to "make loading KVM-the-module fail if enabling virtualization
> > > fails in TDX", but I want to confirm this is what you want?
> >
> > I would prefer the logic to be: reject loading kvm-intel.ko if an operation that
> > would normally succeed, fails.
>
> I looked at the final tdx.c that is in our development branch [*], and below is the
> list of the things that need to be done to init TDX (the code in
> __tdx_bringup()), and my thinking of whether to fail loading the module or just
> disable TDX:
>
> 1) Early dependency check fails. Those include: tdp_mmu_enabled,
> enable_mmio_caching, the X86_FEATURE_MOVDIR64B check, and the check for the
> presence of the TSX_CTL uret MSR.
>
> For those we can disable TDX only but continue to load module.
>
> 2) Enable virtualization fails.
>
> For this we fail to load module (as you suggested).
>
> 3) Fail to register TDX cpuhp to do tdx_cpu_enable() and handle cpu hotplug.
>
> For this we only disable TDX but continue to load module. The reason is I think
> this is similar to enabling a specific KVM feature that the hardware doesn't
> support. We can go further to check the return value of tdx_cpu_enable() to
> distinguish cases like "module not loaded" and "unexpected error", but I really
> don't want to go that far.

Hrm, tdx_cpu_enable() is a bit of a mess. Ideally, there would be a separate
"probe" API so that KVM could detect if TDX is supported. Though maybe it's the
TDX module itself that is flawed, e.g. if TDH_SYS_INIT is literally the only way
to detect whether or not a module is loaded.

So, absent a way to clean up tdx_cpu_enable(), maybe disable the module param if
it returns -ENODEV, otherwise fail the module load?

> 4) tdx_enable() fails.
>
> Ditto to 3).

No, this should fail the module load. E.g. most of the error conditions are
-ENOMEM, which has nothing to do with host support for TDX.

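To make the policy for steps 1) through 4) concrete, here is a minimal
userspace sketch of the decision logic, not the actual kernel code. The names
struct bringup_results and tdx_bringup_policy() are hypothetical and exist only
for illustration; the sketch assumes each step's outcome is already known and
leaves out all cleanup (e.g. disabling virtualization again on a later failure).

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the outcomes of the real bringup steps. */
struct bringup_results {
	bool deps_ok;     /* step 1: early dependency checks passed */
	int virt_enable;  /* step 2: 0 or -errno from enabling virtualization */
	int cpu_enable;   /* step 3: 0 or -errno from tdx_cpu_enable() */
	int tdx_enable;   /* step 4: 0 or -errno from tdx_enable() */
};

/*
 * Returns 0 if the module load should continue (possibly with *enable_tdx
 * cleared), or a negative errno if the module load should fail.
 */
static int tdx_bringup_policy(const struct bringup_results *r, bool *enable_tdx)
{
	assert(r && enable_tdx);

	/* TDX was not requested: nothing to do. */
	if (!*enable_tdx)
		return 0;

	/* Unsupported configuration: update the param, keep loading. */
	if (!r->deps_ok) {
		*enable_tdx = false;
		return 0;
	}

	/* Enabling virtualization normally succeeds: fail the load. */
	if (r->virt_enable)
		return r->virt_enable;

	/* -ENODEV is treated as "TDX not available": disable the param. */
	if (r->cpu_enable == -ENODEV) {
		*enable_tdx = false;
		return 0;
	}

	/* Any other tdx_cpu_enable() error is unexpected: fail the load. */
	if (r->cpu_enable)
		return r->cpu_enable;

	/* tdx_enable() errors (mostly -ENOMEM) are unrelated to host support. */
	if (r->tdx_enable)
		return r->tdx_enable;

	return 0;
}
```

The point of the split is that only "TDX can't be supported here" outcomes
clear the param; anything that would normally succeed but didn't propagates
as an error.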
> 5) tdx_get_sysinfo() fails.
>
> This is a kernel bug since tdx_get_sysinfo() should always return valid TDX
> sysinfo structure pointer after tdx_enable() is done successfully. Currently we
> just WARN() if the returned pointer is NULL and disable TDX only. I think it's
> also fine.
>
> 6) TDX global metadata check fails, e.g., MAX_VCPUS etc.
>
> Ditto to 3). For this we disable TDX only.

Where is this code?