[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <fd954918981d5c823a8c2b8d1b346d4eb13f334f.camel@intel.com>
Date: Wed, 20 Apr 2022 16:37:11 +1200
From: Kai Huang <kai.huang@...el.com>
To: Sathyanarayanan Kuppuswamy
<sathyanarayanan.kuppuswamy@...ux.intel.com>,
linux-kernel@...r.kernel.org, kvm@...r.kernel.org
Cc: seanjc@...gle.com, pbonzini@...hat.com, dave.hansen@...el.com,
len.brown@...el.com, tony.luck@...el.com,
rafael.j.wysocki@...el.com, reinette.chatre@...el.com,
dan.j.williams@...el.com, peterz@...radead.org, ak@...ux.intel.com,
kirill.shutemov@...ux.intel.com, isaku.yamahata@...el.com
Subject: Re: [PATCH v3 04/21] x86/virt/tdx: Add skeleton for detecting and
initializing TDX on demand
On Tue, 2022-04-19 at 07:53 -0700, Sathyanarayanan Kuppuswamy wrote:
>
> On 4/5/22 9:49 PM, Kai Huang wrote:
> > The TDX module is essentially a CPU-attested software module running
> > in the new Secure Arbitration Mode (SEAM) to protect VMs from malicious
> > host and certain physical attacks. The TDX module implements the
>
> /s/host/hosts
I don't quite get. Could you explain why there are multiple hosts?
>
> > functions to build, tear down and start execution of the protected VMs
> > called Trusted Domains (TD). Before the TDX module can be used to
>
> /s/Trusted/Trust
Thanks.
>
> > create and run TD guests, it must be loaded into the SEAM Range Register
> > (SEAMRR) and properly initialized. The TDX module is expected to be
> > loaded by BIOS before booting to the kernel, and the kernel is expected
> > to detect and initialize it, using the SEAMCALLs defined by TDX
> > architecture.
> >
> > The TDX module can be initialized only once in its lifetime. Instead
> > of always initializing it at boot time, this implementation chooses an
> > on-demand approach to initialize TDX until there is a real need (e.g
> > when requested by KVM). This avoids consuming the memory that must be
> > allocated by kernel and given to the TDX module as metadata (~1/256th of
>
> allocated by the kernel
Ok.
>
> > the TDX-usable memory), and also saves the time of initializing the TDX
> > module (and the metadata) when TDX is not used at all. Initializing the
> > TDX module at runtime on-demand also is more flexible to support TDX
> > module runtime updating in the future (after updating the TDX module, it
> > needs to be initialized again).
> >
> > Introduce two placeholders tdx_detect() and tdx_init() to detect and
> > initialize the TDX module on demand, with a state machine introduced to
> > orchestrate the entire process (in case of multiple callers).
> >
> > To start with, tdx_detect() checks SEAMRR and TDX private KeyIDs. The
> > TDX module is reported as not loaded if either SEAMRR is not enabled, or
> > there are no enough TDX private KeyIDs to create any TD guest. The TDX
> > module itself requires one global TDX private KeyID to crypto protect
> > its metadata.
> >
> > And tdx_init() is currently empty. The TDX module will be initialized
> > in multi-steps defined by the TDX architecture:
> >
> > 1) Global initialization;
> > 2) Logical-CPU scope initialization;
> > 3) Enumerate the TDX module capabilities and platform configuration;
> > 4) Configure the TDX module about usable memory ranges and global
> > KeyID information;
> > 5) Package-scope configuration for the global KeyID;
> > 6) Initialize usable memory ranges based on 4).
> >
> > The TDX module can also be shut down at any time during its lifetime.
> > In case of any error during the initialization process, shut down the
> > module. It's pointless to leave the module in any intermediate state
> > during the initialization.
> >
> > SEAMCALL requires SEAMRR being enabled and CPU being already in VMX
> > operation (VMXON has been done), otherwise it generates #UD. So far
> > only KVM handles VMXON/VMXOFF. Choose to not handle VMXON/VMXOFF in
> > tdx_detect() and tdx_init() but depend on the caller to guarantee that,
> > since so far KVM is the only user of TDX. In the long term, more kernel
> > components are likely to use VMXON/VMXOFF to support TDX (i.e. TDX
> > module runtime update), so a reference-based approach to do VMXON/VMXOFF
> > is likely needed.
> >
> > Signed-off-by: Kai Huang <kai.huang@...el.com>
> > ---
> > arch/x86/include/asm/tdx.h | 4 +
> > arch/x86/virt/vmx/tdx/tdx.c | 222 ++++++++++++++++++++++++++++++++++++
> > 2 files changed, 226 insertions(+)
> >
> > diff --git a/arch/x86/include/asm/tdx.h b/arch/x86/include/asm/tdx.h
> > index 1f29813b1646..c8af2ba6bb8a 100644
> > --- a/arch/x86/include/asm/tdx.h
> > +++ b/arch/x86/include/asm/tdx.h
> > @@ -92,8 +92,12 @@ static inline long tdx_kvm_hypercall(unsigned int nr, unsigned long p1,
> >
> > #ifdef CONFIG_INTEL_TDX_HOST
> > void tdx_detect_cpu(struct cpuinfo_x86 *c);
> > +int tdx_detect(void);
> > +int tdx_init(void);
> > #else
> > static inline void tdx_detect_cpu(struct cpuinfo_x86 *c) { }
> > +static inline int tdx_detect(void) { return -ENODEV; }
> > +static inline int tdx_init(void) { return -ENODEV; }
> > #endif /* CONFIG_INTEL_TDX_HOST */
> >
> > #endif /* !__ASSEMBLY__ */
> > diff --git a/arch/x86/virt/vmx/tdx/tdx.c b/arch/x86/virt/vmx/tdx/tdx.c
> > index ba2210001ea8..53093d4ad458 100644
> > --- a/arch/x86/virt/vmx/tdx/tdx.c
> > +++ b/arch/x86/virt/vmx/tdx/tdx.c
> > @@ -9,6 +9,8 @@
> >
> > #include <linux/types.h>
> > #include <linux/cpumask.h>
> > +#include <linux/mutex.h>
> > +#include <linux/cpu.h>
> > #include <asm/msr-index.h>
> > #include <asm/msr.h>
> > #include <asm/cpufeature.h>
> > @@ -45,12 +47,33 @@
> > ((u32)(((_keyid_part) & 0xffffffffull) + 1))
> > #define TDX_KEYID_NUM(_keyid_part) ((u32)((_keyid_part) >> 32))
> >
> > +/*
> > + * TDX module status during initialization
> > + */
> > +enum tdx_module_status_t {
> > + /* TDX module status is unknown */
> > + TDX_MODULE_UNKNOWN,
> > + /* TDX module is not loaded */
> > + TDX_MODULE_NONE,
> > + /* TDX module is loaded, but not initialized */
> > + TDX_MODULE_LOADED,
> > + /* TDX module is fully initialized */
> > + TDX_MODULE_INITIALIZED,
> > + /* TDX module is shutdown due to error during initialization */
> > + TDX_MODULE_SHUTDOWN,
> > +};
> > +
>
> May be adding these states when you really need will make
> more sense. Currently this patch only uses SHUTDOWN and
> NONE states. Other state usage is not very clear.
They are all used in tdx_detect() and tdx_init(), no?
>
> > /* BIOS must configure SEAMRR registers for all cores consistently */
> > static u64 seamrr_base, seamrr_mask;
> >
> > static u32 tdx_keyid_start;
> > static u32 tdx_keyid_num;
> >
> > +static enum tdx_module_status_t tdx_module_status;
> > +
> > +/* Prevent concurrent attempts on TDX detection and initialization */
> > +static DEFINE_MUTEX(tdx_module_lock);
>
> Any possible concurrent usage models?
tdx_detect() and tdx_init() are called on demand by callers, so it's possible
multiple callers can call into them concurrently.
>
> > +
> > static bool __seamrr_enabled(void)
> > {
> > return (seamrr_mask & SEAMRR_ENABLED_BITS) == SEAMRR_ENABLED_BITS;
> > @@ -172,3 +195,202 @@ void tdx_detect_cpu(struct cpuinfo_x86 *c)
> > detect_seam(c);
> > detect_tdx_keyids(c);
> > }
> > +
> > +static bool seamrr_enabled(void)
> > +{
> > + /*
> > + * To detect any BIOS misconfiguration among cores, all logical
> > + * cpus must have been brought up at least once. This is true
> > + * unless 'maxcpus' kernel command line is used to limit the
> > + * number of cpus to be brought up during boot time. However
> > + * 'maxcpus' is basically an invalid operation mode due to the
> > + * MCE broadcast problem, and it should not be used on a TDX
> > + * capable machine. Just do paranoid check here and do not
>
> a paranoid check
Ok.
>
> > + * report SEAMRR as enabled in this case.
> > + */
> > + if (!cpumask_equal(&cpus_booted_once_mask,
> > + cpu_present_mask))
> > + return false;
> > +
> > + return __seamrr_enabled();
> > +}
> > +
> > +static bool tdx_keyid_sufficient(void)
> > +{
> > + if (!cpumask_equal(&cpus_booted_once_mask,
> > + cpu_present_mask))
> > + return false;
> > +
> > + /*
> > + * TDX requires at least two KeyIDs: one global KeyID to
> > + * protect the metadata of the TDX module and one or more
> > + * KeyIDs to run TD guests.
> > + */
> > + return tdx_keyid_num >= 2;
> > +}
> > +
> > +static int __tdx_detect(void)
> > +{
> > + /* The TDX module is not loaded if SEAMRR is disabled */
> > + if (!seamrr_enabled()) {
> > + pr_info("SEAMRR not enabled.\n");
> > + goto no_tdx_module;
> > + }
> > +
> > + /*
> > + * Also do not report the TDX module as loaded if there's
> > + * no enough TDX private KeyIDs to run any TD guests.
> > + */
>
> You are not returning TDX_MODULE_LOADED under any current
> scenarios. So think above comment is not accurate.
This comment is to explain the logic behind of below TDX KeyID check. I don't
see how is it related to your comments?
This patch is pretty much a placeholder to express the idea of how are
tdx_detect() and tdx_init() going to be implemented. In below after the
tdx_keyid_sufficient() check, I also have a comment to explain the module hasn't
been detected yet which means there will be code to detect the module here, and
at that time, logically this function will return TDX_MODULE_LOADED. I don't
see this is hard to understand?
>
> > + if (!tdx_keyid_sufficient()) {
> > + pr_info("Number of TDX private KeyIDs too small: %u.\n",
> > + tdx_keyid_num);
> > + goto no_tdx_module;
> > + }
> > +
> > + /* Return -ENODEV until the TDX module is detected */
> > +no_tdx_module:
> > + tdx_module_status = TDX_MODULE_NONE;
> > + return -ENODEV;
> > +}
> > +
> > +static int init_tdx_module(void)
> > +{
> > + /*
> > + * Return -EFAULT until all steps of TDX module
> > + * initialization are done.
> > + */
> > + return -EFAULT;
> > +}
> > +
> > +static void shutdown_tdx_module(void)
> > +{
> > + /* TODO: Shut down the TDX module */
> > + tdx_module_status = TDX_MODULE_SHUTDOWN;
> > +}
> > +
> > +static int __tdx_init(void)
> > +{
> > + int ret;
> > +
> > + /*
> > + * Logical-cpu scope initialization requires calling one SEAMCALL
> > + * on all logical cpus enabled by BIOS. Shutting down the TDX
> > + * module also has such requirement. Further more, configuring
>
> such a requirement
Thanks.
>
> > + * the key of the global KeyID requires calling one SEAMCALL for
> > + * each package. For simplicity, disable CPU hotplug in the whole
> > + * initialization process.
> > + *
> > + * It's perhaps better to check whether all BIOS-enabled cpus are
> > + * online before starting initializing, and return early if not.
> > + * But none of 'possible', 'present' and 'online' CPU masks
> > + * represents BIOS-enabled cpus. For example, 'possible' mask is
> > + * impacted by 'nr_cpus' or 'possible_cpus' kernel command line.
> > + * Just let the SEAMCALL to fail if not all BIOS-enabled cpus are
> > + * online.
> > + */
> > + cpus_read_lock();
> > +
> > + ret = init_tdx_module();
> > +
> > + /*
> > + * Shut down the TDX module in case of any error during the
> > + * initialization process. It's meaningless to leave the TDX
> > + * module in any middle state of the initialization process.
> > + */
> > + if (ret)
> > + shutdown_tdx_module();
> > +
> > + cpus_read_unlock();
> > +
> > + return ret;
> > +}
> > +
> > +/**
> > + * tdx_detect - Detect whether the TDX module has been loaded
> > + *
> > + * Detect whether the TDX module has been loaded and ready for
> > + * initialization. Only call this function when all cpus are
> > + * already in VMX operation.
> > + *
> > + * This function can be called in parallel by multiple callers.
> > + *
> > + * Return:
> > + *
> > + * * -0: The TDX module has been loaded and ready for
> > + * initialization.
> > + * * -ENODEV: The TDX module is not loaded.
> > + * * -EPERM: CPU is not in VMX operation.
> > + * * -EFAULT: Other internal fatal errors.
> > + */
> > +int tdx_detect(void)
>
> Will this function be used separately or always along with
> tdx_init()?
The caller should first use tdx_detect() and then use tdx_init(). If caller
only uses tdx_detect(), then TDX module won't be initialized (unless other
caller does this). If caller calls tdx_init() before tdx_detect(), it will get
error.
>
> > +{
> > + int ret;
> > +
> > + mutex_lock(&tdx_module_lock);
> > +
> > + switch (tdx_module_status) {
> > + case TDX_MODULE_UNKNOWN:
> > + ret = __tdx_detect();
> > + break;
> > + case TDX_MODULE_NONE:
> > + ret = -ENODEV;
> > + break;
> > + case TDX_MODULE_LOADED:
> > + case TDX_MODULE_INITIALIZED:
> > + ret = 0;
> > + break;
> > + case TDX_MODULE_SHUTDOWN:
> > + ret = -EFAULT;
> > + break;
> > + default:
> > + WARN_ON(1);
> > + ret = -EFAULT;
> > + }
> > +
> > + mutex_unlock(&tdx_module_lock);
> > + return ret;
> > +}
> > +EXPORT_SYMBOL_GPL(tdx_detect);
> > +
> > +/**
> > + * tdx_init - Initialize the TDX module
>
> If it for tdx module initialization, why not call it
> tdx_module_init()? If not, update the description
> appropriately.
Besides do the actual module initialization, it also has a state machine.
But point taken, and I'll try to refine the description. Thanks.
>
> > + *
> > + * Initialize the TDX module to make it ready to run TD guests. This
> > + * function should be called after tdx_detect() returns successful.
> > + * Only call this function when all cpus are online and are in VMX
> > + * operation. CPU hotplug is temporarily disabled internally.
> > + *
> > + * This function can be called in parallel by multiple callers.
> > + *
> > + * Return:
> > + *
> > + * * -0: The TDX module has been successfully initialized.
> > + * * -ENODEV: The TDX module is not loaded.
> > + * * -EPERM: The CPU which does SEAMCALL is not in VMX operation.
> > + * * -EFAULT: Other internal fatal errors.
> > + */
>
> You return differnt error values just for debug prints or there are
> other uses for it?
Caller can distinguish them and act differently. Even w/o any purpose, I think
it's better to return different error codes to reflect different error reasons.
--
Thanks,
-Kai
Powered by blists - more mailing lists