linux-kernel - Re: [PATCH v3 04/21] x86/virt/tdx: Add skeleton for detecting and initializing TDX on demand

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <ac482f2b-d2d1-0643-faa4-1b36340268c5@intel.com>
Date:   Tue, 26 Apr 2022 13:53:41 -0700
From:   Dave Hansen <dave.hansen@...el.com>
To:     Kai Huang <kai.huang@...el.com>, linux-kernel@...r.kernel.org,
        kvm@...r.kernel.org
Cc:     seanjc@...gle.com, pbonzini@...hat.com, len.brown@...el.com,
        tony.luck@...el.com, rafael.j.wysocki@...el.com,
        reinette.chatre@...el.com, dan.j.williams@...el.com,
        peterz@...radead.org, ak@...ux.intel.com,
        kirill.shutemov@...ux.intel.com,
        sathyanarayanan.kuppuswamy@...ux.intel.com,
        isaku.yamahata@...el.com
Subject: Re: [PATCH v3 04/21] x86/virt/tdx: Add skeleton for detecting and
 initializing TDX on demand

On 4/5/22 21:49, Kai Huang wrote:
> The TDX module is essentially a CPU-attested software module running
> in the new Secure Arbitration Mode (SEAM) to protect VMs from malicious
> host and certain physical attacks.  The TDX module implements the
> functions to build, tear down and start execution of the protected VMs
> called Trusted Domains (TD).  Before the TDX module can be used to
> create and run TD guests, it must be loaded into the SEAM Range Register
> (SEAMRR) and properly initialized.

The module isn't loaded into a register, right?

It's loaded into a memory area pointed to *by* the register.

>  The TDX module is expected to be
> loaded by BIOS before booting to the kernel, and the kernel is expected
> to detect and initialize it, using the SEAMCALLs defined by TDX
> architecture.

Wait a sec...  So, what was all this gobleygook about TDX module loading
and SEAMRR's if the kernel just has the TDX module *handed* to it
already loaded?

It looks to me like you wrote all of this before the TDX module was
being loaded by the BIOS and neglected to go and update these changelogs.

> The TDX module can be initialized only once in its lifetime.  Instead
> of always initializing it at boot time, this implementation chooses an
> on-demand approach to initialize TDX until there is a real need (e.g
> when requested by KVM).  This avoids consuming the memory that must be
> allocated by kernel and given to the TDX module as metadata (~1/256th of
> the TDX-usable memory), and also saves the time of initializing the TDX
> module (and the metadata) when TDX is not used at all.  Initializing the
> TDX module at runtime on-demand also is more flexible to support TDX
> module runtime updating in the future (after updating the TDX module, it
> needs to be initialized again).
> 
> Introduce two placeholders tdx_detect() and tdx_init() to detect and
> initialize the TDX module on demand, with a state machine introduced to
> orchestrate the entire process (in case of multiple callers).
> 
> To start with, tdx_detect() checks SEAMRR and TDX private KeyIDs.  The
> TDX module is reported as not loaded if either SEAMRR is not enabled, or
> there are no enough TDX private KeyIDs to create any TD guest.  The TDX
> module itself requires one global TDX private KeyID to crypto protect
> its metadata.

This is stepping over the line into telling me what the code does
instead of why.

> And tdx_init() is currently empty.  The TDX module will be initialized
> in multi-steps defined by the TDX architecture:
> 
>   1) Global initialization;
>   2) Logical-CPU scope initialization;
>   3) Enumerate the TDX module capabilities and platform configuration;
>   4) Configure the TDX module about usable memory ranges and global
>      KeyID information;
>   5) Package-scope configuration for the global KeyID;
>   6) Initialize usable memory ranges based on 4).
> 
> The TDX module can also be shut down at any time during its lifetime.
> In case of any error during the initialization process, shut down the
> module.  It's pointless to leave the module in any intermediate state
> during the initialization.
> 
> SEAMCALL requires SEAMRR being enabled and CPU being already in VMX
> operation (VMXON has been done), otherwise it generates #UD.  So far
> only KVM handles VMXON/VMXOFF.  Choose to not handle VMXON/VMXOFF in
> tdx_detect() and tdx_init() but depend on the caller to guarantee that,
> since so far KVM is the only user of TDX.  In the long term, more kernel
> components are likely to use VMXON/VMXOFF to support TDX (i.e. TDX
> module runtime update), so a reference-based approach to do VMXON/VMXOFF
> is likely needed.
> 
> Signed-off-by: Kai Huang <kai.huang@...el.com>
> ---
>  arch/x86/include/asm/tdx.h  |   4 +
>  arch/x86/virt/vmx/tdx/tdx.c | 222 ++++++++++++++++++++++++++++++++++++
>  2 files changed, 226 insertions(+)
> 
> diff --git a/arch/x86/include/asm/tdx.h b/arch/x86/include/asm/tdx.h
> index 1f29813b1646..c8af2ba6bb8a 100644
> --- a/arch/x86/include/asm/tdx.h
> +++ b/arch/x86/include/asm/tdx.h
> @@ -92,8 +92,12 @@ static inline long tdx_kvm_hypercall(unsigned int nr, unsigned long p1,
>  
>  #ifdef CONFIG_INTEL_TDX_HOST
>  void tdx_detect_cpu(struct cpuinfo_x86 *c);
> +int tdx_detect(void);
> +int tdx_init(void);
>  #else
>  static inline void tdx_detect_cpu(struct cpuinfo_x86 *c) { }
> +static inline int tdx_detect(void) { return -ENODEV; }
> +static inline int tdx_init(void) { return -ENODEV; }
>  #endif /* CONFIG_INTEL_TDX_HOST */
>  
>  #endif /* !__ASSEMBLY__ */
> diff --git a/arch/x86/virt/vmx/tdx/tdx.c b/arch/x86/virt/vmx/tdx/tdx.c
> index ba2210001ea8..53093d4ad458 100644
> --- a/arch/x86/virt/vmx/tdx/tdx.c
> +++ b/arch/x86/virt/vmx/tdx/tdx.c
> @@ -9,6 +9,8 @@
>  
>  #include <linux/types.h>
>  #include <linux/cpumask.h>
> +#include <linux/mutex.h>
> +#include <linux/cpu.h>
>  #include <asm/msr-index.h>
>  #include <asm/msr.h>
>  #include <asm/cpufeature.h>
> @@ -45,12 +47,33 @@
>  		((u32)(((_keyid_part) & 0xffffffffull) + 1))
>  #define TDX_KEYID_NUM(_keyid_part)	((u32)((_keyid_part) >> 32))
>  
> +/*
> + * TDX module status during initialization
> + */
> +enum tdx_module_status_t {
> +	/* TDX module status is unknown */
> +	TDX_MODULE_UNKNOWN,
> +	/* TDX module is not loaded */
> +	TDX_MODULE_NONE,
> +	/* TDX module is loaded, but not initialized */
> +	TDX_MODULE_LOADED,
> +	/* TDX module is fully initialized */
> +	TDX_MODULE_INITIALIZED,
> +	/* TDX module is shutdown due to error during initialization */
> +	TDX_MODULE_SHUTDOWN,
> +};
> +
>  /* BIOS must configure SEAMRR registers for all cores consistently */
>  static u64 seamrr_base, seamrr_mask;
>  
>  static u32 tdx_keyid_start;
>  static u32 tdx_keyid_num;
>  
> +static enum tdx_module_status_t tdx_module_status;
> +
> +/* Prevent concurrent attempts on TDX detection and initialization */
> +static DEFINE_MUTEX(tdx_module_lock);
> +
>  static bool __seamrr_enabled(void)
>  {
>  	return (seamrr_mask & SEAMRR_ENABLED_BITS) == SEAMRR_ENABLED_BITS;
> @@ -172,3 +195,202 @@ void tdx_detect_cpu(struct cpuinfo_x86 *c)
>  	detect_seam(c);
>  	detect_tdx_keyids(c);
>  }
> +
> +static bool seamrr_enabled(void)
> +{
> +	/*
> +	 * To detect any BIOS misconfiguration among cores, all logical
> +	 * cpus must have been brought up at least once.  This is true
> +	 * unless 'maxcpus' kernel command line is used to limit the
> +	 * number of cpus to be brought up during boot time.  However
> +	 * 'maxcpus' is basically an invalid operation mode due to the
> +	 * MCE broadcast problem, and it should not be used on a TDX
> +	 * capable machine.  Just do paranoid check here and do not
> +	 * report SEAMRR as enabled in this case.
> +	 */
> +	if (!cpumask_equal(&cpus_booted_once_mask,
> +					cpu_present_mask))
> +		return false;
> +
> +	return __seamrr_enabled();
> +}
> +
> +static bool tdx_keyid_sufficient(void)
> +{
> +	if (!cpumask_equal(&cpus_booted_once_mask,
> +					cpu_present_mask))
> +		return false;

I'd move this cpumask_equal() to a helper.

> +	/*
> +	 * TDX requires at least two KeyIDs: one global KeyID to
> +	 * protect the metadata of the TDX module and one or more
> +	 * KeyIDs to run TD guests.
> +	 */
> +	return tdx_keyid_num >= 2;
> +}
> +
> +static int __tdx_detect(void)
> +{
> +	/* The TDX module is not loaded if SEAMRR is disabled */
> +	if (!seamrr_enabled()) {
> +		pr_info("SEAMRR not enabled.\n");
> +		goto no_tdx_module;
> +	}

Why even bother with the SEAMRR stuff?  It sounded like you can "ping"
the module with SEAMCALL.  Why not just use that directly?

> +	/*
> +	 * Also do not report the TDX module as loaded if there's
> +	 * no enough TDX private KeyIDs to run any TD guests.
> +	 */
> +	if (!tdx_keyid_sufficient()) {
> +		pr_info("Number of TDX private KeyIDs too small: %u.\n",
> +				tdx_keyid_num);
> +		goto no_tdx_module;
> +	}
> +
> +	/* Return -ENODEV until the TDX module is detected */
> +no_tdx_module:
> +	tdx_module_status = TDX_MODULE_NONE;
> +	return -ENODEV;
> +}
> +
> +static int init_tdx_module(void)
> +{
> +	/*
> +	 * Return -EFAULT until all steps of TDX module
> +	 * initialization are done.
> +	 */
> +	return -EFAULT;
> +}
> +
> +static void shutdown_tdx_module(void)
> +{
> +	/* TODO: Shut down the TDX module */
> +	tdx_module_status = TDX_MODULE_SHUTDOWN;
> +}
> +
> +static int __tdx_init(void)
> +{
> +	int ret;
> +
> +	/*
> +	 * Logical-cpu scope initialization requires calling one SEAMCALL
> +	 * on all logical cpus enabled by BIOS.  Shutting down the TDX
> +	 * module also has such requirement.  Further more, configuring
> +	 * the key of the global KeyID requires calling one SEAMCALL for
> +	 * each package.  For simplicity, disable CPU hotplug in the whole
> +	 * initialization process.
> +	 *
> +	 * It's perhaps better to check whether all BIOS-enabled cpus are
> +	 * online before starting initializing, and return early if not.

But you did some of this cpumask checking above.  Right?

> +	 * But none of 'possible', 'present' and 'online' CPU masks
> +	 * represents BIOS-enabled cpus.  For example, 'possible' mask is
> +	 * impacted by 'nr_cpus' or 'possible_cpus' kernel command line.
> +	 * Just let the SEAMCALL to fail if not all BIOS-enabled cpus are
> +	 * online.
> +	 */
> +	cpus_read_lock();
> +
> +	ret = init_tdx_module();
> +
> +	/*
> +	 * Shut down the TDX module in case of any error during the
> +	 * initialization process.  It's meaningless to leave the TDX
> +	 * module in any middle state of the initialization process.
> +	 */
> +	if (ret)
> +		shutdown_tdx_module();
> +
> +	cpus_read_unlock();
> +
> +	return ret;
> +}
> +
> +/**
> + * tdx_detect - Detect whether the TDX module has been loaded
> + *
> + * Detect whether the TDX module has been loaded and ready for
> + * initialization.  Only call this function when all cpus are
> + * already in VMX operation.
> + *
> + * This function can be called in parallel by multiple callers.
> + *
> + * Return:
> + *
> + * * -0:	The TDX module has been loaded and ready for
> + *		initialization.

"-0", eh?

> + * * -ENODEV:	The TDX module is not loaded.
> + * * -EPERM:	CPU is not in VMX operation.
> + * * -EFAULT:	Other internal fatal errors.
> + */
> +int tdx_detect(void)
> +{
> +	int ret;
> +
> +	mutex_lock(&tdx_module_lock);
> +
> +	switch (tdx_module_status) {
> +	case TDX_MODULE_UNKNOWN:
> +		ret = __tdx_detect();
> +		break;
> +	case TDX_MODULE_NONE:
> +		ret = -ENODEV;
> +		break;
> +	case TDX_MODULE_LOADED:
> +	case TDX_MODULE_INITIALIZED:
> +		ret = 0;
> +		break;
> +	case TDX_MODULE_SHUTDOWN:
> +		ret = -EFAULT;
> +		break;
> +	default:
> +		WARN_ON(1);
> +		ret = -EFAULT;
> +	}
> +
> +	mutex_unlock(&tdx_module_lock);
> +	return ret;
> +}
> +EXPORT_SYMBOL_GPL(tdx_detect);
> +
> +/**
> + * tdx_init - Initialize the TDX module
> + *
> + * Initialize the TDX module to make it ready to run TD guests.  This
> + * function should be called after tdx_detect() returns successful.
> + * Only call this function when all cpus are online and are in VMX
> + * operation.  CPU hotplug is temporarily disabled internally.
> + *
> + * This function can be called in parallel by multiple callers.
> + *
> + * Return:
> + *
> + * * -0:	The TDX module has been successfully initialized.
> + * * -ENODEV:	The TDX module is not loaded.
> + * * -EPERM:	The CPU which does SEAMCALL is not in VMX operation.
> + * * -EFAULT:	Other internal fatal errors.
> + */
> +int tdx_init(void)
> +{
> +	int ret;
> +
> +	mutex_lock(&tdx_module_lock);
> +
> +	switch (tdx_module_status) {
> +	case TDX_MODULE_NONE:
> +		ret = -ENODEV;
> +		break;
> +	case TDX_MODULE_LOADED:
> +		ret = __tdx_init();
> +		break;
> +	case TDX_MODULE_INITIALIZED:
> +		ret = 0;
> +		break;
> +	default:
> +		ret = -EFAULT;
> +		break;
> +	}
> +	mutex_unlock(&tdx_module_lock);
> +
> +	return ret;
> +}
> +EXPORT_SYMBOL_GPL(tdx_init);

Why does this need both a tdx_detect() and a tdx_init()?  Shouldn't the
interface from outside just be "get TDX up and running, please?"