linux-kernel - Re: [PATCH v7 06/20] x86/virt/tdx: Shut down TDX module in case of error

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <bea49f7f-8411-662f-9fa8-97c98f61ce8b@intel.com>
Date:   Tue, 22 Nov 2022 10:57:52 -0800
From:   Dave Hansen <dave.hansen@...el.com>
To:     Kai Huang <kai.huang@...el.com>, linux-kernel@...r.kernel.org,
        kvm@...r.kernel.org
Cc:     linux-mm@...ck.org, seanjc@...gle.com, pbonzini@...hat.com,
        dan.j.williams@...el.com, rafael.j.wysocki@...el.com,
        kirill.shutemov@...ux.intel.com, ying.huang@...el.com,
        reinette.chatre@...el.com, len.brown@...el.com,
        tony.luck@...el.com, peterz@...radead.org, ak@...ux.intel.com,
        isaku.yamahata@...el.com, chao.gao@...el.com,
        sathyanarayanan.kuppuswamy@...ux.intel.com, bagasdotme@...il.com,
        sagis@...gle.com, imammedo@...hat.com
Subject: Re: [PATCH v7 06/20] x86/virt/tdx: Shut down TDX module in case of
 error

On 11/20/22 16:26, Kai Huang wrote:
> TDX supports shutting down the TDX module at any time during its
> lifetime.  After the module is shut down, no further TDX module SEAMCALL
> leaf functions can be made to the module on any logical cpu.
> 
> Shut down the TDX module in case of any error during the initialization
> process.  It's pointless to leave the TDX module in some middle state.
> 
> Shutting down the TDX module requires calling TDH.SYS.LP.SHUTDOWN on all
> BIOS-enabled CPUs, and the SEMACALL can run concurrently on different
> CPUs.  Implement a mechanism to run SEAMCALL concurrently on all online
> CPUs and use it to shut down the module.  Later logical-cpu scope module
> initialization will use it too.

To me, this starts to veer way too far into internal implementation details.

	Issue the TDH.SYS.LP.SHUTDOWN SEAMCALL on all BIOS-enabled CPUs
	to shut down the TDX module.

This is also the point where you should talk about the new
infrastructure.  Why do you need a new 'struct seamcall_something'?
What makes it special?

> diff --git a/arch/x86/virt/vmx/tdx/tdx.c b/arch/x86/virt/vmx/tdx/tdx.c
> index b06c1a2bc9cb..5db1a05cb4bd 100644
> --- a/arch/x86/virt/vmx/tdx/tdx.c
> +++ b/arch/x86/virt/vmx/tdx/tdx.c
> @@ -13,6 +13,8 @@
>  #include <linux/mutex.h>
>  #include <linux/cpu.h>
>  #include <linux/cpumask.h>
> +#include <linux/smp.h>
> +#include <linux/atomic.h>
>  #include <asm/msr-index.h>
>  #include <asm/msr.h>
>  #include <asm/apic.h>
> @@ -124,15 +126,27 @@ bool platform_tdx_enabled(void)
>  	return !!tdx_keyid_num;
>  }
>  
> +/*
> + * Data structure to make SEAMCALL on multiple CPUs concurrently.
> + * @err is set to -EFAULT when SEAMCALL fails on any cpu.
> + */
> +struct seamcall_ctx {
> +	u64 fn;
> +	u64 rcx;
> +	u64 rdx;
> +	u64 r8;
> +	u64 r9;
> +	atomic_t err;
> +};
> +
>  /*
>   * Wrapper of __seamcall() to convert SEAMCALL leaf function error code
>   * to kernel error code.  @seamcall_ret and @out contain the SEAMCALL
>   * leaf function return code and the additional output respectively if
>   * not NULL.
>   */
> -static int __always_unused seamcall(u64 fn, u64 rcx, u64 rdx, u64 r8, u64 r9,
> -				    u64 *seamcall_ret,
> -				    struct tdx_module_output *out)
> +static int seamcall(u64 fn, u64 rcx, u64 rdx, u64 r8, u64 r9,
> +		    u64 *seamcall_ret, struct tdx_module_output *out)
>  {
>  	u64 sret;
>  
> @@ -166,6 +180,25 @@ static int __always_unused seamcall(u64 fn, u64 rcx, u64 rdx, u64 r8, u64 r9,
>  	}
>  }
>  
> +static void seamcall_smp_call_function(void *data)
> +{
> +	struct seamcall_ctx *sc = data;
> +	int ret;
> +
> +	ret = seamcall(sc->fn, sc->rcx, sc->rdx, sc->r8, sc->r9, NULL, NULL);
> +	if (ret)
> +		atomic_set(&sc->err, -EFAULT);
> +}

The atomic_t is kinda silly.  I guess it's not *that* wasteful though.

I think it would have actually been a lot more clear if instead of
containing an errno it was a *count* of the number of encountered errors.

An "atomic_set()" where everyone is overwriting each other is a bit
counterintuitive.  It's OK here, of course, but it still looks goofy.

If this were:

	atomic_inc(&sc->nr_errors);

it would be a lot more clear that *anyone* can increment and that it
truly is shared.

> +/*
> + * Call the SEAMCALL on all online CPUs concurrently.  Caller to check
> + * @sc->err to determine whether any SEAMCALL failed on any cpu.
> + */
> +static void seamcall_on_each_cpu(struct seamcall_ctx *sc)
> +{
> +	on_each_cpu(seamcall_smp_call_function, sc, true);
> +}
> +
>  /*
>   * Detect and initialize the TDX module.
>   *
> @@ -181,7 +214,9 @@ static int init_tdx_module(void)
>  
>  static void shutdown_tdx_module(void)
>  {
> -	/* TODO: Shut down the TDX module */
> +	struct seamcall_ctx sc = { .fn = TDH_SYS_LP_SHUTDOWN };
> +
> +	seamcall_on_each_cpu(&sc);
>  }


The seamcall_on_each_cpu() function is silly as-is.  Either collapse the
functions or note in the changelog why this is not as silly as it looks.

>  static int __tdx_enable(void)
> diff --git a/arch/x86/virt/vmx/tdx/tdx.h b/arch/x86/virt/vmx/tdx/tdx.h
> index 92a8de957dc7..215cc1065d78 100644
> --- a/arch/x86/virt/vmx/tdx/tdx.h
> +++ b/arch/x86/virt/vmx/tdx/tdx.h
> @@ -12,6 +12,11 @@
>  /* MSR to report KeyID partitioning between MKTME and TDX */
>  #define MSR_IA32_MKTME_KEYID_PARTITIONING	0x00000087
>  
> +/*
> + * TDX module SEAMCALL leaf functions
> + */
> +#define TDH_SYS_LP_SHUTDOWN	44
> +
>  /*
>   * Do not put any hardware-defined TDX structure representations below
>   * this comment!