linux-kernel - Re: [PATCHv4 09/30] x86/tdx: Add MSR support for TDX guests

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <67dfdb0c-c483-ec3c-4fb8-57086ffde9bf@intel.com>
Date:   Thu, 24 Feb 2022 10:52:23 -0800
From:   Dave Hansen <dave.hansen@...el.com>
To:     "Kirill A. Shutemov" <kirill.shutemov@...ux.intel.com>,
        tglx@...utronix.de, mingo@...hat.com, bp@...en8.de,
        luto@...nel.org, peterz@...radead.org
Cc:     sathyanarayanan.kuppuswamy@...ux.intel.com, aarcange@...hat.com,
        ak@...ux.intel.com, dan.j.williams@...el.com, david@...hat.com,
        hpa@...or.com, jgross@...e.com, jmattson@...gle.com,
        joro@...tes.org, jpoimboe@...hat.com, knsathya@...nel.org,
        pbonzini@...hat.com, sdeep@...are.com, seanjc@...gle.com,
        tony.luck@...el.com, vkuznets@...hat.com, wanpengli@...cent.com,
        thomas.lendacky@....com, brijesh.singh@....com, x86@...nel.org,
        linux-kernel@...r.kernel.org
Subject: Re: [PATCHv4 09/30] x86/tdx: Add MSR support for TDX guests

On 2/24/22 07:56, Kirill A. Shutemov wrote:
> Use hypercall to emulate MSR read/write for the TDX platform.
> 
> There are two viable approaches for doing MSRs in a TD guest:
> 
> 1. Execute the RDMSR/WRMSR instructions like most VMs and bare metal
>    do. Some will succeed, others will cause a #VE. All of those that
>    cause a #VE will be handled with a TDCALL.
> 2. Use paravirt infrastructure.  The paravirt hook has to keep a list
>    of which MSRs would cause a #VE and use a TDCALL.  All other MSRs
>    execute RDMSR/WRMSR instructions directly.
> 
> The second option can be ruled out because the list of MSRs was
> challenging to maintain. That leaves option #1 as the only viable
> solution for the minimal TDX support.
> 
> For performance-critical MSR writes (like TSC_DEADLINE), future patches
> will replace the WRMSR/#VE sequence with the direct TDCALL.

This will still leave us with a list of non-#VE-inducing MSRs.  That's
not great.  But, if we miss an MSR in the performance-critical list, the
result is a slow WRMSR->#VE.  If we miss an MSR in the paravirt
approach, we induce a fatal #VE.

Please add something to that effect if you revise this patch.

> RDMSR and WRMSR specification details can be found in
> Guest-Host-Communication Interface (GHCI) for Intel Trust Domain
> Extensions (Intel TDX) specification, sec titled "TDG.VP.
> VMCALL<Instruction.RDMSR>" and "TDG.VP.VMCALL<Instruction.WRMSR>".
> 
> Co-developed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@...ux.intel.com>
> Signed-off-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@...ux.intel.com>
> Reviewed-by: Andi Kleen <ak@...ux.intel.com>
> Reviewed-by: Tony Luck <tony.luck@...el.com>
> Signed-off-by: Kirill A. Shutemov <kirill.shutemov@...ux.intel.com>
> ---
>  arch/x86/coco/tdx.c | 42 ++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 42 insertions(+)
> 
> diff --git a/arch/x86/coco/tdx.c b/arch/x86/coco/tdx.c
> index 0a2e6be0cdae..89992593a209 100644
> --- a/arch/x86/coco/tdx.c
> +++ b/arch/x86/coco/tdx.c
> @@ -116,6 +116,44 @@ void __cpuidle tdx_safe_halt(void)
>  		WARN_ONCE(1, "HLT instruction emulation failed\n");
>  }
>  
> +static bool read_msr(struct pt_regs *regs)
> +{
> +	struct tdx_hypercall_args args = {
> +		.r10 = TDX_HYPERCALL_STANDARD,
> +		.r11 = EXIT_REASON_MSR_READ,

Just a minor note: these "EXIT_REASON_FOO"'s in r11 are effectively
*the* hypercall being made, right?

The hypercall is being made in response to what would have otherwise
been a MSR read VMEXIT.  But, it's a *bit* goofy to see them here when
the TDX guest isn't doing any kind of VMEXIT.

I wish there were some clarity around it, but it's not a deal breaker.

> +		.r12 = regs->cx,
> +	};
> +
> +	/*
> +	 * Emulate the MSR read via hypercall. More info about ABI
> +	 * can be found in TDX Guest-Host-Communication Interface
> +	 * (GHCI), section titled "TDG.VP.VMCALL<Instruction.RDMSR>".
> +	 */
> +	if (__tdx_hypercall(&args, TDX_HCALL_HAS_OUTPUT))
> +		return false;
> +
> +	regs->ax = lower_32_bits(args.r11);
> +	regs->dx = upper_32_bits(args.r11);
> +	return true;
> +}
> +
> +static bool write_msr(struct pt_regs *regs)
> +{
> +	struct tdx_hypercall_args args = {
> +		.r10 = TDX_HYPERCALL_STANDARD,
> +		.r11 = EXIT_REASON_MSR_WRITE,
> +		.r12 = regs->cx,
> +		.r13 = (u64)regs->dx << 32 | regs->ax,
> +	};
> +
> +	/*
> +	 * Emulate the MSR write via hypercall. More info about ABI
> +	 * can be found in TDX Guest-Host-Communication Interface
> +	 * (GHCI) section titled "TDG.VP.VMCALL<Instruction.WRMSR>".
> +	 */
> +	return !__tdx_hypercall(&args, 0);
> +}
> +
>  void tdx_get_ve_info(struct ve_info *ve)
>  {
>  	struct tdx_module_output out;
> @@ -158,6 +196,10 @@ static bool virt_exception_kernel(struct pt_regs *regs, struct ve_info *ve)
>  	switch (ve->exit_reason) {
>  	case EXIT_REASON_HLT:
>  		return handle_halt();
> +	case EXIT_REASON_MSR_READ:
> +		return read_msr(regs);
> +	case EXIT_REASON_MSR_WRITE:
> +		return write_msr(regs);
>  	default:
>  		pr_warn("Unexpected #VE: %lld\n", ve->exit_reason);
>  		return false;

I still think it's annoying that all these WRMSR's are turned into #VE,
but this does seem like the best approach given the architecture that we
have.  Having the optimized ones seems like a good compromise.

Acked-by: Dave Hansen <dave.hansen@...ux.intel.com>