linux-kernel - Re: [PATCHv4 08/30] x86/tdx: Add HLT support for TDX guests

Message-ID: <9b4b3581-925b-32a8-8a4f-fdd8d98f2164@intel.com>
Date:   Thu, 24 Feb 2022 10:42:56 -0800
From:   Dave Hansen <dave.hansen@...el.com>
To:     "Kirill A. Shutemov" <kirill.shutemov@...ux.intel.com>,
        tglx@...utronix.de, mingo@...hat.com, bp@...en8.de,
        luto@...nel.org, peterz@...radead.org
Cc:     sathyanarayanan.kuppuswamy@...ux.intel.com, aarcange@...hat.com,
        ak@...ux.intel.com, dan.j.williams@...el.com, david@...hat.com,
        hpa@...or.com, jgross@...e.com, jmattson@...gle.com,
        joro@...tes.org, jpoimboe@...hat.com, knsathya@...nel.org,
        pbonzini@...hat.com, sdeep@...are.com, seanjc@...gle.com,
        tony.luck@...el.com, vkuznets@...hat.com, wanpengli@...cent.com,
        thomas.lendacky@....com, brijesh.singh@....com, x86@...nel.org,
        linux-kernel@...r.kernel.org
Subject: Re: [PATCHv4 08/30] x86/tdx: Add HLT support for TDX guests

On 2/24/22 07:56, Kirill A. Shutemov wrote:
> The HLT instruction is a privileged instruction; executing it stops
> instruction execution and places the processor in a HALT state. It
> is used in the kernel for cases like reboot, idle loop and exception fixup
> handlers. For the idle case, interrupts will be enabled (using STI)
> before the HLT instruction (this is also called safe_halt()).
> 
> To support the HLT instruction in TDX guests, it needs to be emulated
> using TDVMCALL (hypercall to VMM). More details about it can be found
> in Intel Trust Domain Extensions (Intel TDX) Guest-Host-Communication
> Interface (GHCI) specification, section TDVMCALL[Instruction.HLT].
> 
> In TDX guests, executing HLT instruction will generate a #VE, which is
> used to emulate the HLT instruction. But #VE based emulation will not
> work for the safe_halt() flavor, because it requires STI instruction to
> be executed just before the TDCALL. Since idle loop is the only user of
> safe_halt() variant, handle it as a special case.
> 
> To avoid *safe_halt() call in the idle function, define the
> tdx_guest_idle() and use it to override the "x86_idle" function pointer
> for a valid TDX guest.
> 
> Alternative choices like PV ops have been considered for adding
> safe_halt() support. But they were rejected because HLT paravirt calls
> only exist under PARAVIRT_XXL, and enabling it in TDX guest just for
> safe_halt() use case is not worth the cost.

Thanks for all the history and background here.

> diff --git a/arch/x86/coco/tdcall.S b/arch/x86/coco/tdcall.S
> index c4dd9468e7d9..3c35a056974d 100644
> --- a/arch/x86/coco/tdcall.S
> +++ b/arch/x86/coco/tdcall.S
> @@ -138,6 +138,19 @@ SYM_FUNC_START(__tdx_hypercall)
>  
>  	movl $TDVMCALL_EXPOSE_REGS_MASK, %ecx
>  
> +	/*
> +	 * For the idle loop STI needs to be called directly before the TDCALL
> +	 * that enters idle (EXIT_REASON_HLT case). STI instruction enables
> +	 * interrupts only one instruction later. If there is a window between
> +	 * STI and the instruction that emulates the HALT state, there is a
> +	 * chance for interrupts to happen in this window, which can delay the
> +	 * HLT operation indefinitely. Since this is not the desired
> +	 * result, conditionally call STI before TDCALL.
> +	 */
> +	testq $TDX_HCALL_ISSUE_STI, %rsi
> +	jz .Lskip_sti
> +	sti
> +.Lskip_sti:
>  	tdcall
>  
>  	/*
> diff --git a/arch/x86/coco/tdx.c b/arch/x86/coco/tdx.c
> index 86a2f35e7308..0a2e6be0cdae 100644
> --- a/arch/x86/coco/tdx.c
> +++ b/arch/x86/coco/tdx.c
> @@ -7,6 +7,7 @@
>  #include <linux/cpufeature.h>
>  #include <asm/coco.h>
>  #include <asm/tdx.h>
> +#include <asm/vmx.h>
>  
>  /* TDX module Call Leaf IDs */
>  #define TDX_GET_INFO			1
> @@ -59,6 +60,62 @@ static void get_info(void)
>  	td_info.attributes = out.rdx;
>  }
>  
> +static u64 __cpuidle __halt(const bool irq_disabled, const bool do_sti)
> +{
> +	struct tdx_hypercall_args args = {
> +		.r10 = TDX_HYPERCALL_STANDARD,
> +		.r11 = EXIT_REASON_HLT,
> +		.r12 = irq_disabled,
> +	};
> +
> +	/*
> +	 * Emulate HLT operation via hypercall. More info about ABI
> +	 * can be found in TDX Guest-Host-Communication Interface
> +	 * (GHCI), section 3.8 TDG.VP.VMCALL<Instruction.HLT>.
> +	 *
> +	 * The VMM uses the "IRQ disabled" param to understand IRQ
> +	 * enabled status (RFLAGS.IF) of the TD guest and to determine
> +	 * whether or not it should schedule the halted vCPU if an
> +	 * IRQ becomes pending. E.g. if IRQs are disabled, the VMM
> +	 * can keep the vCPU in virtual HLT, even if an IRQ is
> +	 * pending, without hanging/breaking the guest.
> +	 */
> +	return __tdx_hypercall(&args, do_sti ? TDX_HCALL_ISSUE_STI : 0);
> +}
> +
> +static bool handle_halt(void)
> +{
> +	/*
> +	 * Since non safe halt is mainly used in CPU offlining
> +	 * and the guest will always stay in the halt state, don't
> +	 * call the STI instruction (set do_sti as false).
> +	 */
> +	const bool irq_disabled = irqs_disabled();
> +	const bool do_sti = false;
> +
> +	if (__halt(irq_disabled, do_sti))
> +		return false;
> +
> +	return true;
> +}
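
For anyone following along without a TDX guest handy, here is a rough user-space model of the flag plumbing in the hunks above. It is only a sketch: mock_tdx_hypercall(), mock_halt() and struct mock_trace are made-up names, the bit position of TDX_HCALL_ISSUE_STI is an assumption, and nothing here executes STI or TDCALL. It only records whether the assembly's testq/jz sequence would have issued STI for a given call.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative values only; the real definitions live in kernel headers. */
#define TDX_HYPERCALL_STANDARD	0
#define EXIT_REASON_HLT		12
#define TDX_HCALL_ISSUE_STI	(1ULL << 1)	/* assumed bit position */

struct tdx_hypercall_args {
	uint64_t r10, r11, r12;
};

/* Hypothetical record of what the mock hypercall observed. */
struct mock_trace {
	bool sti_issued;
	uint64_t reason;
	uint64_t irq_disabled;
};

static struct mock_trace last_call;

/*
 * Stand-in for __tdx_hypercall(): models the "testq flag; jz skip; sti"
 * decision in front of TDCALL, without touching interrupts at all.
 */
static uint64_t mock_tdx_hypercall(const struct tdx_hypercall_args *args,
				   uint64_t flags)
{
	last_call.sti_issued = (flags & TDX_HCALL_ISSUE_STI) != 0;
	last_call.reason = args->r11;
	last_call.irq_disabled = args->r12;
	return 0;	/* 0 models a successful hypercall */
}

/* Mirrors the shape of __halt() from the patch, using the mock above. */
static uint64_t mock_halt(const bool irq_disabled, const bool do_sti)
{
	struct tdx_hypercall_args args = {
		.r10 = TDX_HYPERCALL_STANDARD,
		.r11 = EXIT_REASON_HLT,
		.r12 = irq_disabled,
	};

	return mock_tdx_hypercall(&args, do_sti ? TDX_HCALL_ISSUE_STI : 0);
}
```

In this model the #VE path corresponds to mock_halt(irqs_disabled, false), while an idle-loop caller would pass do_sti = true so that STI lands directly before the (mocked) TDCALL.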

One other note: I really do like the silly:

	const bool do_sti = false;

variables as opposed to doing gunk like:

	__halt(irq_disabled, false);

Thanks for doing that.
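
To spell out the style point with a standalone sketch (request_halt() is a hypothetical stand-in for __halt(), and its return encoding is invented purely so the two callers can be compared): both callers below behave identically, but only the second one tells the reader what each boolean means at the call site.

```c
#include <stdbool.h>

/*
 * Hypothetical stand-in for __halt(). It packs its two arguments into
 * an int (bit 0: irq_disabled, bit 1: do_sti) so callers can be compared.
 */
static int request_halt(const bool irq_disabled, const bool do_sti)
{
	return (irq_disabled ? 1 : 0) | (do_sti ? 2 : 0);
}

/* Bare literals: the reader has to look up the prototype. */
static int halt_with_literals(void)
{
	return request_halt(true, false);
}

/* Named constants: the call site documents itself. */
static int halt_with_named_flags(void)
{
	const bool irq_disabled = true;
	const bool do_sti = false;

	return request_halt(irq_disabled, do_sti);
}
```

A compiler will typically fold the named constants away, so the readability should come at no runtime cost.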

Acked-by: Dave Hansen <dave.hansen@...ux.intel.com>