Message-ID: <4cd18b6e-5e64-4b7d-9dbc-fd4c293cb4db@intel.com>
Date: Fri, 6 Sep 2024 13:41:37 +1200
From: "Huang, Kai" <kai.huang@...el.com>
To: Rick Edgecombe <rick.p.edgecombe@...el.com>, <seanjc@...gle.com>,
	<pbonzini@...hat.com>, <kvm@...r.kernel.org>
CC: <dmatlack@...gle.com>, <isaku.yamahata@...il.com>, <yan.y.zhao@...el.com>,
	<nik.borisov@...e.com>, <linux-kernel@...r.kernel.org>, Yuan Yao
	<yuan.yao@...el.com>
Subject: Re: [PATCH 09/21] KVM: TDX: Retry seamcall when TDX_OPERAND_BUSY with
 operand SEPT



On 4/09/2024 3:07 pm, Rick Edgecombe wrote:
> From: Yuan Yao <yuan.yao@...el.com>
> 
> The TDX module internally uses locks to protect internal resources.  It
> tries to acquire the locks.  If it fails to obtain a lock, it returns the
> TDX_OPERAND_BUSY error without spinning, because of its execution time
> limitation.
> 
> The TDX SEAMCALL API reference describes which resources are used, so it
> is known which TDX SEAMCALLs can contend on which resources.  The VMM can
> avoid contention inside the TDX module by serializing contentious TDX
> SEAMCALLs with, for example, a spinlock.  Because the OS knows its own
> process scheduling and scalability better, a lock at the OS/VMM layer
> would work better than simply retrying TDX SEAMCALLs.
> 
> The TDH.MEM.* APIs, except for TDH.MEM.TRACK, operate on the secure EPT
> tree, and the TDX module internally tries to acquire the lock of the
> secure EPT tree.  They return TDX_OPERAND_BUSY | TDX_OPERAND_ID_SEPT when
> they fail to get the lock.  TDX KVM allows the SEPT callbacks to return an
> error so that the TDP MMU layer can retry.
> 
> Retry the TDX TDH.MEM.* APIs on this error because it is a rare event
> caused by zero-step attack mitigation.

The last paragraph could be improved:

It seems to say that "TDX_OPERAND_BUSY | TDX_OPERAND_ID_SEPT" can only be
caused by zero-step attack detection/mitigation, which isn't true per the
previous paragraph.

In fact, I think this patch can be dropped:

1) The TDH_MEM_xx()s can return BUSY due to the nature of the TDP MMU, but
looking at the patch "KVM: TDX: Implement hooks to propagate changes of TDP
MMU mirror page table", all the callers of TDH_MEM_xx()s already retry
explicitly -- they either return PF_RETRY to let the fault happen again or
explicitly loop until no BUSY is returned (see the sketch below).  So I am
not sure why we need to "loop SEAMCALL_RETRY_MAX (16) times" in the common
code.
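
Roughly, here is what I mean by the callers already retrying.  This is just
an illustrative sketch, not the actual code from that patch -- the function
names, exact signatures and error mapping below are made up -- it only shows
the two retry patterns that already exist on the caller side:

	/*
	 * Illustrative only: a fault-path hook maps the SEPT BUSY error to
	 * -EBUSY, which the TDP MMU turns into RET_PF_RETRY, so the guest
	 * re-faults and the whole path is retried naturally.
	 */
	static int example_set_private_spte(struct kvm_tdx *kvm_tdx, gpa_t gpa,
					    hpa_t hpa)
	{
		u64 rcx, rdx;
		u64 err = tdh_mem_page_aug(kvm_tdx, gpa, hpa, &rcx, &rdx);

		if (err == (TDX_OPERAND_BUSY | TDX_OPERAND_ID_SEPT))
			return -EBUSY;	/* fault happens again and retries */

		return err ? -EIO : 0;
	}

	/*
	 * Illustrative only: zap/teardown paths cannot rely on a re-fault,
	 * so they simply loop until the BUSY condition clears.
	 */
	static void example_zap_private_spte(struct kvm_tdx *kvm_tdx, gpa_t gpa,
					     int level)
	{
		u64 rcx, rdx;
		u64 err;

		do {
			err = tdh_mem_range_block(kvm_tdx, gpa, level, &rcx, &rdx);
		} while (err == (TDX_OPERAND_BUSY | TDX_OPERAND_ID_SEPT));
	}

Either way, the retry decision already lives in the callers, which makes the
extra loop in tdx_seamcall_sept() look redundant.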

2) TDH_VP_ENTER already explicitly retries immediately in such a case, by
re-entering the guest from the fastpath:

         /* See the comment of tdx_seamcall_sept(). */
         if (unlikely(vp_enter_ret == TDX_ERROR_SEPT_BUSY))
                 return EXIT_FASTPATH_REENTER_GUEST;


3) That means the _ONLY_ reason to retry in the common code for
TDH_MEM_xx()s is to mitigate the zero-step attack by reducing the number of
times the guest faults on the same instruction.

I don't think we need to handle zero-step attack mitigation in the first 
TDX support submission.  So I think we can just remove this patch.

> 
> Signed-off-by: Yuan Yao <yuan.yao@...el.com>
> Signed-off-by: Isaku Yamahata <isaku.yamahata@...el.com>
> Signed-off-by: Rick Edgecombe <rick.p.edgecombe@...el.com>
> ---
> TDX MMU part 2 v1:
>   - Updates from seamcall overhaul (Kai)
> 
> v19:
>   - fix typo TDG.VP.ENTER => TDH.VP.ENTER,
>     TDX_OPRRAN_BUSY => TDX_OPERAND_BUSY
>   - drop the description on TDH.VP.ENTER as this patch doesn't touch
>     TDH.VP.ENTER
> ---
>   arch/x86/kvm/vmx/tdx_ops.h | 48 ++++++++++++++++++++++++++++++++------
>   1 file changed, 41 insertions(+), 7 deletions(-)
> 
> diff --git a/arch/x86/kvm/vmx/tdx_ops.h b/arch/x86/kvm/vmx/tdx_ops.h
> index 0363d8544f42..8ca3e252a6ed 100644
> --- a/arch/x86/kvm/vmx/tdx_ops.h
> +++ b/arch/x86/kvm/vmx/tdx_ops.h
> @@ -31,6 +31,40 @@
>   #define pr_tdx_error_3(__fn, __err, __rcx, __rdx, __r8)	\
>   	pr_tdx_error_N(__fn, __err, "rcx 0x%llx, rdx 0x%llx, r8 0x%llx\n", __rcx, __rdx, __r8)
>   
> +/*
> + * TDX module acquires its internal lock for resources.  It doesn't spin to get
> + * locks because of its restrictions of allowed execution time.  Instead, it
> + * returns TDX_OPERAND_BUSY with an operand id.
> + *
> + * Multiple VCPUs can operate on SEPT.  Also with zero-step attack mitigation,
> + * TDH.VP.ENTER may rarely acquire SEPT lock and release it when zero-step
> + * attack is suspected.  It results in TDX_OPERAND_BUSY | TDX_OPERAND_ID_SEPT
> + * with TDH.MEM.* operation.  Note: TDH.MEM.TRACK is an exception.
> + *
> + * Because TDP MMU uses read lock for scalability, spin lock around SEAMCALL
> + * spoils TDP MMU effort.  Retry several times with the assumption that SEPT
> + * lock contention is rare.  But don't loop forever to avoid lockup.  Let TDP
> + * MMU retry.
> + */
> +#define TDX_ERROR_SEPT_BUSY    (TDX_OPERAND_BUSY | TDX_OPERAND_ID_SEPT)
> +
> +static inline u64 tdx_seamcall_sept(u64 op, struct tdx_module_args *in)
> +{
> +#define SEAMCALL_RETRY_MAX     16
> +	struct tdx_module_args args_in;
> +	int retry = SEAMCALL_RETRY_MAX;
> +	u64 ret;
> +
> +	do {
> +		args_in = *in;
> +		ret = seamcall_ret(op, in);
> +	} while (ret == TDX_ERROR_SEPT_BUSY && retry-- > 0);
> +
> +	*in = args_in;
> +
> +	return ret;
> +}
> +
>   static inline u64 tdh_mng_addcx(struct kvm_tdx *kvm_tdx, hpa_t addr)
>   {
>   	struct tdx_module_args in = {
> @@ -55,7 +89,7 @@ static inline u64 tdh_mem_page_add(struct kvm_tdx *kvm_tdx, gpa_t gpa,
>   	u64 ret;
>   
>   	clflush_cache_range(__va(hpa), PAGE_SIZE);
> -	ret = seamcall_ret(TDH_MEM_PAGE_ADD, &in);
> +	ret = tdx_seamcall_sept(TDH_MEM_PAGE_ADD, &in);
>   
>   	*rcx = in.rcx;
>   	*rdx = in.rdx;
> @@ -76,7 +110,7 @@ static inline u64 tdh_mem_sept_add(struct kvm_tdx *kvm_tdx, gpa_t gpa,
>   
>   	clflush_cache_range(__va(page), PAGE_SIZE);
>   
> -	ret = seamcall_ret(TDH_MEM_SEPT_ADD, &in);
> +	ret = tdx_seamcall_sept(TDH_MEM_SEPT_ADD, &in);
>   
>   	*rcx = in.rcx;
>   	*rdx = in.rdx;
> @@ -93,7 +127,7 @@ static inline u64 tdh_mem_sept_remove(struct kvm_tdx *kvm_tdx, gpa_t gpa,
>   	};
>   	u64 ret;
>   
> -	ret = seamcall_ret(TDH_MEM_SEPT_REMOVE, &in);
> +	ret = tdx_seamcall_sept(TDH_MEM_SEPT_REMOVE, &in);
>   
>   	*rcx = in.rcx;
>   	*rdx = in.rdx;
> @@ -123,7 +157,7 @@ static inline u64 tdh_mem_page_aug(struct kvm_tdx *kvm_tdx, gpa_t gpa, hpa_t hpa
>   	u64 ret;
>   
>   	clflush_cache_range(__va(hpa), PAGE_SIZE);
> -	ret = seamcall_ret(TDH_MEM_PAGE_AUG, &in);
> +	ret = tdx_seamcall_sept(TDH_MEM_PAGE_AUG, &in);
>   
>   	*rcx = in.rcx;
>   	*rdx = in.rdx;
> @@ -140,7 +174,7 @@ static inline u64 tdh_mem_range_block(struct kvm_tdx *kvm_tdx, gpa_t gpa,
>   	};
>   	u64 ret;
>   
> -	ret = seamcall_ret(TDH_MEM_RANGE_BLOCK, &in);
> +	ret = tdx_seamcall_sept(TDH_MEM_RANGE_BLOCK, &in);
>   
>   	*rcx = in.rcx;
>   	*rdx = in.rdx;
> @@ -335,7 +369,7 @@ static inline u64 tdh_mem_page_remove(struct kvm_tdx *kvm_tdx, gpa_t gpa,
>   	};
>   	u64 ret;
>   
> -	ret = seamcall_ret(TDH_MEM_PAGE_REMOVE, &in);
> +	ret = tdx_seamcall_sept(TDH_MEM_PAGE_REMOVE, &in);
>   
>   	*rcx = in.rcx;
>   	*rdx = in.rdx;
> @@ -361,7 +395,7 @@ static inline u64 tdh_mem_range_unblock(struct kvm_tdx *kvm_tdx, gpa_t gpa,
>   	};
>   	u64 ret;
>   
> -	ret = seamcall_ret(TDH_MEM_RANGE_UNBLOCK, &in);
> +	ret = tdx_seamcall_sept(TDH_MEM_RANGE_UNBLOCK, &in);
>   
>   	*rcx = in.rcx;
>   	*rdx = in.rdx;

