lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <Z1uHbl0AeGsjLf5K@yzhao56-desk.sh.intel.com>
Date: Fri, 13 Dec 2024 09:01:34 +0800
From: Yan Zhao <yan.y.zhao@...el.com>
To: <pbonzini@...hat.com>, <seanjc@...gle.com>, <kvm@...r.kernel.org>
CC: <dave.hansen@...ux.intel.com>, <rick.p.edgecombe@...el.com>,
	<kai.huang@...el.com>, <adrian.hunter@...el.com>,
	<reinette.chatre@...el.com>, <xiaoyao.li@...el.com>,
	<tony.lindgren@...el.com>, <binbin.wu@...ux.intel.com>,
	<dmatlack@...gle.com>, <isaku.yamahata@...el.com>,
	<isaku.yamahata@...il.com>, <nik.borisov@...e.com>,
	<linux-kernel@...r.kernel.org>, <x86@...nel.org>
Subject: Re: [RFC PATCH 0/2] SEPT SEAMCALL retry proposal

On Thu, Nov 21, 2024 at 07:51:39PM +0800, Yan Zhao wrote:
> This SEPT SEAMCALL retry proposal aims to remove patch
> "[HACK] KVM: TDX: Retry seamcall when TDX_OPERAND_BUSY with operand SEPT"
> [1] at the tail of v2 series "TDX MMU Part 2".
> 
> ==brief history==
> 
> In the v1 series 'TDX MMU Part 2', there were several discussions regarding
> the necessity of retrying SEPT-related SEAMCALLs up to 16 times within the
> SEAMCALL wrapper tdx_seamcall_sept().
> 
> The lock status of each SEAMCALL relevant to KVM was analyzed in [2].
> 
> The conclusion was that 16 retries was necessary because
> - tdh_vp_enter() contends with tdh_mem*() when 0-stepping is suspected.
> 
>   When the TDX module detects that EPT violations are caused by the same
>   RIP as in the last tdh_vp_enter() for 6 consecutive times, tdh_vp_enter()
>   will take SEPT tree lock and contend with tdh_mem*().
> 
> - tdg_mem_page_accept() can contend with other tdh_mem*().
> 
> 
> Sean provided several good suggestions[3], including:
> - Implement retries within TDX code when the TDP MMU returns
>   RET_PF_RETRY_FOZEN (for RET_PF_RETRY and frozen SPTE) to avoid triggering
>   0-step mitigation.
> - It's not necessary for tdg_mem_page_accept() to contend with tdh_mem*()
>   inside TDX module.
> - Use a method similar to KVM_REQ_MCLOCK_INPROGRESS to kick off vCPUs and
>   prevent tdh_vp_enter() during page uninstallation.
> 
> Yan later found out that only retry RET_PF_RETRY_FOZEN within TDX code is
> insufficient to prevent 0-step mitigation [4].
> 
> Rick and Yan then consulted TDX module team with findings that:
> - The threshold of zero-step mitigation is counted per vCPU.
>   It's of value 6 because
> 
>     "There can be at most 2 mapping faults on instruction fetch
>      (x86 macro-instructions length is at most 15 bytes) when the
>      instruction crosses page boundary; then there can be at most 2
>      mapping faults for each memory operand, when the operand crosses
>      page boundary. For most of x86 macro-instructions, there are up to 2
>      memory operands and each one of them is small, which brings us to
>      maximum 2+2*2 = 6 legal mapping faults."
>   
> - Besides tdg_mem_page_accept(),  tdg_mem_page_attr_rd/wr() can also 
>   contend with SEAMCALLs tdh_mem*().
> 
> So, we decided to make a proposal to tolerate 0-step mitigation.
> 
> ==proposal details==
> 
> The proposal discusses SEPT-related and TLB-flush-related SEAMCALLs
> together which are required for page installation and uninstallation.
> 
> These SEAMCALLs can be divided into three groups:
> Group 1: tdh_mem_page_add().
>          The SEAMCALL is invoked only during TD build time and therefore
>          KVM has ensured that no contention will occur.
> 
>          Proposal: (as in patch 1)
>          Just return error when TDX_OPERAND_BUSY is found.
> 
> Group 2: tdh_mem_sept_add(), tdh_mem_page_aug().
>          These two SEAMCALLs are invoked for page installation. 
>          They return TDX_OPERAND_BUSY when contending with tdh_vp_enter()
> 	 (due to 0-step mitigation) or TDCALLs tdg_mem_page_accept(),
> 	 tdg_mem_page_attr_rd/wr().
> 
>          Proposal: (as in patch 1)
>          - Return -EBUSY in KVM for TDX_OPERAND_BUSY to cause RET_PF_RETRY
>            to be returned in kvm_mmu_do_page_fault()/kvm_mmu_page_fault().
>          
>          - Inside TDX's EPT violation handler, retry on RET_PF_RETRY as
>            long as there are no pending signals/interrupts.
Alternatively, we can have the tdx_handle_ept_violation() do not retry
internally TDX.
Instead, keep the 16 times retries in tdx_seamcall_sept() for                    
tdh_mem_sept_add() and tdh_mem_page_aug() only, i.e. only for SEAMCALLs in       
Group 2.

> 
>          The retry inside TDX aims to reduce the count of tdh_vp_enter()
>          before resolving EPT violations in the local vCPU, thereby
>          minimizing contentions with other vCPUs. However, it can't
>          completely eliminate 0-step mitigation as it exits when there're
>          pending signals/interrupts and does not (and cannot) remove the
>          tdh_vp_enter() caused by KVM_EXIT_MEMORY_FAULT.
> 
>          Resources    SHARED  users      EXCLUSIVE users
>          ------------------------------------------------------------
>          SEPT tree  tdh_mem_sept_add     tdh_vp_enter(0-step mitigation)
>                     tdh_mem_page_aug
>          ------------------------------------------------------------
>          SEPT entry                      tdh_mem_sept_add (Host lock)
>                                          tdh_mem_page_aug (Host lock)
>                                          tdg_mem_page_accept (Guest lock)
>                                          tdg_mem_page_attr_rd (Guest lock)
>                                          tdg_mem_page_attr_wr (Guest lock)
> 
> Group 3: tdh_mem_range_block(), tdh_mem_track(), tdh_mem_page_remove().
>          These three SEAMCALLs are invoked for page uninstallation, with
>          KVM mmu_lock held for writing.
> 
>          Resources     SHARED users      EXCLUSIVE users
>          ------------------------------------------------------------
>          TDCS epoch    tdh_vp_enter      tdh_mem_track
>          ------------------------------------------------------------
>          SEPT tree  tdh_mem_page_remove  tdh_vp_enter (0-step mitigation)
>                                          tdh_mem_range_block   
>          ------------------------------------------------------------
>          SEPT entry                      tdh_mem_range_block (Host lock)
>                                          tdh_mem_page_remove (Host lock)
>                                          tdg_mem_page_accept (Guest lock)
>                                          tdg_mem_page_attr_rd (Guest lock)
>                                          tdg_mem_page_attr_wr (Guest lock)
> 
>          Proposal: (as in patch 2)
>          - Upon detection of TDX_OPERAND_BUSY, retry each SEAMCALL only
>            once.
>          - During the retry, kick off all vCPUs and prevent any vCPU from
>            entering to avoid potential contentions.
> 
>          This is because tdh_vp_enter() and TDCALLs are not protected by
>          KVM mmu_lock, and remove_external_spte() in KVM must not fail.
> 
> 
> 
> SEAMCALL                Lock Type        Resource 
> -----------------------------Group 1--------------------------------
> tdh_mem_page_add        EXCLUSIVE        TDR
>                         NO_LOCK          TDCS
>                         NO_LOCK          SEPT tree
>                         EXCLUSIVE        page to add
> 
> ----------------------------Group 2--------------------------------
> tdh_mem_sept_add        SHARED           TDR
>                         SHARED           TDCS
>                         SHARED           SEPT tree
>                         HOST,EXCLUSIVE   SEPT entry to modify
>                         EXCLUSIVE        page to add
> 
> 
> tdh_mem_page_aug        SHARED           TDR
>                         SHARED           TDCS
>                         SHARED           SEPT tree
>                         HOST,EXCLUSIVE   SEPT entry to modify
>                         EXCLUSIVE        page to aug
> 
> ----------------------------Group 3--------------------------------
> tdh_mem_range_block     SHARED           TDR
>                         SHARED           TDCS
>                         EXCLUSIVE        SEPT tree
>                         HOST,EXCLUSIVE   SEPT entry to modify
> 
> tdh_mem_track           SHARED           TDR
>                         SHARED           TDCS
>                         EXCLUSIVE        TDCS epoch
> 
> tdh_mem_page_remove     SHARED           TDR
>                         SHARED           TDCS
>                         SHARED           SEPT tree
>                         HOST,EXCLUSIVE   SEPT entry to modify
>                         EXCLUSIVE        page to remove
> 
> 
> [1] https://lore.kernel.org/all/20241112073909.22326-1-yan.y.zhao@intel.com
> [2] https://lore.kernel.org/kvm/ZuP5eNXFCljzRgWo@yzhao56-desk.sh.intel.com
> [3] https://lore.kernel.org/kvm/ZuR09EqzU1WbQYGd@google.com
> [4] https://lore.kernel.org/kvm/Zw%2FKElXSOf1xqLE7@yzhao56-desk.sh.intel.com
> 
> Yan Zhao (2):
>   KVM: TDX: Retry in TDX when installing TD private/sept pages
>   KVM: TDX: Kick off vCPUs when SEAMCALL is busy during TD page removal
> 
>  arch/x86/include/asm/kvm_host.h |   2 +
>  arch/x86/kvm/mmu/mmu.c          |   2 +-
>  arch/x86/kvm/vmx/tdx.c          | 102 +++++++++++++++++++++++++-------
>  3 files changed, 85 insertions(+), 21 deletions(-)
> 
> -- 
> 2.43.2
> 

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ