[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <Z1uHbl0AeGsjLf5K@yzhao56-desk.sh.intel.com>
Date: Fri, 13 Dec 2024 09:01:34 +0800
From: Yan Zhao <yan.y.zhao@...el.com>
To: <pbonzini@...hat.com>, <seanjc@...gle.com>, <kvm@...r.kernel.org>
CC: <dave.hansen@...ux.intel.com>, <rick.p.edgecombe@...el.com>,
<kai.huang@...el.com>, <adrian.hunter@...el.com>,
<reinette.chatre@...el.com>, <xiaoyao.li@...el.com>,
<tony.lindgren@...el.com>, <binbin.wu@...ux.intel.com>,
<dmatlack@...gle.com>, <isaku.yamahata@...el.com>,
<isaku.yamahata@...il.com>, <nik.borisov@...e.com>,
<linux-kernel@...r.kernel.org>, <x86@...nel.org>
Subject: Re: [RFC PATCH 0/2] SEPT SEAMCALL retry proposal
On Thu, Nov 21, 2024 at 07:51:39PM +0800, Yan Zhao wrote:
> This SEPT SEAMCALL retry proposal aims to remove patch
> "[HACK] KVM: TDX: Retry seamcall when TDX_OPERAND_BUSY with operand SEPT"
> [1] at the tail of v2 series "TDX MMU Part 2".
>
> ==brief history==
>
> In the v1 series 'TDX MMU Part 2', there were several discussions regarding
> the necessity of retrying SEPT-related SEAMCALLs up to 16 times within the
> SEAMCALL wrapper tdx_seamcall_sept().
>
> The lock status of each SEAMCALL relevant to KVM was analyzed in [2].
>
> The conclusion was that 16 retries was necessary because
> - tdh_vp_enter() contends with tdh_mem*() when 0-stepping is suspected.
>
> When the TDX module detects that EPT violations are caused by the same
> RIP as in the last tdh_vp_enter() for 6 consecutive times, tdh_vp_enter()
> will take SEPT tree lock and contend with tdh_mem*().
>
> - tdg_mem_page_accept() can contend with other tdh_mem*().
>
>
> Sean provided several good suggestions[3], including:
> - Implement retries within TDX code when the TDP MMU returns
> RET_PF_RETRY_FOZEN (for RET_PF_RETRY and frozen SPTE) to avoid triggering
> 0-step mitigation.
> - It's not necessary for tdg_mem_page_accept() to contend with tdh_mem*()
> inside TDX module.
> - Use a method similar to KVM_REQ_MCLOCK_INPROGRESS to kick off vCPUs and
> prevent tdh_vp_enter() during page uninstallation.
>
> Yan later found out that only retry RET_PF_RETRY_FOZEN within TDX code is
> insufficient to prevent 0-step mitigation [4].
>
> Rick and Yan then consulted TDX module team with findings that:
> - The threshold of zero-step mitigation is counted per vCPU.
> It's of value 6 because
>
> "There can be at most 2 mapping faults on instruction fetch
> (x86 macro-instructions length is at most 15 bytes) when the
> instruction crosses page boundary; then there can be at most 2
> mapping faults for each memory operand, when the operand crosses
> page boundary. For most of x86 macro-instructions, there are up to 2
> memory operands and each one of them is small, which brings us to
> maximum 2+2*2 = 6 legal mapping faults."
>
> - Besides tdg_mem_page_accept(), tdg_mem_page_attr_rd/wr() can also
> contend with SEAMCALLs tdh_mem*().
>
> So, we decided to make a proposal to tolerate 0-step mitigation.
>
> ==proposal details==
>
> The proposal discusses SEPT-related and TLB-flush-related SEAMCALLs
> together which are required for page installation and uninstallation.
>
> These SEAMCALLs can be divided into three groups:
> Group 1: tdh_mem_page_add().
> The SEAMCALL is invoked only during TD build time and therefore
> KVM has ensured that no contention will occur.
>
> Proposal: (as in patch 1)
> Just return error when TDX_OPERAND_BUSY is found.
>
> Group 2: tdh_mem_sept_add(), tdh_mem_page_aug().
> These two SEAMCALLs are invoked for page installation.
> They return TDX_OPERAND_BUSY when contending with tdh_vp_enter()
> (due to 0-step mitigation) or TDCALLs tdg_mem_page_accept(),
> tdg_mem_page_attr_rd/wr().
>
> Proposal: (as in patch 1)
> - Return -EBUSY in KVM for TDX_OPERAND_BUSY to cause RET_PF_RETRY
> to be returned in kvm_mmu_do_page_fault()/kvm_mmu_page_fault().
>
> - Inside TDX's EPT violation handler, retry on RET_PF_RETRY as
> long as there are no pending signals/interrupts.
Alternatively, we can have the tdx_handle_ept_violation() do not retry
internally TDX.
Instead, keep the 16 times retries in tdx_seamcall_sept() for
tdh_mem_sept_add() and tdh_mem_page_aug() only, i.e. only for SEAMCALLs in
Group 2.
>
> The retry inside TDX aims to reduce the count of tdh_vp_enter()
> before resolving EPT violations in the local vCPU, thereby
> minimizing contentions with other vCPUs. However, it can't
> completely eliminate 0-step mitigation as it exits when there're
> pending signals/interrupts and does not (and cannot) remove the
> tdh_vp_enter() caused by KVM_EXIT_MEMORY_FAULT.
>
> Resources SHARED users EXCLUSIVE users
> ------------------------------------------------------------
> SEPT tree tdh_mem_sept_add tdh_vp_enter(0-step mitigation)
> tdh_mem_page_aug
> ------------------------------------------------------------
> SEPT entry tdh_mem_sept_add (Host lock)
> tdh_mem_page_aug (Host lock)
> tdg_mem_page_accept (Guest lock)
> tdg_mem_page_attr_rd (Guest lock)
> tdg_mem_page_attr_wr (Guest lock)
>
> Group 3: tdh_mem_range_block(), tdh_mem_track(), tdh_mem_page_remove().
> These three SEAMCALLs are invoked for page uninstallation, with
> KVM mmu_lock held for writing.
>
> Resources SHARED users EXCLUSIVE users
> ------------------------------------------------------------
> TDCS epoch tdh_vp_enter tdh_mem_track
> ------------------------------------------------------------
> SEPT tree tdh_mem_page_remove tdh_vp_enter (0-step mitigation)
> tdh_mem_range_block
> ------------------------------------------------------------
> SEPT entry tdh_mem_range_block (Host lock)
> tdh_mem_page_remove (Host lock)
> tdg_mem_page_accept (Guest lock)
> tdg_mem_page_attr_rd (Guest lock)
> tdg_mem_page_attr_wr (Guest lock)
>
> Proposal: (as in patch 2)
> - Upon detection of TDX_OPERAND_BUSY, retry each SEAMCALL only
> once.
> - During the retry, kick off all vCPUs and prevent any vCPU from
> entering to avoid potential contentions.
>
> This is because tdh_vp_enter() and TDCALLs are not protected by
> KVM mmu_lock, and remove_external_spte() in KVM must not fail.
>
>
>
> SEAMCALL Lock Type Resource
> -----------------------------Group 1--------------------------------
> tdh_mem_page_add EXCLUSIVE TDR
> NO_LOCK TDCS
> NO_LOCK SEPT tree
> EXCLUSIVE page to add
>
> ----------------------------Group 2--------------------------------
> tdh_mem_sept_add SHARED TDR
> SHARED TDCS
> SHARED SEPT tree
> HOST,EXCLUSIVE SEPT entry to modify
> EXCLUSIVE page to add
>
>
> tdh_mem_page_aug SHARED TDR
> SHARED TDCS
> SHARED SEPT tree
> HOST,EXCLUSIVE SEPT entry to modify
> EXCLUSIVE page to aug
>
> ----------------------------Group 3--------------------------------
> tdh_mem_range_block SHARED TDR
> SHARED TDCS
> EXCLUSIVE SEPT tree
> HOST,EXCLUSIVE SEPT entry to modify
>
> tdh_mem_track SHARED TDR
> SHARED TDCS
> EXCLUSIVE TDCS epoch
>
> tdh_mem_page_remove SHARED TDR
> SHARED TDCS
> SHARED SEPT tree
> HOST,EXCLUSIVE SEPT entry to modify
> EXCLUSIVE page to remove
>
>
> [1] https://lore.kernel.org/all/20241112073909.22326-1-yan.y.zhao@intel.com
> [2] https://lore.kernel.org/kvm/ZuP5eNXFCljzRgWo@yzhao56-desk.sh.intel.com
> [3] https://lore.kernel.org/kvm/ZuR09EqzU1WbQYGd@google.com
> [4] https://lore.kernel.org/kvm/Zw%2FKElXSOf1xqLE7@yzhao56-desk.sh.intel.com
>
> Yan Zhao (2):
> KVM: TDX: Retry in TDX when installing TD private/sept pages
> KVM: TDX: Kick off vCPUs when SEAMCALL is busy during TD page removal
>
> arch/x86/include/asm/kvm_host.h | 2 +
> arch/x86/kvm/mmu/mmu.c | 2 +-
> arch/x86/kvm/vmx/tdx.c | 102 +++++++++++++++++++++++++-------
> 3 files changed, 85 insertions(+), 21 deletions(-)
>
> --
> 2.43.2
>
Powered by blists - more mailing lists