[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20241121115139.26338-1-yan.y.zhao@intel.com>
Date: Thu, 21 Nov 2024 19:51:39 +0800
From: Yan Zhao <yan.y.zhao@...el.com>
To: pbonzini@...hat.com,
seanjc@...gle.com,
kvm@...r.kernel.org
Cc: dave.hansen@...ux.intel.com,
rick.p.edgecombe@...el.com,
kai.huang@...el.com,
adrian.hunter@...el.com,
reinette.chatre@...el.com,
xiaoyao.li@...el.com,
tony.lindgren@...el.com,
binbin.wu@...ux.intel.com,
dmatlack@...gle.com,
isaku.yamahata@...el.com,
isaku.yamahata@...il.com,
nik.borisov@...e.com,
linux-kernel@...r.kernel.org,
x86@...nel.org,
Yan Zhao <yan.y.zhao@...el.com>
Subject: [RFC PATCH 0/2] SEPT SEAMCALL retry proposal
This SEPT SEAMCALL retry proposal aims to remove patch
"[HACK] KVM: TDX: Retry seamcall when TDX_OPERAND_BUSY with operand SEPT"
[1] at the tail of v2 series "TDX MMU Part 2".
==brief history==
In the v1 series 'TDX MMU Part 2', there were several discussions regarding
the necessity of retrying SEPT-related SEAMCALLs up to 16 times within the
SEAMCALL wrapper tdx_seamcall_sept().
The lock status of each SEAMCALL relevant to KVM was analyzed in [2].
The conclusion was that 16 retries was necessary because
- tdh_vp_enter() contends with tdh_mem*() when 0-stepping is suspected.
When the TDX module detects that EPT violations are caused by the same
RIP as in the last tdh_vp_enter() for 6 consecutive times, tdh_vp_enter()
will take SEPT tree lock and contend with tdh_mem*().
- tdg_mem_page_accept() can contend with other tdh_mem*().
Sean provided several good suggestions[3], including:
- Implement retries within TDX code when the TDP MMU returns
RET_PF_RETRY_FOZEN (for RET_PF_RETRY and frozen SPTE) to avoid triggering
0-step mitigation.
- It's not necessary for tdg_mem_page_accept() to contend with tdh_mem*()
inside TDX module.
- Use a method similar to KVM_REQ_MCLOCK_INPROGRESS to kick off vCPUs and
prevent tdh_vp_enter() during page uninstallation.
Yan later found out that only retry RET_PF_RETRY_FOZEN within TDX code is
insufficient to prevent 0-step mitigation [4].
Rick and Yan then consulted TDX module team with findings that:
- The threshold of zero-step mitigation is counted per vCPU.
It's of value 6 because
"There can be at most 2 mapping faults on instruction fetch
(x86 macro-instructions length is at most 15 bytes) when the
instruction crosses page boundary; then there can be at most 2
mapping faults for each memory operand, when the operand crosses
page boundary. For most of x86 macro-instructions, there are up to 2
memory operands and each one of them is small, which brings us to
maximum 2+2*2 = 6 legal mapping faults."
- Besides tdg_mem_page_accept(), tdg_mem_page_attr_rd/wr() can also
contend with SEAMCALLs tdh_mem*().
So, we decided to make a proposal to tolerate 0-step mitigation.
==proposal details==
The proposal discusses SEPT-related and TLB-flush-related SEAMCALLs
together which are required for page installation and uninstallation.
These SEAMCALLs can be divided into three groups:
Group 1: tdh_mem_page_add().
The SEAMCALL is invoked only during TD build time and therefore
KVM has ensured that no contention will occur.
Proposal: (as in patch 1)
Just return error when TDX_OPERAND_BUSY is found.
Group 2: tdh_mem_sept_add(), tdh_mem_page_aug().
These two SEAMCALLs are invoked for page installation.
They return TDX_OPERAND_BUSY when contending with tdh_vp_enter()
(due to 0-step mitigation) or TDCALLs tdg_mem_page_accept(),
tdg_mem_page_attr_rd/wr().
Proposal: (as in patch 1)
- Return -EBUSY in KVM for TDX_OPERAND_BUSY to cause RET_PF_RETRY
to be returned in kvm_mmu_do_page_fault()/kvm_mmu_page_fault().
- Inside TDX's EPT violation handler, retry on RET_PF_RETRY as
long as there are no pending signals/interrupts.
The retry inside TDX aims to reduce the count of tdh_vp_enter()
before resolving EPT violations in the local vCPU, thereby
minimizing contentions with other vCPUs. However, it can't
completely eliminate 0-step mitigation as it exits when there're
pending signals/interrupts and does not (and cannot) remove the
tdh_vp_enter() caused by KVM_EXIT_MEMORY_FAULT.
Resources SHARED users EXCLUSIVE users
------------------------------------------------------------
SEPT tree tdh_mem_sept_add tdh_vp_enter(0-step mitigation)
tdh_mem_page_aug
------------------------------------------------------------
SEPT entry tdh_mem_sept_add (Host lock)
tdh_mem_page_aug (Host lock)
tdg_mem_page_accept (Guest lock)
tdg_mem_page_attr_rd (Guest lock)
tdg_mem_page_attr_wr (Guest lock)
Group 3: tdh_mem_range_block(), tdh_mem_track(), tdh_mem_page_remove().
These three SEAMCALLs are invoked for page uninstallation, with
KVM mmu_lock held for writing.
Resources SHARED users EXCLUSIVE users
------------------------------------------------------------
TDCS epoch tdh_vp_enter tdh_mem_track
------------------------------------------------------------
SEPT tree tdh_mem_page_remove tdh_vp_enter (0-step mitigation)
tdh_mem_range_block
------------------------------------------------------------
SEPT entry tdh_mem_range_block (Host lock)
tdh_mem_page_remove (Host lock)
tdg_mem_page_accept (Guest lock)
tdg_mem_page_attr_rd (Guest lock)
tdg_mem_page_attr_wr (Guest lock)
Proposal: (as in patch 2)
- Upon detection of TDX_OPERAND_BUSY, retry each SEAMCALL only
once.
- During the retry, kick off all vCPUs and prevent any vCPU from
entering to avoid potential contentions.
This is because tdh_vp_enter() and TDCALLs are not protected by
KVM mmu_lock, and remove_external_spte() in KVM must not fail.
SEAMCALL Lock Type Resource
-----------------------------Group 1--------------------------------
tdh_mem_page_add EXCLUSIVE TDR
NO_LOCK TDCS
NO_LOCK SEPT tree
EXCLUSIVE page to add
----------------------------Group 2--------------------------------
tdh_mem_sept_add SHARED TDR
SHARED TDCS
SHARED SEPT tree
HOST,EXCLUSIVE SEPT entry to modify
EXCLUSIVE page to add
tdh_mem_page_aug SHARED TDR
SHARED TDCS
SHARED SEPT tree
HOST,EXCLUSIVE SEPT entry to modify
EXCLUSIVE page to aug
----------------------------Group 3--------------------------------
tdh_mem_range_block SHARED TDR
SHARED TDCS
EXCLUSIVE SEPT tree
HOST,EXCLUSIVE SEPT entry to modify
tdh_mem_track SHARED TDR
SHARED TDCS
EXCLUSIVE TDCS epoch
tdh_mem_page_remove SHARED TDR
SHARED TDCS
SHARED SEPT tree
HOST,EXCLUSIVE SEPT entry to modify
EXCLUSIVE page to remove
[1] https://lore.kernel.org/all/20241112073909.22326-1-yan.y.zhao@intel.com
[2] https://lore.kernel.org/kvm/ZuP5eNXFCljzRgWo@yzhao56-desk.sh.intel.com
[3] https://lore.kernel.org/kvm/ZuR09EqzU1WbQYGd@google.com
[4] https://lore.kernel.org/kvm/Zw%2FKElXSOf1xqLE7@yzhao56-desk.sh.intel.com
Yan Zhao (2):
KVM: TDX: Retry in TDX when installing TD private/sept pages
KVM: TDX: Kick off vCPUs when SEAMCALL is busy during TD page removal
arch/x86/include/asm/kvm_host.h | 2 +
arch/x86/kvm/mmu/mmu.c | 2 +-
arch/x86/kvm/vmx/tdx.c | 102 +++++++++++++++++++++++++-------
3 files changed, 85 insertions(+), 21 deletions(-)
--
2.43.2
Powered by blists - more mailing lists