lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite for Android: free password hash cracker in your pocket
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <4a3b3b6bc96e111e5380de4681a20c2734d82a33.camel@intel.com>
Date: Tue, 26 Nov 2024 00:46:51 +0000
From: "Edgecombe, Rick P" <rick.p.edgecombe@...el.com>
To: "kvm@...r.kernel.org" <kvm@...r.kernel.org>, "pbonzini@...hat.com"
	<pbonzini@...hat.com>, "seanjc@...gle.com" <seanjc@...gle.com>, "Zhao, Yan Y"
	<yan.y.zhao@...el.com>
CC: "isaku.yamahata@...il.com" <isaku.yamahata@...il.com>, "x86@...nel.org"
	<x86@...nel.org>, "dave.hansen@...ux.intel.com"
	<dave.hansen@...ux.intel.com>, "binbin.wu@...ux.intel.com"
	<binbin.wu@...ux.intel.com>, "Li, Xiaoyao" <xiaoyao.li@...el.com>, "Chatre,
 Reinette" <reinette.chatre@...el.com>, "Hunter, Adrian"
	<adrian.hunter@...el.com>, "Lindgren, Tony" <tony.lindgren@...el.com>,
	"dmatlack@...gle.com" <dmatlack@...gle.com>, "linux-kernel@...r.kernel.org"
	<linux-kernel@...r.kernel.org>, "Yamahata, Isaku" <isaku.yamahata@...el.com>,
	"Huang, Kai" <kai.huang@...el.com>, "nik.borisov@...e.com"
	<nik.borisov@...e.com>
Subject: Re: [RFC PATCH 0/2] SEPT SEAMCALL retry proposal

On Thu, 2024-11-21 at 19:51 +0800, Yan Zhao wrote:
> ==proposal details==
> 
> The proposal discusses SEPT-related and TLB-flush-related SEAMCALLs
> together which are required for page installation and uninstallation.
> 
> These SEAMCALLs can be divided into three groups:
> Group 1: tdh_mem_page_add().
>          The SEAMCALL is invoked only during TD build time and therefore
>          KVM has ensured that no contention will occur.
> 
>          Proposal: (as in patch 1)
>          Just return error when TDX_OPERAND_BUSY is found.
> 
> Group 2: tdh_mem_sept_add(), tdh_mem_page_aug().
>          These two SEAMCALLs are invoked for page installation. 
>          They return TDX_OPERAND_BUSY when contending with tdh_vp_enter()
> 	 (due to 0-step mitigation) or TDCALLs tdg_mem_page_accept(),
> 	 tdg_mem_page_attr_rd/wr().

We did verify with TDX module folks that the TDX module could be changed to not
take the sept host priority lock for zero entries (that happen during the guest
operations). In that case, I think we shouldn't expect contention for
tdh_mem_sept_add() and tdh_mem_page_aug() from them? We still need it for
tdh_vp_enter() though.


> 
>          Proposal: (as in patch 1)
>          - Return -EBUSY in KVM for TDX_OPERAND_BUSY to cause RET_PF_RETRY
>            to be returned in kvm_mmu_do_page_fault()/kvm_mmu_page_fault().
>          
>          - Inside TDX's EPT violation handler, retry on RET_PF_RETRY as
>            long as there are no pending signals/interrupts.
> 
>          The retry inside TDX aims to reduce the count of tdh_vp_enter()
>          before resolving EPT violations in the local vCPU, thereby
>          minimizing contentions with other vCPUs. However, it can't
>          completely eliminate 0-step mitigation as it exits when there're
>          pending signals/interrupts and does not (and cannot) remove the
>          tdh_vp_enter() caused by KVM_EXIT_MEMORY_FAULT.
> 
>          Resources    SHARED  users      EXCLUSIVE users
>          ------------------------------------------------------------
>          SEPT tree  tdh_mem_sept_add     tdh_vp_enter(0-step mitigation)
>                     tdh_mem_page_aug
>          ------------------------------------------------------------
>          SEPT entry                      tdh_mem_sept_add (Host lock)
>                                          tdh_mem_page_aug (Host lock)
>                                          tdg_mem_page_accept (Guest lock)
>                                          tdg_mem_page_attr_rd (Guest lock)
>                                          tdg_mem_page_attr_wr (Guest lock)
> 
> Group 3: tdh_mem_range_block(), tdh_mem_track(), tdh_mem_page_remove().
>          These three SEAMCALLs are invoked for page uninstallation, with
>          KVM mmu_lock held for writing.
> 
>          Resources     SHARED users      EXCLUSIVE users
>          ------------------------------------------------------------
>          TDCS epoch    tdh_vp_enter      tdh_mem_track
>          ------------------------------------------------------------
>          SEPT tree  tdh_mem_page_remove  tdh_vp_enter (0-step mitigation)
>                                          tdh_mem_range_block   
>          ------------------------------------------------------------
>          SEPT entry                      tdh_mem_range_block (Host lock)
>                                          tdh_mem_page_remove (Host lock)
>                                          tdg_mem_page_accept (Guest lock)
>                                          tdg_mem_page_attr_rd (Guest lock)
>                                          tdg_mem_page_attr_wr (Guest lock)
> 
>          Proposal: (as in patch 2)
>          - Upon detection of TDX_OPERAND_BUSY, retry each SEAMCALL only
>            once.
>          - During the retry, kick off all vCPUs and prevent any vCPU from
>            entering to avoid potential contentions.
> 
>          This is because tdh_vp_enter() and TDCALLs are not protected by
>          KVM mmu_lock, and remove_external_spte() in KVM must not fail.

The solution for group 3 actually doesn't look too bad at all to me. At least
from code and complexity wise. It's pretty compact, doesn't add any locks, and
limited to the tdx.c code. Although, I didn't evaluate the implementation 
correctness of tdx_no_vcpus_enter_start() and tdx_no_vcpus_enter_stop() yet.

Were you able to test the fallback path at all? Can we think of any real
situations where it could be burdensome?

One other thing to note I think, is that group 3 are part of no-fail operations.
The core KVM calling code doesn't have the understanding of a failure there. So
in this scheme of not avoiding contention we have to succeed before returning,
where group 1 and 2 can fail, so don't need the special fallback scheme.

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ