lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <Z0VpsYpcz4uqdA7o@yzhao56-desk.sh.intel.com>
Date: Tue, 26 Nov 2024 14:24:49 +0800
From: Yan Zhao <yan.y.zhao@...el.com>
To: "Edgecombe, Rick P" <rick.p.edgecombe@...el.com>
CC: "kvm@...r.kernel.org" <kvm@...r.kernel.org>, "pbonzini@...hat.com"
	<pbonzini@...hat.com>, "seanjc@...gle.com" <seanjc@...gle.com>,
	"isaku.yamahata@...il.com" <isaku.yamahata@...il.com>, "x86@...nel.org"
	<x86@...nel.org>, "dave.hansen@...ux.intel.com"
	<dave.hansen@...ux.intel.com>, "binbin.wu@...ux.intel.com"
	<binbin.wu@...ux.intel.com>, "Li, Xiaoyao" <xiaoyao.li@...el.com>, "Chatre,
 Reinette" <reinette.chatre@...el.com>, "Hunter, Adrian"
	<adrian.hunter@...el.com>, "Lindgren, Tony" <tony.lindgren@...el.com>,
	"dmatlack@...gle.com" <dmatlack@...gle.com>, "linux-kernel@...r.kernel.org"
	<linux-kernel@...r.kernel.org>, "Yamahata, Isaku" <isaku.yamahata@...el.com>,
	"Huang, Kai" <kai.huang@...el.com>, "nik.borisov@...e.com"
	<nik.borisov@...e.com>
Subject: Re: [RFC PATCH 0/2] SEPT SEAMCALL retry proposal

On Tue, Nov 26, 2024 at 08:46:51AM +0800, Edgecombe, Rick P wrote:
> On Thu, 2024-11-21 at 19:51 +0800, Yan Zhao wrote:
> > ==proposal details==
> > 
> > The proposal discusses SEPT-related and TLB-flush-related SEAMCALLs
> > together which are required for page installation and uninstallation.
> > 
> > These SEAMCALLs can be divided into three groups:
> > Group 1: tdh_mem_page_add().
> >          The SEAMCALL is invoked only during TD build time and therefore
> >          KVM has ensured that no contention will occur.
> > 
> >          Proposal: (as in patch 1)
> >          Just return error when TDX_OPERAND_BUSY is found.
> > 
> > Group 2: tdh_mem_sept_add(), tdh_mem_page_aug().
> >          These two SEAMCALLs are invoked for page installation. 
> >          They return TDX_OPERAND_BUSY when contending with tdh_vp_enter()
> > 	 (due to 0-step mitigation) or TDCALLs tdg_mem_page_accept(),
> > 	 tdg_mem_page_attr_rd/wr().
> 
> We did verify with TDX module folks that the TDX module could be changed to not
> take the sept host priority lock for zero entries (that happen during the guest
> operations). In that case, I think we shouldn't expect contention for
> tdh_mem_sept_add() and tdh_mem_page_aug() from them? We still need it for
> tdh_vp_enter() though.
Ah, you are right.

I previously incorrectly thought TDX module will avoid locking free entries for
tdg_mem_page_accept() only.


> 
> 
> > 
> >          Proposal: (as in patch 1)
> >          - Return -EBUSY in KVM for TDX_OPERAND_BUSY to cause RET_PF_RETRY
> >            to be returned in kvm_mmu_do_page_fault()/kvm_mmu_page_fault().
> >          
> >          - Inside TDX's EPT violation handler, retry on RET_PF_RETRY as
> >            long as there are no pending signals/interrupts.
> > 
> >          The retry inside TDX aims to reduce the count of tdh_vp_enter()
> >          before resolving EPT violations in the local vCPU, thereby
> >          minimizing contentions with other vCPUs. However, it can't
> >          completely eliminate 0-step mitigation as it exits when there're
> >          pending signals/interrupts and does not (and cannot) remove the
> >          tdh_vp_enter() caused by KVM_EXIT_MEMORY_FAULT.
> > 
> >          Resources    SHARED  users      EXCLUSIVE users
> >          ------------------------------------------------------------
> >          SEPT tree  tdh_mem_sept_add     tdh_vp_enter(0-step mitigation)
> >                     tdh_mem_page_aug
> >          ------------------------------------------------------------
> >          SEPT entry                      tdh_mem_sept_add (Host lock)
> >                                          tdh_mem_page_aug (Host lock)
> >                                          tdg_mem_page_accept (Guest lock)
> >                                          tdg_mem_page_attr_rd (Guest lock)
> >                                          tdg_mem_page_attr_wr (Guest lock)
> > 
> > Group 3: tdh_mem_range_block(), tdh_mem_track(), tdh_mem_page_remove().
> >          These three SEAMCALLs are invoked for page uninstallation, with
> >          KVM mmu_lock held for writing.
> > 
> >          Resources     SHARED users      EXCLUSIVE users
> >          ------------------------------------------------------------
> >          TDCS epoch    tdh_vp_enter      tdh_mem_track
> >          ------------------------------------------------------------
> >          SEPT tree  tdh_mem_page_remove  tdh_vp_enter (0-step mitigation)
> >                                          tdh_mem_range_block   
> >          ------------------------------------------------------------
> >          SEPT entry                      tdh_mem_range_block (Host lock)
> >                                          tdh_mem_page_remove (Host lock)
> >                                          tdg_mem_page_accept (Guest lock)
> >                                          tdg_mem_page_attr_rd (Guest lock)
> >                                          tdg_mem_page_attr_wr (Guest lock)
> > 
> >          Proposal: (as in patch 2)
> >          - Upon detection of TDX_OPERAND_BUSY, retry each SEAMCALL only
> >            once.
> >          - During the retry, kick off all vCPUs and prevent any vCPU from
> >            entering to avoid potential contentions.
> > 
> >          This is because tdh_vp_enter() and TDCALLs are not protected by
> >          KVM mmu_lock, and remove_external_spte() in KVM must not fail.
> 
> The solution for group 3 actually doesn't look too bad at all to me. At least
> from code and complexity wise. It's pretty compact, doesn't add any locks, and
> limited to the tdx.c code. Although, I didn't evaluate the implementation 
> correctness of tdx_no_vcpus_enter_start() and tdx_no_vcpus_enter_stop() yet.
> 
> Were you able to test the fallback path at all? Can we think of any real
> situations where it could be burdensome?
Yes, I did some negative tests to fail block/track/remove to check if the
tdx_no_vcpus_enter*() work.

Even without those negative tests, it's not rare for tdh_mem_track() to return
busy due to its contention with tdh_vp_enter().

Given that normally it's not frequent to find tdh_mem_range_block() or
tdh_mem_page_remove() to return busy (if we reduce the frequency of zero-step
mitigation) and that tdh_mem_track() will kick off all vCPUs later any way, I
think it's acceptable to do the tdx_no_vcpus_enter*() stuffs in the page removal
path.


> One other thing to note I think, is that group 3 are part of no-fail operations.
> The core KVM calling code doesn't have the understanding of a failure there. So
> in this scheme of not avoiding contention we have to succeed before returning,
> where group 1 and 2 can fail, so don't need the special fallback scheme.
Yes, exactly.

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ