lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <ZuP5eNXFCljzRgWo@yzhao56-desk.sh.intel.com>
Date: Fri, 13 Sep 2024 16:36:08 +0800
From: Yan Zhao <yan.y.zhao@...el.com>
To: Sean Christopherson <seanjc@...gle.com>
CC: Rick P Edgecombe <rick.p.edgecombe@...el.com>, "pbonzini@...hat.com"
	<pbonzini@...hat.com>, Yuan Yao <yuan.yao@...el.com>, Kai Huang
	<kai.huang@...el.com>, "isaku.yamahata@...il.com" <isaku.yamahata@...il.com>,
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
	"kvm@...r.kernel.org" <kvm@...r.kernel.org>, "dmatlack@...gle.com"
	<dmatlack@...gle.com>, "nik.borisov@...e.com" <nik.borisov@...e.com>
Subject: Re: [PATCH 09/21] KVM: TDX: Retry seamcall when TDX_OPERAND_BUSY
 with operand SEPT

This is a lock status report of TDX module for current SEAMCALL retry issue
based on code in TDX module public repo https://github.com/intel/tdx-module.git
branch TDX_1.5.05.

TL;DR:
- tdh_mem_track() can contend with tdh_vp_enter().
- tdh_vp_enter() contends with tdh_mem*() when 0-stepping is suspected.
- tdg_mem_page_accept() can contend with other tdh_mem*().

Proposal:
- Return -EAGAIN directly in ops link_external_spt/set_external_spte when
  tdh_mem_sept_add()/tdh_mem_page_aug() returns BUSY.
- Kick off vCPUs at the beginning of page removal path, i.e. before the
  tdh_mem_range_block().
  Set a flag and disallow tdh_vp_enter() until tdh_mem_page_remove() is done.
  (one possible optimization:
   since contention from tdh_vp_enter()/tdg_mem_page_accept should be rare,
   do not kick off vCPUs in normal conditions.
   When SEAMCALL BUSY happens, retry for once, kick off vCPUs and do not allow
   TD enter until page removal completes.)

Below are detailed analysis:

=== Background ===
In TDX module, there are 4 kinds of locks:
1. sharex_lock:
   Normal read/write lock. (no host priority stuff)

2. sharex_hp_lock:
   Just like normal read/write lock, except that host can set host priority bit
   on failure.
   when guest tries to acquire the lock and sees host priority bit set, it will
   return "busy host priority" directly, letting host win.
   After host acquires the lock successfully, host priority bit is cleared.

3. sept entry lock:
   Lock utilizing software bits in SEPT entry.
   HP bit (Host priority): bit 52 
   EL bit (Entry lock): bit 11, used as a bit lock.

   - host sets HP bit when host fails to acquire EL bit lock;
   - host resets HP bit when host wins.
   - guest returns "busy host priority" if HP bit is found set when guest tries
     to acquire EL bit lock.

4. mutex lock:
   Lock with only 2 states: free, lock.
   (not the same as linux mutex, not re-scheduled, could pause() for debugging).

===Resources & users list===

Resources              SHARED  users              EXCLUSIVE users
------------------------------------------------------------------------
(1) TDR                tdh_mng_rdwr               tdh_mng_create
                       tdh_vp_create              tdh_mng_add_cx
                       tdh_vp_addcx               tdh_mng_init
		       tdh_vp_init                tdh_mng_vpflushdone
                       tdh_vp_enter               tdh_mng_key_config 
                       tdh_vp_flush               tdh_mng_key_freeid
                       tdh_vp_rd_wr               tdh_mr_extend
                       tdh_mem_sept_add           tdh_mr_finalize
                       tdh_mem_sept_remove        tdh_vp_init_apicid
                       tdh_mem_page_aug           tdh_mem_page_add
                       tdh_mem_page_remove
                       tdh_mem_range_block
                       tdh_mem_track
                       tdh_mem_range_unblock
                       tdh_phymem_page_reclaim
------------------------------------------------------------------------
(2) KOT                tdh_phymem_cache_wb        tdh_mng_create
                                                  tdh_mng_vpflushdone
                                                  tdh_mng_key_freeid
------------------------------------------------------------------------
(3) TDCS               tdh_mng_rdwr
                       tdh_vp_create
                       tdh_vp_addcx
                       tdh_vp_init
                       tdh_vp_init_apicid
                       tdh_vp_enter 
                       tdh_vp_rd_wr
                       tdh_mem_sept_add
                       tdh_mem_sept_remove
                       tdh_mem_page_aug
                       tdh_mem_page_remove
                       tdh_mem_range_block
                       tdh_mem_track
                       tdh_mem_range_unblock
------------------------------------------------------------------------
(4) TDVPR              tdh_vp_rd_wr                tdh_vp_create
                                                   tdh_vp_addcx
                                                   tdh_vp_init
                                                   tdh_vp_init_apicid
                                                   tdh_vp_enter 
                                                   tdh_vp_flush 
------------------------------------------------------------------------
(5) TDCS epoch         tdh_vp_enter                tdh_mem_track
------------------------------------------------------------------------
(6) secure_ept_lock    tdh_mem_sept_add            tdh_vp_enter
                       tdh_mem_page_aug            tdh_mem_sept_remove
                       tdh_mem_page_remove         tdh_mem_range_block
                                                   tdh_mem_range_unblock
------------------------------------------------------------------------
(7) SEPT entry                                     tdh_mem_sept_add
                                                   tdh_mem_sept_remove
                                                   tdh_mem_page_aug
                                                   tdh_mem_page_remove
                                                   tdh_mem_range_block
                                                   tdh_mem_range_unblock
                                                   tdg_mem_page_accept

Current KVM interested SEAMCALLs:
------------------------------------------------------------------------
  SEAMCALL                Lock Name        Lock Type        Resource       
tdh_mng_create          sharex_hp_lock     EXCLUSIVE        TDR
                        sharex_lock        EXCLUSIVE        KOT

tdh_mng_add_cx          sharex_hp_lock     EXCLUSIVE        TDR
                        sharex_hp_lock     EXCLUSIVE        page to add

tdh_mng_init            sharex_hp_lock     EXCLUSIVE        TDR
                        sharex_hp_lock     NO_LOCK          TDCS

tdh_mng_vpflushdone     sharex_hp_lock     EXCLUSIVE        TDR
                        sharex_lock        EXCLUSIVE        KOT

tdh_mng_key_config      sharex_hp_lock     EXCLUSIVE        TDR

tdh_mng_key_freeid      sharex_hp_lock     EXCLUSIVE        TDR
                        sharex_lock        EXCLUSIVE        KOT

tdh_mng_rdwr            sharex_hp_lock     SHARED           TDR
                        sharex_hp_lock     SHARED           TDCS

tdh_mr_extend           sharex_hp_lock     EXCLUSIVE        TDR
                        sharex_hp_lock     NO_LOCK          TDCS

tdh_mr_finalize         sharex_hp_lock     EXCLUSIVE        TDR
                        sharex_hp_lock     NO_LOCK          TDCS

tdh_vp_create           sharex_hp_lock     SHARED           TDR
                        sharex_hp_lock     SHARED           TDCS
                        sharex_hp_lock     EXCLUSIVE        TDVPR

tdh_vp_addcx            sharex_hp_lock     EXCLUSIVE        TDVPR
                        sharex_hp_lock     SHARED           TDR
                        sharex_hp_lock     SHARED           TDCS
                        sharex_hp_lock     EXCLUSIVE        page to add

tdh_vp_init             sharex_hp_lock     EXCLUSIVE        TDVPR
                        sharex_hp_lock     SHARED           TDR
                        sharex_hp_lock     SHARED           TDCS

tdh_vp_init_apicid      sharex_hp_lock     EXCLUSIVE        TDVPR
                        sharex_hp_lock     EXCLUSIVE        TDR
                        sharex_hp_lock     SHARED           TDCS

tdh_vp_enter(*)         sharex_hp_lock     EXCLUSIVE        TDVPR
                        sharex_hp_lock     SHARED           TDR
                        sharex_hp_lock     SHARED           TDCS
                        sharex_lock        SHARED           TDCS epoch_lock
                        sharex_lock        EXCLUSIVE        TDCS secure_ept_lock

tdh_vp_flush            sharex_hp_lock     EXCLUSIVE        TDVPR
                        sharex_hp_lock     SHARED           TDR

tdh_vp_rd_wr            sharex_hp_lock     SHARED           TDVPR
                        sharex_hp_lock     SHARED           TDR
                        sharex_hp_lock     SHARED           TDCS

tdh_mem_sept_add        sharex_hp_lock     SHARED           TDR
                        sharex_hp_lock     SHARED           TDCS
                        sharex_lock        SHARED           TDCS secure_ept_lock
                        sept entry lock    HOST,EXCLUSIVE   SEPT entry to modify
                        sharex_hp_lock     EXCLUSIVE        page to add

tdh_mem_sept_remove     sharex_hp_lock     SHARED           TDR
                        sharex_hp_lock     SHARED           TDCS
                        sharex_lock        EXCLUSIVE        TDCS secure_ept_lock
                        sept entry lock    HOST,EXCLUSIVE   SEPT entry to modify
                        sharex_hp_lock     EXCLUSIVE        page to remove 

tdh_mem_page_add        sharex_hp_lock     EXCLUSIVE        TDR
                        sharex_hp_lock     NO_LOCK          TDCS
                        sharex_lock        NO_LOCK          TDCS secure_ept_lock
                        sharex_hp_lock     EXCLUSIVE        page to add

tdh_mem_page_aug        sharex_hp_lock     SHARED           TDR 
                        sharex_hp_lock     SHARED           TDCS
                        sharex_lock        SHARED           TDCS secure_ept_lock 
                        sept entry lock    HOST,EXCLUSIVE   SEPT entry to modify
                        sharex_hp_lock     EXCLUSIVE        page to aug

tdh_mem_page_remove     sharex_hp_lock     SHARED           TDR
                        sharex_hp_lock     SHARED           TDCS
                        sharex_lock        SHARED           TDCS secure_ept_lock
                        sept entry lock    HOST,EXCLUSIVE   SEPT entry to modify
                        sharex_hp_lock     EXCLUSIVE        page to remove

tdh_mem_range_block     sharex_hp_lock     SHARED           TDR
                        sharex_hp_lock     SHARED           TDCS
                        sharex_lock        EXCLUSIVE        TDCS secure_ept_lock
                        sept entry lock    HOST,EXCLUSIVE   SEPT entry to modify

tdh_mem_track           sharex_hp_lock     SHARED           TDR
                        sharex_hp_lock     SHARED           TDCS
                        sharex_lock        EXCLUSIVE        TDCS epoch_lock

tdh_mem_range_unblock   sharex_hp_lock     SHARED           TDR
                        sharex_hp_lock     SHARED           TDCS
                        sharex_lock        EXCLUSIVE        TDCS secure_ept_lock
                        sept entry lock    HOST,EXCLUSIVE   SEPT entry to modify

tdh_phymem_page_reclaim sharex_hp_lock     EXCLUSIVE        page to reclaim
                        sharex_hp_lock     SHARED           TDR

tdh_phymem_cache_wb     mutex_lock                       per package wbt_entries 
                        sharex_lock        SHARED           KOT

tdh_phymem_page_wbinvd  sharex_hp_lock     SHARED           page to be wbinvd


Current KVM interested TDCALLs:
------------------------------------------------------------------------
tdg_mem_page_accept     sept entry lock    GUEST            SEPT entry to modify

TDCALLs like tdg_mr_rtmr_extend(), tdg_servtd_rd_wr(), tdg_mem_page_attr_wr()
tdg_mem_page_attr_rd() are not included.

*:(a) tdh_vp_enter() holds shared TDR lock and exclusive TDVPR lock, the two
      locks are released when exiting to VMM.
  (b) tdh_vp_enter() holds shared TDCS lock and shared TDCS epoch_lock lock,
      releases them before entering non-root mode.
  (c) tdh_vp_enter() holds shared epoch lock, contending with tdh_mem_track(). 
  (d) tdh_vp_enter() only holds EXCLUSIVE secure_ept_lock when 0-stepping is
      suspected, i.e. when last_epf_gpa_list is not empty.
      When a EPT violation happens, TDX module checks if the guest RIP equals
      to the guest RIP of last TD entry. Only when this is true for 6 continuous
      times, the gpa will be recorded in last_epf_gpa_list. The list will be
      reset once guest RIP of a EPT violation and last TD enter RIP are unequal.


=== Summary ===
For the 8 kinds of common resources protected in TDX module:

(1) TDR:
    There are only shared accesses to TDR during runtime (i.e. after TD is
    finalized and before TD tearing down), if we don't need to support calling
    tdh_vp_init_apicid() at runtime (e.g. for vCPU hotplug).
    tdh_vp_enter() holds shared TDR lock until exiting to VMM.
    TDCALLs do not acquire the TDR lock.

(2) KOT (Key Ownership Table)
    Current KVM code should have avoided contention to this resource.

(3) TDCS:
    Shared accessed or access with no lock when TDR is exclusively locked.
    Seamcalls in runtime (after TD finalized and before TD tearing down) do not
    contend with each other on TDCS.
    tdh_vp_enter() holds shared TDCS lock and releases it before entering
    non-root mode.
    Current TDCALLs for basic TDX do not acquire this lock.

(4) TDVPR:
    Per-vCPU exclusive accessed except for tdh_vp_rd_wr().
    tdh_vp_enter() holds exclusive TDVPR lock until exiting to VMM.
    TDCALLs do not acquire the TDVPR lock.

(5) TDCS epoch:
    tdh_mem_track() requests exclusive access, and tdh_vp_enter() requests
    shared access.
    tdh_mem_track() can contend with tdh_vp_enter().

(6) SEPT tree:
    Protected by secure_ept_lock (sharex_lock).
    tdh_mem_sept_add()/tdh_mem_page_aug()/tdh_mem_page_remove() holds shared
    lock; tdh_mem_sept_remove()/tdh_mem_range_block()/tdh_mem_range_unblock()
    holds exclusive lock.
    tdh_vp_enter() requests exclusive access when 0-stepping is suspected,
    contending with all other tdh_mem*().
    Guest does not acquire this lock.

    So,
    kvm mmu_lock has prevented contentions between
    tdh_mem_sept_add()/tdh_mem_page_aug()/tdh_mem_page_remove() and
    tdh_mem_sept_remove()/tdh_mem_range_block().
    Though tdh_mem_sept_add()/tdh_mem_page_aug() races with tdh_vp_enter(),
    returning -EAGAIN directly is fine for them.
    The remaining issue is the contention between tdh_vp_enter() and
    tdh_mem_page_remove()/tdh_mem_sept_remove()/tdh_mem_range_block().

(7) SEPT entry:
    All exclusive access.
    tdg_mem_page_accept() may contend with other tdh_mem*() on a specific SEPT
    entry.

(8) PAMT entry for target pages (e.g. page to add/aug/remove/reclaim/wbinvd):
    Though they are all exclusively locked, no race should be met as long as
    they belong to different pamt entries.

Conclusion:
Current KVM code should have avoided contentions of resources (1)-(4),(8), while
(5),(6),(7) are still possible to meet contention.
- tdh_mem_track() can contend with tdh_vp_enter() for (5)
- tdh_vp_enter() contends with tdh_mem*() for (6) when 0-stepping is suspected.
- tdg_mem_page_accept() can contend with other tdh_mem*() for (7).

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ