Message-ID: <8618bce9-8c76-4048-8264-dfd6afc82bc6@intel.com>
Date: Wed, 11 Sep 2024 13:17:54 +1200
From: "Huang, Kai" <kai.huang@...el.com>
To: Sean Christopherson <seanjc@...gle.com>, Rick P Edgecombe
	<rick.p.edgecombe@...el.com>
CC: "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>, Yuan Yao
	<yuan.yao@...el.com>, "isaku.yamahata@...il.com" <isaku.yamahata@...il.com>,
	Yan Y Zhao <yan.y.zhao@...el.com>, "dmatlack@...gle.com"
	<dmatlack@...gle.com>, "kvm@...r.kernel.org" <kvm@...r.kernel.org>,
	"nik.borisov@...e.com" <nik.borisov@...e.com>, "pbonzini@...hat.com"
	<pbonzini@...hat.com>
Subject: Re: [PATCH 09/21] KVM: TDX: Retry seamcall when TDX_OPERAND_BUSY with
 operand SEPT


>> Host-Side (SEAMCALL) Operation
>> ------------------------------
>> The host VMM is expected to retry host-side operations that fail with a
>> TDX_OPERAND_BUSY status. The host priority mechanism helps guarantee that at
>> most after a limited time (the longest guest-side TDX module flow) there will be
>> no contention with a guest TD attempting to acquire access to the same resource.
>>
>> Lock operations process the HOST_PRIORITY bit as follows:
>>     - A SEAMCALL (host-side) function that fails to acquire a lock sets the lock’s
>>     HOST_PRIORITY bit and returns a TDX_OPERAND_BUSY status to the host VMM. It is
>>     the host VMM’s responsibility to re-attempt the SEAMCALL function until it
>>     succeeds; otherwise, the HOST_PRIORITY bit remains set, preventing the guest TD
>>     from acquiring the lock.
>>     - A SEAMCALL (host-side) function that succeeds in acquiring a lock clears the
>>     lock’s HOST_PRIORITY bit.
> 
> *sigh*
> 
>> Guest-Side (TDCALL) Operation
>> -----------------------------
>> A TDCALL (guest-side) function that attempts to acquire a lock fails if
>> HOST_PRIORITY is set to 1; a TDX_OPERAND_BUSY status is returned to the guest.
>> The guest is expected to retry the operation.
>>
>> Guest-side TDCALL flows that acquire a host priority lock have an upper bound on
>> the host-side latency for that lock; once a lock is acquired, the flow either
>> releases within a fixed upper time bound, or periodically monitors the
>> HOST_PRIORITY flag to see if the host is attempting to acquire the lock.
>> "
>>
>> So KVM can't fully prevent TDX_OPERAND_BUSY with KVM side locks, because it is
>> involved in sorting out contention between the guest as well. We need to double
>> check this, but I *think* this HOST_PRIORITY bit doesn't come into play for the
>> functionality we need to exercise for base support.
>>
>> The thing that makes me nervous about a retry-based solution is the potential for
>> some kind of deadlock-like pattern. Just to gather your opinion, if there was some
>> SEAMCALL contention that couldn't be locked around from KVM, but came with some
>> strong well described guarantees, would a retry loop be hard NAK still?
> 
> I don't know.  It would depend on what operations can hit BUSY, and what the
> alternatives are.  E.g. if we can narrow down the retry paths to a few select
> cases where it's (a) expected, (b) unavoidable, and (c) has minimal risk of
> deadlock, then maybe that's the least awful option.
> 
> What I don't think KVM should do is blindly retry N number of times, because
> then there are effectively no rules whatsoever.  E.g. if KVM is tearing down a
> VM then KVM should assert on immediate success.  And if KVM is handling a fault
> on behalf of a vCPU, then KVM can and should resume the guest and let it retry.
> Ugh, but that would likely trigger the annoying "zero-step mitigation" crap.
> 
> What does this actually mean in practice?  What's the threshold, 

FWIW, the limit in the public TDX module code is 6:

   #define STEPPING_EPF_THRESHOLD 6   // Threshold of confidence in detecting
                                      // EPT fault-based stepping in progress

We might be able to change it to a larger value, but we would need to 
understand why that is necessary.
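
To make the threshold concrete, here is a minimal user-space sketch of how such a counter might behave. It is not the TDX module's code: struct epf_tracker, epf_record(), and the choice to trip once the count reaches the threshold (rather than exceeds it) are assumptions for illustration. Only STEPPING_EPF_THRESHOLD and its value come from the public source.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define STEPPING_EPF_THRESHOLD 6 /* value quoted from the public TDX module code */

/* Hypothetical per-vCPU tracking state; not the module's real layout. */
struct epf_tracker {
	uint64_t last_rip;  /* guest RIP of the most recent EPT-violation exit */
	unsigned int count; /* consecutive EPT violations seen at last_rip */
};

/*
 * Record an EPT violation taken at guest RIP 'rip'.  Returns true once the
 * number of consecutive violations on the same instruction reaches the
 * threshold.  Assumption: ">=" is used here; the real comparison may differ.
 */
static bool epf_record(struct epf_tracker *t, uint64_t rip)
{
	assert(t != NULL);

	if (t->count == 0 || t->last_rip != rip) {
		/* First fault, or the guest made progress: restart the count. */
		t->last_rip = rip;
		t->count = 1;
	} else {
		t->count++;
	}
	return t->count >= STEPPING_EPF_THRESHOLD;
}
```

With this model, resuming the guest so that it re-faults on the same instruction trips the check on the sixth exit, which is why "just let the vCPU retry" does not give KVM much headroom.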

> is the VM-Enter
> error uniquely identifiable, 

When zero-step mitigation is active in the module, TDH.VP.ENTER tries to 
grab the SEPT lock, so it can fail with a SEPT BUSY error.  But if it 
does grab the lock successfully, it exits to the VMM with an EPT violation 
on that GPA immediately.

In other words, TDH.VP.ENTER returning SEPT BUSY means "zero-step 
mitigation" must have been active.  A normal EPT violation _COULD_ mean 
the mitigation is already active, but AFAICT we don't have a way to tell 
that from the EPT violation itself.

> and can KVM rely on HOST_PRIORITY to be set if KVM
> runs afoul of the zero-step mitigation?

I think HOST_PRIORITY is always set if a SEPT SEAMCALL fails with BUSY.
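
For reference, the lock rules from the spec text quoted earlier can be modeled in a few lines of user-space C. This is only a toy, single-threaded model with made-up names (struct hp_lock, host_try_lock(), guest_try_lock(), hp_unlock()); LOCK_BUSY stands in for TDX_OPERAND_BUSY, and nothing here reflects how the module actually implements its locks.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum lock_status { LOCK_OK, LOCK_BUSY }; /* LOCK_BUSY ~ TDX_OPERAND_BUSY */

/* Hypothetical host-priority lock: one holder plus the HOST_PRIORITY bit. */
struct hp_lock {
	bool held;
	bool host_priority;
};

/* Host side (SEAMCALL): a failed attempt sets HOST_PRIORITY, success clears it. */
static enum lock_status host_try_lock(struct hp_lock *l)
{
	assert(l != NULL);

	if (l->held) {
		l->host_priority = true;
		return LOCK_BUSY;
	}
	l->held = true;
	l->host_priority = false;
	return LOCK_OK;
}

/* Guest side (TDCALL): fails while HOST_PRIORITY is set, or if already held. */
static enum lock_status guest_try_lock(struct hp_lock *l)
{
	assert(l != NULL);

	if (l->host_priority || l->held)
		return LOCK_BUSY;
	l->held = true;
	return LOCK_OK;
}

/* Release by whoever holds the lock; HOST_PRIORITY is left untouched. */
static void hp_unlock(struct hp_lock *l)
{
	assert(l != NULL);
	l->held = false;
}
```

The point the model makes is that after one failed host attempt the guest cannot re-take the lock even once it is free, so the host's next attempt succeeds as long as the current holder lets go.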

> 
>    After a pre-determined number of such EPT violations occur on the same instruction,
>    the TDX module starts tracking the GPAs that caused Secure EPT faults and fails
>    further host VMM attempts to enter the TD VCPU unless previously faulting private
>    GPAs are properly mapped in the Secure EPT.
> 
> If HOST_PRIORITY is set, then one idea would be to resume the guest if there's
> SEPT contention on a fault, and then _if_ the zero-step mitigation is triggered,
> kick all vCPUs (via IPI) to ensure that the contended SEPT entry is unlocked and
> can't be re-locked by the guest.  That would allow KVM to guarantee forward
> progress without an arbitrary retry loop in the TDP MMU.

I think this should work.

It doesn't seem that we can tell whether the zero-step mitigation is 
active, either from an EPT violation TDEXIT or when a SEPT SEAMCALL fails 
with SEPT BUSY.  But whenever a SEPT SEAMCALL fails with SEPT BUSY, if we 
just kick all vCPUs and make them wait until the next retry is done (which 
must succeed; otherwise it is an illegal error), then this should handle 
both contention from the guest and the zero-step mitigation.
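
Roughly, the flow I have in mind looks like the sketch below. It is a user-space toy and not a proposed patch: every name in it (struct toy_td, toy_sept_seamcall(), toy_kick_all_vcpus(), toy_release_vcpus(), toy_sept_op()) is hypothetical, and contention is reduced to a single flag that is cleared once no vCPU can be running in the guest.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SEPT_OK   0
#define SEPT_BUSY 1 /* stands in for TDX_OPERAND_BUSY on a SEPT operand */

/* Hypothetical toy state for one TD; none of this is a real KVM structure. */
struct toy_td {
	bool guest_holds_sept; /* a guest-side flow currently holds the SEPT lock */
	bool vcpus_blocked;    /* vCPUs were kicked and may not re-enter the guest */
	unsigned int kicks;    /* number of times all vCPUs had to be kicked */
};

/* Stand-in for a SEPT SEAMCALL: fails only while the guest holds the lock. */
static int toy_sept_seamcall(struct toy_td *td)
{
	return td->guest_holds_sept ? SEPT_BUSY : SEPT_OK;
}

/* Stand-in for kicking all vCPUs (e.g. via IPI) and keeping them out. */
static void toy_kick_all_vcpus(struct toy_td *td)
{
	td->vcpus_blocked = true;
	td->guest_holds_sept = false; /* no vCPU in the guest, so no guest holder */
	td->kicks++;
}

/* Let the vCPUs enter the guest again once the retry is done. */
static void toy_release_vcpus(struct toy_td *td)
{
	td->vcpus_blocked = false;
}

/* Try once; on BUSY, kick everyone out and retry exactly once. */
static int toy_sept_op(struct toy_td *td)
{
	int ret;

	assert(td != NULL);

	ret = toy_sept_seamcall(td);
	if (ret != SEPT_BUSY)
		return ret;

	toy_kick_all_vcpus(td);
	ret = toy_sept_seamcall(td); /* the one and only retry */
	toy_release_vcpus(td);

	return ret; /* a caller would treat SEPT_BUSY here as an illegal error */
}
```

The uncontended path pays nothing extra, and there is no retry count to tune: a second BUSY is simply treated as a bug.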

> 
> Similarly, if KVM needs to zap a SPTE and hits BUSY, kick all vCPUs to ensure the
> one and only retry is guaranteed to succeed.

Yeah seems so.

