linux-kernel - Re: [PATCH 09/21] KVM: TDX: Retry seamcall when TDX_OPERAND

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <4bed8c0579298fbb0767c04b75cc9c3be0e925ad.camel@intel.com>
Date: Fri, 13 Sep 2024 19:19:05 +0000
From: "Edgecombe, Rick P" <rick.p.edgecombe@...el.com>
To: "seanjc@...gle.com" <seanjc@...gle.com>, "Zhao, Yan Y"
	<yan.y.zhao@...el.com>
CC: "Yao, Yuan" <yuan.yao@...el.com>, "Huang, Kai" <kai.huang@...el.com>,
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
	"isaku.yamahata@...il.com" <isaku.yamahata@...il.com>, "pbonzini@...hat.com"
	<pbonzini@...hat.com>, "dmatlack@...gle.com" <dmatlack@...gle.com>,
	"nik.borisov@...e.com" <nik.borisov@...e.com>, "kvm@...r.kernel.org"
	<kvm@...r.kernel.org>
Subject: Re: [PATCH 09/21] KVM: TDX: Retry seamcall when TDX_OPERAND_BUSY with
 operand SEPT

On Fri, 2024-09-13 at 10:23 -0700, Sean Christopherson wrote:
> > TL;DR:
> > - tdh_mem_track() can contend with tdh_vp_enter().
> > - tdh_vp_enter() contends with tdh_mem*() when 0-stepping is suspected.
> 
> The zero-step logic seems to be the most problematic.  E.g. if KVM is trying
> to
> install a page on behalf of two vCPUs, and KVM resumes the guest if it
> encounters
> a FROZEN_SPTE when building the non-leaf SPTEs, then one of the vCPUs could
> trigger the zero-step mitigation if the vCPU that "wins" and gets delayed for
> whatever reason.

Can you explain more about what the concern is here? That the zero-step
mitigation activation will be a drag on the TD because of extra contention with
the TDH.MEM calls?

> 
> Since FROZEN_SPTE is essentially bit-spinlock with a reaaaaaly slow slow-path,
> what if instead of resuming the guest if a page fault hits FROZEN_SPTE, KVM
> retries
> the fault "locally", i.e. _without_ redoing tdh_vp_enter() to see if the vCPU
> still
> hits the fault?

It seems like an optimization. To me, I would normally want to know how much it
helped before adding it. But if you think it's an obvious win I'll defer.

> 
> For non-TDX, resuming the guest and letting the vCPU retry the instruction is
> desirable because in many cases, the winning task will install a valid mapping
> before KVM can re-run the vCPU, i.e. the fault will be fixed before the
> instruction is re-executed.  In the happy case, that provides optimal
> performance
> as KVM doesn't introduce any extra delay/latency.
> 
> But for TDX, the math is different as the cost of a re-hitting a fault is
> much,
> much higher, especially in light of the zero-step issues.
> 
> E.g. if the TDP MMU returns a unique error code for the frozen case, and
> kvm_mmu_page_fault() is modified to return the raw return code instead of '1',
> then the TDX EPT violation path can safely retry locally, similar to the do-
> while
> loop in kvm_tdp_map_page().
> 
> The only part I don't like about this idea is having two "retry" return
> values,
> which creates the potential for bugs due to checking one but not the other.
> 
> Hmm, that could be avoided by passing a bool pointer as an out-param to
> communicate
> to the TDX S-EPT fault handler that the SPTE is frozen.  I think I like that
> option better even though the out-param is a bit gross, because it makes it
> more
> obvious that the "frozen_spte" is a special case that doesn't need attention
> for
> most paths.