Message-ID: <ed6ccd719241ef6df1558b69ec81073a3b3cf77c.camel@intel.com>
Date: Mon, 14 Oct 2024 17:36:48 +0000
From: "Edgecombe, Rick P" <rick.p.edgecombe@...el.com>
To: "seanjc@...gle.com" <seanjc@...gle.com>, "Huang, Kai"
<kai.huang@...el.com>, "Zhao, Yan Y" <yan.y.zhao@...el.com>
CC: "kvm@...r.kernel.org" <kvm@...r.kernel.org>, "Yao, Yuan"
<yuan.yao@...el.com>, "pbonzini@...hat.com" <pbonzini@...hat.com>,
"nik.borisov@...e.com" <nik.borisov@...e.com>, "linux-kernel@...r.kernel.org"
<linux-kernel@...r.kernel.org>, "isaku.yamahata@...il.com"
<isaku.yamahata@...il.com>, "dmatlack@...gle.com" <dmatlack@...gle.com>
Subject: Re: [PATCH 09/21] KVM: TDX: Retry seamcall when TDX_OPERAND_BUSY with
operand SEPT
On Mon, 2024-10-14 at 10:54 +0000, Huang, Kai wrote:
> On Thu, 2024-10-10 at 21:53 +0000, Edgecombe, Rick P wrote:
> > On Thu, 2024-10-10 at 10:33 -0700, Sean Christopherson wrote:
> > > >
> > > > 1st: "fault->is_private != kvm_mem_is_private(kvm, fault->gfn)" is found.
> > > > 2nd-6th: try_cmpxchg64() fails on each level SPTEs (5 levels in total)
> >
> > Isn't there a more general scenario:
> >
> >   vcpu0                              vcpu1
> >   1. Freezes PTE
> >   2. External op to do the SEAMCALL
> >                                      3. Faults same PTE, hits frozen PTE
> >                                      4. Retries N times, triggers zero-step
> >   5. Finally finishes external op
> >
> > Am I missing something?
>
> I must be missing something. I thought KVM is going to
>

"Is going to", as in "will be changed to"? Or "does today"?

> retry internally for
> step 4 (retries N times) because it sees the frozen PTE, but will never go back
> to guest after the fault is resolved? How can step 4 triggers zero-step?
Steps 3-4 are saying that vcpu1 will go back to the guest and fault again.
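
To make the sequence concrete, here is a toy userspace model of the race, not
KVM or TDX module code. Every name in it is hypothetical, and `threshold` is
only a stand-in for however many no-progress faults the zero-step mitigation
tolerates; it is not a value taken from the TDX module. The model assumes
vcpu1 re-enters the guest and re-faults once per "tick" of vcpu0's external
op.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of one PTE shared by two vCPUs. */
enum pte_state { PTE_FROZEN, PTE_PRESENT };

struct race_sim {
	enum pte_state pte;	/* state of the contended PTE */
	int op_ticks_left;	/* ticks until vcpu0's external op completes */
	int refaults;		/* vcpu1 faults taken without guest progress */
};

/* vcpu0, steps 1-2: freeze the PTE and start the slow external op. */
static void vcpu0_begin(struct race_sim *s, int op_ticks)
{
	assert(op_ticks >= 0);
	s->pte = PTE_FROZEN;
	s->op_ticks_left = op_ticks;
	s->refaults = 0;
}

/*
 * Returns true if vcpu1 accumulates `threshold` faults on the frozen PTE
 * before vcpu0 finishes (step 5). `threshold` is an assumed parameter.
 */
static bool zero_step_would_trigger(int op_ticks, int threshold)
{
	struct race_sim s;

	vcpu0_begin(&s, op_ticks);
	while (s.pte == PTE_FROZEN) {
		/* vcpu1, steps 3-4: fault, see frozen PTE, go back to guest. */
		s.refaults++;
		if (s.refaults >= threshold)
			return true;

		/* vcpu0 makes one tick of progress on the external op. */
		if (--s.op_ticks_left <= 0)
			s.pte = PTE_PRESENT;
	}
	return false;
}
```

In this model a short external op never reaches the threshold, while a long
one does, which is the only point the sketch is meant to show.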
As for what KVM will do in the future, I think it is still open. I've not had
the chance to think about this for more than 30 min at a time, but the plan to
handle OPERAND_BUSY by taking an expensive path to break any contention (i.e.
kick+lock + whatever TDX module changes we come up with) seems to be the
leading idea.
Retrying N times is too hacky. Retrying internally forever might be awkward to
implement. Because of the signal_pending() check, you would have to handle
exiting to userspace and going back to an EPT violation next time the vcpu tries
to enter.
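
A rough userspace sketch of why "retry forever" gets awkward, under stated
assumptions: this is not KVM code, all names are made up for illustration,
and sim_signal_pending() merely stands in for the kernel's signal_pending()
check. The loop cannot simply spin until the PTE is unfrozen; once a signal
is pending it has to bail out to userspace with the fault still unresolved,
and the same fault is taken again on the next entry.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum fault_result { FAULT_RESOLVED, FAULT_EXIT_TO_USERSPACE };

/* Hypothetical per-vCPU state for the simulation. */
struct vcpu_sim {
	int busy_ticks;		/* attempts that still see the PTE frozen */
	int signal_at;		/* attempt count at which a signal arrives, -1 = never */
	int attempts;		/* total resolve attempts so far */
	bool pending_fault;	/* fault left unresolved, re-taken on next entry */
};

/* Stand-in for signal_pending(); purely simulated. */
static bool sim_signal_pending(const struct vcpu_sim *v)
{
	return v->signal_at >= 0 && v->attempts >= v->signal_at;
}

/* One attempt at resolving the fault; fails while the PTE is "frozen". */
static bool sim_try_resolve(struct vcpu_sim *v)
{
	v->attempts++;
	if (v->busy_ticks > 0) {
		v->busy_ticks--;
		return false;
	}
	return true;
}

/* "Retry internally forever", except that a pending signal forces an exit. */
static enum fault_result handle_fault_retry_forever(struct vcpu_sim *v)
{
	assert(v != NULL);
	for (;;) {
		if (sim_try_resolve(v)) {
			v->pending_fault = false;
			return FAULT_RESOLVED;
		}
		if (sim_signal_pending(v)) {
			/* Give up for now; the fault repeats on re-entry. */
			v->pending_fault = true;
			return FAULT_EXIT_TO_USERSPACE;
		}
	}
}
```

A caller would invoke handle_fault_retry_forever() again after userspace has
handled the signal, which is the extra exit and re-entry path that has to be
handled.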