[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <38f506575caacd5488f73315b231c3282f893d46.camel@intel.com>
Date: Thu, 19 Jan 2023 23:08:41 +0000
From: "Huang, Kai" <kai.huang@...el.com>
To: "Christopherson,, Sean" <seanjc@...gle.com>
CC: "dmatlack@...gle.com" <dmatlack@...gle.com>,
"sean.j.christopherson@...el.com" <sean.j.christopherson@...el.com>,
"Shahar, Sagi" <sagis@...gle.com>,
"isaku.yamahata@...il.com" <isaku.yamahata@...il.com>,
"Aktas, Erdem" <erdemaktas@...gle.com>,
"kvm@...r.kernel.org" <kvm@...r.kernel.org>,
"pbonzini@...hat.com" <pbonzini@...hat.com>,
"zhi.wang.linux@...il.com" <zhi.wang.linux@...il.com>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
"Yamahata, Isaku" <isaku.yamahata@...el.com>
Subject: Re: [PATCH v11 018/113] KVM: TDX: create/destroy VM structure
On Thu, 2023-01-19 at 21:36 +0000, Sean Christopherson wrote:
> On Thu, Jan 19, 2023, Huang, Kai wrote:
> > On Thu, 2023-01-19 at 15:37 +0000, Sean Christopherson wrote:
> > > On Thu, Jan 19, 2023, Huang, Kai wrote:
> > > > On Tue, 2023-01-17 at 21:01 +0000, Sean Christopherson wrote:
> > > > > On Tue, Jan 17, 2023, Sean Christopherson wrote:
> > > > > > On Tue, Jan 17, 2023, Zhi Wang wrote:
> > > > > Oh, the other important piece I forgot to mention is that dropping mmu_lock deep
> > > > > in KVM's MMU in order to wait isn't always an option. Most flows would play nice
> > > > > with dropping mmu_lock and sleeping, but some paths, e.g. from the mmu_notifier,
> > > > > (conditionally) disallow sleeping.
> > > >
> > > > Could we do something similar to tdp_mmu_iter_cond_resched() but not simple busy
> > > > retrying "X times", at least at those paths that can release mmu_lock()?
> > >
> > > That's effectively what happens by unwinding up the stak with an error code.
> > > Eventually the page fault handler will get the error and retry the guest.
> > >
> > > > Basically we treat TDX_OPERAND_BUSY as seamcall_needbreak(), similar to
> > > > rwlock_needbreak(). I haven't thought about details though.
> > >
> > > I am strongly opposed to that approach. I do not want to pollute KVM's MMU code
> > > with a bunch of retry logic and error handling just because the TDX module is
> > > ultra paranoid and hostile to hypervisors.
> >
> > Right. But IIUC there's legal cases that SEPT SEAMCALL can return BUSY due to
> > multiple threads trying to read/modify SEPT simultaneously in case of TDP MMU.
> > For instance, parallel page faults on different vcpus on private pages. I
> > believe this is the main reason to retry.
>
> Um, crud. I think there's a bigger issue. KVM always operates on its copy of the
> S-EPT tables and assumes the the real S-EPT tables will always be synchronized with
> KVM's mirror. That assumption doesn't hold true without serializing SEAMCALLs in
> some way. E.g. if a SPTE is zapped and mapped at the same time, we can end up with:
>
> vCPU0 vCPU1
> ===== =====
> mirror[x] = xyz
> old_spte = mirror[x]
> mirror[x] = REMOVED_SPTE
> sept[x] = REMOVED_SPTE
> sept[x] = xyz
IIUC this case cannot happen, as the two steps in the vcpu0 are within read
lock, which prevents from vcpu1, which holds the write lock during zapping SPTE.
Powered by blists - more mailing lists