[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <735d3a560046e4a7a9f223dc5688dcf1730280c5.camel@intel.com>
Date: Mon, 25 Nov 2024 19:50:23 +0000
From: "Huang, Kai" <kai.huang@...el.com>
To: "seanjc@...gle.com" <seanjc@...gle.com>, "binbin.wu@...ux.intel.com"
<binbin.wu@...ux.intel.com>
CC: "kvm@...r.kernel.org" <kvm@...r.kernel.org>, "Li, Xiaoyao"
<xiaoyao.li@...el.com>, "Zhao, Yan Y" <yan.y.zhao@...el.com>,
"dave.hansen@...ux.intel.com" <dave.hansen@...ux.intel.com>, "Hunter, Adrian"
<adrian.hunter@...el.com>, "linux-kernel@...r.kernel.org"
<linux-kernel@...r.kernel.org>, "Chatre, Reinette"
<reinette.chatre@...el.com>, "Yang, Weijiang" <weijiang.yang@...el.com>,
"pbonzini@...hat.com" <pbonzini@...hat.com>, "Yamahata, Isaku"
<isaku.yamahata@...el.com>, "dmatlack@...gle.com" <dmatlack@...gle.com>,
"nik.borisov@...e.com" <nik.borisov@...e.com>,
"tony.lindgren@...ux.intel.com" <tony.lindgren@...ux.intel.com>, "Gao, Chao"
<chao.gao@...el.com>, "Edgecombe, Rick P" <rick.p.edgecombe@...el.com>,
"x86@...nel.org" <x86@...nel.org>
Subject: Re: [PATCH 0/7] KVM: TDX: TD vcpu enter/exit
On Mon, 2024-11-25 at 07:19 -0800, Sean Christopherson wrote:
> On Mon, Nov 25, 2024, Binbin Wu wrote:
> > On 11/22/2024 4:14 AM, Adrian Hunter wrote:
> > [...]
> > > - tdx_vcpu_enter_exit() calls guest_state_enter_irqoff()
> > > and guest_state_exit_irqoff() which comments say should be
> > > called from non-instrumentable code but noinst was removed
> > > at Sean's suggestion:
> > > https://lore.kernel.org/all/Zg8tJspL9uBmMZFO@google.com/
> > > noinstr is also needed to retain NMI-blocking by avoiding
> > > instrumented code that leads to an IRET which unblocks NMIs.
> > > A later patch set will deal with NMI VM-exits.
> > >
> > In https://lore.kernel.org/all/Zg8tJspL9uBmMZFO@google.com, Sean mentioned:
> > "The reason the VM-Enter flows for VMX and SVM need to be noinstr is they do things
> > like load the guest's CR2, and handle NMI VM-Exits with NMIs blocks. None of
> > that applies to TDX. Either that, or there are some massive bugs lurking due to
> > missing code."
> >
> > I don't understand why handle NMI VM-Exits with NMIs blocks doesn't apply to
> > TDX. IIUIC, similar to VMX, TDX also needs to handle the NMI VM-exit in the
> > noinstr section to avoid the unblock of NMIs due to instrumentation-induced
> > fault.
>
> With TDX, SEAMCALL is mechnically a VM-Exit. KVM is the "guest" running in VMX
> root mode, and the TDX-Module is the "host", running in SEAM root mode.
>
> And for TDH.VP.ENTER, if a hardware NMI arrives with the TDX guest is active,
> the initial NMI VM-Exit, which consumes the NMI and blocks further NMIs, goes
> from SEAM non-root to SEAM root. The SEAMRET from SEAM root to VMX root (KVM)
> is effectively a VM-Enter, and does NOT block NMIs in VMX root (at least, AFAIK).
>
> So trying to handle the NMI "exit" in a noinstr section is pointless because NMIs
> are never blocked.
I thought NMI remains being blocked after SEAMRET?
The TDX CPU architecture extension spec says:
"
On transition to SEAM VMX root operation, the processor can inhibit NMI and SMI.
While inhibited, if these events occur, then they are tailored to stay pending
and be delivered when the inhibit state is removed. NMI and external interrupts
can be uninhibited in SEAM VMX-root operation. In SEAM VMX-root operation,
MSR_INTR_PENDING can be read to help determine status of any pending events.
On transition to SEAM VMX non-root operation using a VM entry, NMI and INTR
inhibit states are, by design, updated based on the configuration of the TD VMCS
used to perform the VM entry.
...
On transition to legacy VMX root operation using SEAMRET, the NMI and SMI
inhibit state can be restored to the inhibit state at the time of the previous
SEAMCALL and any pending NMI/SMI would be delivered if not inhibited.
"
Here NMI is inhibited in SEAM VMX root, but is never inhibited in VMX root.
And the last paragraph does say "any pending NMI would be delivered if not
inhibited".
But I thought this applies to the case when "NMI happens in SEAM VMX root", but
not "NMI happens in SEAM VMX non-root"? I thought the NMI is already
"delivered" when CPU is in "SEAM VMX non-root", but I guess I was wrong here..
>
> TDX is also different because KVM isn't responsible for context switching guest
> state. Specifically, CR2 is managed by the TDX Module, and so there is no window
> where KVM runs with guest CR2, and thus there is no risk of clobbering guest CR2
> with a host value, e.g. due to take a #PF due instrumentation triggering something.
>
> All that said, I did forget that code that runs between guest_state_enter_irqoff()
> and guest_state_exit_irqoff() can't be instrumeneted. And at least as of patch 2
> in this series, the simplest way to make that happen is to tag tdx_vcpu_enter_exit()
> as noinstr. Just please make sure nothing else is added in the noinstr section
> unless it absolutely needs to be there.
If NMI is not a concern, is below also an option?
guest_state_enter_irqoff();
instructmentation_begin();
tdh_vp_enter();
instructmentation_end();
guest_state_exit_irqoff();
Powered by blists - more mailing lists