Message-ID: <BN6PR1101MB216162F44664713802201FAFA80C9@BN6PR1101MB2161.namprd11.prod.outlook.com>
Date: Wed, 23 Nov 2022 08:31:52 +0000
From: "Li, Xin3" <xin3.li@...el.com>
To: "Christopherson, Sean" <seanjc@...gle.com>
CC: Peter Zijlstra <peterz@...radead.org>,
Paolo Bonzini <pbonzini@...hat.com>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
"x86@...nel.org" <x86@...nel.org>,
"kvm@...r.kernel.org" <kvm@...r.kernel.org>,
"tglx@...utronix.de" <tglx@...utronix.de>,
"mingo@...hat.com" <mingo@...hat.com>,
"bp@...en8.de" <bp@...en8.de>,
"dave.hansen@...ux.intel.com" <dave.hansen@...ux.intel.com>,
"hpa@...or.com" <hpa@...or.com>,
"Tian, Kevin" <kevin.tian@...el.com>
Subject: RE: [RESEND PATCH 5/6] KVM: x86/VMX: add kvm_vmx_reinject_nmi_irq()
for NMI/IRQ reinjection
> > > > > > > > And yes, the current code appears to suffer the same defect.
> > > >
> > > > That defect isn't going to be fixed simply by changing how KVM
> > > > forwards NMIs though. IIUC, _everything_ between VM-Exit and the
> > > > invocation of the NMI handler needs to be noinstr. On VM-Exit due
> > > > to NMI, NMIs are blocked. If a #BP/#DB/#PF occurs before KVM gets
> > > > to kvm_x86_handle_exit_irqoff(), the subsequent IRET will unblock
> > > > NMIs before the original NMI is serviced, i.e. a second NMI could
> > > > come in at anytime regardless of how KVM forwards the NMI to the
> kernel.
> > > >
> > > > Is there any way to solve this without tagging everything noinstr?
> > > > There is a metric shit ton of code between VM-Exit and the
> > > > handling of NMIs, and much of that code is common helpers. It
> > > > might be possible to hoist NMI handler much earlier, though we'd
> > > > need to do a super thorough audit to ensure all necessary host state is
> restored.
> > >
> > > As NMI is the only vector with this potential issue, it sounds like a
> > > good idea to promote only its handling.
> > >
> >
> > Hi Peter/Sean,
> >
> > I prefer to move _everything_ between VM-Exit and the invocation of
> > the NMI handler into the noinstr section in the next patch set; what do
> > you think?
>
> That's likely going to be beyond painful and will have a _lot_ of collateral
> damage in the sense that other paths will end up calling noinstr functions
> just because of VMX. E.g. hw_breakpoint_restore(),
> fpu_sync_guest_vmexit_xfd_state(),
> kvm_get_apic_mode(), multiple static calls in KVM... the list goes on and on
> and on.
>
> The ongoing maintenance for that would also be quite painful.
This is very involved, and I got lost reading the code around it. Some
comments explaining the steps, and their ordering, after VM-Exit would help.
(Of course some people already know it quite well.)
>
> Actually, SVM already enables NMIs far earlier, which means the probability of
> breaking something by moving NMI handling earlier is lower. Not zero, as I
> don't trust that SVM gets all the corner cases right, but definitely low.
>
> E.g. this should be doable
Do we need to restore *all* host state before switching to the NMI stack and
handler? If not, what is the minimal set? Would restoring that minimal set and
doing "int $2" as early as possible be a workable quick-and-dirty approach?
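For context, the "int $2" approach is how KVM historically reflected NMIs into
the host from the irqoff path; a rough sketch (function and field names are
from memory, treat this as illustrative rather than a drop-in patch):

```c
/*
 * Sketch of the classic "int $2" NMI reflection. A software INT 2
 * vectors through the host IDT gate, so the handler still runs with
 * the normal NMI entry glue (including the IST stack).
 */
static void vmx_forward_nmi_irqoff(struct vcpu_vmx *vmx)
{
	u32 intr_info = vmx_get_intr_info(&vmx->vcpu);

	/* Only reflect exits that were actually caused by an NMI. */
	if (!is_nmi(intr_info))
		return;

	kvm_before_interrupt(&vmx->vcpu, KVM_HANDLING_NMI);
	asm volatile("int $2");
	kvm_after_interrupt(&vmx->vcpu);
}
```

Note this only answers "how", not "when": the open question in this thread is
how early such a call can be made while some host state is still unrestored.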
>
> diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c index
> cea8c07f5229..2fec93f38960 100644
> --- a/arch/x86/kvm/vmx/vmx.c
> +++ b/arch/x86/kvm/vmx/vmx.c
> @@ -7249,6 +7249,8 @@ static fastpath_t vmx_vcpu_run(struct kvm_vcpu
> *vcpu)
> if (unlikely(vmx->exit_reason.failed_vmentry))
> return EXIT_FASTPATH_NONE;
>
> + <handle NMIs here>
> +
> vmx->loaded_vmcs->launched = 1;
>
> vmx_recover_nmi_blocking(vmx);
>
>
> though we'd likely want a fair bit of refactoring so that all of vmx_vcpu_run()
> and svm_vcpu_run() don't need to be noinstr.
This sounds reasonable to me; however, per Documentation/core-api/entry.rst,
we do need it. Maybe we could argue that the guest is logically orthogonal to
the host, and that the CPU is lent out to another OS... (which also kind of
explains nested virtualization).
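As a thought experiment, the hoisting sketched in the diff above might look
something like the following (hypothetical helper; it assumes the exit-reason
and intr-info reads are safe to do in a noinstr context, and the asm stub name
is made up):

```c
/*
 * Hypothetical noinstr helper: reflect an NMI-induced VM-Exit into the
 * host before any instrumentable code can run, i.e. before a stray
 * #BP/#DB/#PF's IRET can prematurely unblock NMIs.
 */
static noinstr void vmx_handle_nmi_early(struct vcpu_vmx *vmx)
{
	if ((u16)vmx->exit_reason.basic != EXIT_REASON_EXCEPTION_NMI)
		return;

	if (is_nmi(vmx_get_intr_info(&vmx->vcpu)))
		vmx_do_nmi_irqoff();	/* asm stub that builds an NMI frame */
}
```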
>
> Another wart that needs to be addressed is trace_kvm_exit(). IIRC, tracepoints
> must be outside of noinstr, though maybe I'm misremembering that. And
> conversely, SVM doesn't trace exits that are handled in the fastpath. Best
> option is probably to move VMX's trace_kvm_exit() call to vmx_handle_exit(),
> and then figure out a common way to trace exits that are handled in the
> fastpath (which can reside outside of the noinstr section).
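The tracepoint move might look roughly like this (sketch only; exact call
sites and signatures differ across trees):

```c
/*
 * In vmx_handle_exit(), which runs outside the noinstr region, the
 * tracepoint (instrumentable by design) is legal again.
 */
static int vmx_handle_exit(struct kvm_vcpu *vcpu, fastpath_t exit_fastpath)
{
	trace_kvm_exit(vcpu, KVM_ISA_VMX);

	/* ... existing exit handling ... */
	return __vmx_handle_exit(vcpu, exit_fastpath);
}
```

Fastpath-handled exits would then need a separate, shared trace call, since
they never reach vmx_handle_exit().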