linux-kernel - RE: [RESEND PATCH 5/6] KVM: x86/VMX: add kvm_vmx_reinject_nmi

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <DM5PR1101MB2172D7D7BC49255DB3752802A8069@DM5PR1101MB2172.namprd11.prod.outlook.com>
Date:   Thu, 17 Nov 2022 03:37:40 +0000
From:   "Li, Xin3" <xin3.li@...el.com>
To:     Peter Zijlstra <peterz@...radead.org>
CC:     Paolo Bonzini <pbonzini@...hat.com>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
        "x86@...nel.org" <x86@...nel.org>,
        "kvm@...r.kernel.org" <kvm@...r.kernel.org>,
        "tglx@...utronix.de" <tglx@...utronix.de>,
        "mingo@...hat.com" <mingo@...hat.com>,
        "bp@...en8.de" <bp@...en8.de>,
        "dave.hansen@...ux.intel.com" <dave.hansen@...ux.intel.com>,
        "hpa@...or.com" <hpa@...or.com>,
        "Christopherson,, Sean" <seanjc@...gle.com>,
        "Tian, Kevin" <kevin.tian@...el.com>
Subject: RE: [RESEND PATCH 5/6] KVM: x86/VMX: add kvm_vmx_reinject_nmi_irq()
 for NMI/IRQ reinjection


> > > > > But what about NMIs, afaict this is all horribly broken for NMIs.
> > > > >
> > > > > So the whole VMX thing latches the NMI (which stops NMI
> > > > > recursion),
> > > right?
> > > > >
> > > > > But then you drop out of noinstr code, which means any random
> > > > > exception can happen (kprobes #BP, hw_breakpoint #DB, or even
> > > > > #PF due to random nonsense like *SAN). This exception will do
> > > > > IRET and clear the NMI latch, all before you get to run any of the NMI
> code.
> > > >
> > > > What you said here implies that we have this problem in the existing
> code.
> > > > Because a fake iret stack is created to call the NMI handler in
> > > > the IDT NMI descriptor, which lastly executes the IRET instruction.
> > >
> > > I can't follow; of course the IDT handler terminates with IRET, it has to no?
> >
> > With FRED, ERETS/ERETU replace IRET, and use bit 28 of the popped CS
> > field to control whether to unblock NMI. If bit 28 of the field (above
> > the selector) is 1, ERETS/ERETU unblocks NMIs.
> 
> Yes, I know that. It is one of the many improvements FRED brings.
> Ideally the IBT WAIT-FOR-ENDBR state also gets squirreled away in the
> hardware exception frame, but that's still up in the air I believe :/
> 
> Anyway.. given there is interrupt priority and NMI is pretty much on top of
> everything else the reinject crap *should* run NMI first. That way NMI runs
> with the latch disabled and whatever other pending interrupts will run later.
> 
> But that all is still broken because afaict the current code also leaves noinstr --
> and once you leave noinstr (or use a static_key, static_call or anything else that
> *can* change at runtime) you can't guarantee nothing.

For NMI, HPA asked me to use "int $2", as it switches to the NMI IST stack to
execute the NMI handler, essentially like how HW deals with a NMI in host. and
I tested it with NMI watchdog, it looks working fine.

For IRQs, we still use the dispatch table, but with a new func added in
DEFINE_IDTENTRY_SYSVEC with the noinstr entry/exit code removed:

#define DEFINE_IDTENTRY_SYSVEC(func)                                    \
static void __##func(struct pt_regs *regs);                             \
                                                                        \
__visible noinstr void func(struct pt_regs *regs)                       \
{                                                                       \
        irqentry_state_t state = irqentry_enter(regs);                  \
                                                                        \
        instrumentation_begin();                                        \
        kvm_set_cpu_l1tf_flush_l1d();                                   \
        run_sysvec_on_irqstack_cond(__##func, regs);                    \
        instrumentation_end();                                          \
        irqentry_exit(regs, state);                                     \
}                                                                       \
                                                                        \
+void dispatch_table_##func(struct pt_regs *regs)                        \
+{                                                                       \
+        kvm_set_cpu_l1tf_flush_l1d();                                   \
+        run_sysvec_on_irqstack_cond(__##func, regs);                    \
+}
+                                                                       \
                                                                        \
static noinline void __##func(struct pt_regs *regs)

How do you think?