[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-Id: <20230801083553.8468-7-xin3.li@intel.com>
Date: Tue, 1 Aug 2023 01:35:50 -0700
From: Xin Li <xin3.li@...el.com>
To: linux-doc@...r.kernel.org, linux-kernel@...r.kernel.org,
linux-edac@...r.kernel.org, linux-hyperv@...r.kernel.org,
kvm@...r.kernel.org, xen-devel@...ts.xenproject.org
Cc: Jonathan Corbet <corbet@....net>,
Thomas Gleixner <tglx@...utronix.de>,
Ingo Molnar <mingo@...hat.com>, Borislav Petkov <bp@...en8.de>,
Dave Hansen <dave.hansen@...ux.intel.com>, x86@...nel.org,
"H . Peter Anvin" <hpa@...or.com>,
Andy Lutomirski <luto@...nel.org>,
Oleg Nesterov <oleg@...hat.com>,
Tony Luck <tony.luck@...el.com>,
"K . Y . Srinivasan" <kys@...rosoft.com>,
Haiyang Zhang <haiyangz@...rosoft.com>,
Wei Liu <wei.liu@...nel.org>, Dexuan Cui <decui@...rosoft.com>,
Paolo Bonzini <pbonzini@...hat.com>,
Wanpeng Li <wanpengli@...cent.com>,
Vitaly Kuznetsov <vkuznets@...hat.com>,
Sean Christopherson <seanjc@...gle.com>,
Peter Zijlstra <peterz@...radead.org>,
Juergen Gross <jgross@...e.com>,
Stefano Stabellini <sstabellini@...nel.org>,
Oleksandr Tyshchenko <oleksandr_tyshchenko@...m.com>,
Josh Poimboeuf <jpoimboe@...nel.org>,
"Paul E . McKenney" <paulmck@...nel.org>,
Catalin Marinas <catalin.marinas@....com>,
Randy Dunlap <rdunlap@...radead.org>,
Steven Rostedt <rostedt@...dmis.org>,
Kim Phillips <kim.phillips@....com>,
Xin Li <xin3.li@...el.com>,
Hyeonggon Yoo <42.hyeyoo@...il.com>,
"Liam R . Howlett" <Liam.Howlett@...cle.com>,
Sebastian Reichel <sebastian.reichel@...labora.com>,
"Kirill A . Shutemov" <kirill.shutemov@...ux.intel.com>,
Suren Baghdasaryan <surenb@...gle.com>,
Pawan Gupta <pawan.kumar.gupta@...ux.intel.com>,
Babu Moger <babu.moger@....com>,
Jim Mattson <jmattson@...gle.com>,
Sandipan Das <sandipan.das@....com>,
Lai Jiangshan <jiangshanlai@...il.com>,
Hans de Goede <hdegoede@...hat.com>,
Reinette Chatre <reinette.chatre@...el.com>,
Daniel Sneddon <daniel.sneddon@...ux.intel.com>,
Breno Leitao <leitao@...ian.org>,
Nikunj A Dadhania <nikunj@....com>,
Brian Gerst <brgerst@...il.com>,
Sami Tolvanen <samitolvanen@...gle.com>,
Alexander Potapenko <glider@...gle.com>,
Andrew Morton <akpm@...ux-foundation.org>,
Arnd Bergmann <arnd@...db.de>,
"Eric W . Biederman" <ebiederm@...ssion.com>,
Kees Cook <keescook@...omium.org>,
Masami Hiramatsu <mhiramat@...nel.org>,
Masahiro Yamada <masahiroy@...nel.org>,
Ze Gao <zegao2021@...il.com>, Fei Li <fei1.li@...el.com>,
Conghui <conghui.chen@...el.com>,
Ashok Raj <ashok.raj@...el.com>,
"Jason A . Donenfeld" <Jason@...c4.com>,
Mark Rutland <mark.rutland@....com>,
Jacob Pan <jacob.jun.pan@...ux.intel.com>,
Jiapeng Chong <jiapeng.chong@...ux.alibaba.com>,
Jane Malalane <jane.malalane@...rix.com>,
David Woodhouse <dwmw@...zon.co.uk>,
Boris Ostrovsky <boris.ostrovsky@...cle.com>,
Arnaldo Carvalho de Melo <acme@...hat.com>,
Yantengsi <siyanteng@...ngson.cn>,
Christophe Leroy <christophe.leroy@...roup.eu>,
Sathvika Vasireddy <sv@...ux.ibm.com>
Subject: [PATCH RESEND v9 33/36] KVM: VMX: Add VMX_DO_FRED_EVENT_IRQOFF for IRQ/NMI handling
Compared to an IDT stack frame, a FRED stack frame has extra 16 bytes of
information pushed at the regular stack top and 8 bytes of error code _always_
pushed at the regular stack bottom, add VMX_DO_FRED_EVENT_IRQOFF to generate
FRED stack frames with event type and vector properly set. Thus, IRQ/NMI can
be handled with the existing approach when FRED is enabled.
For IRQ handling, general purpose registers are pushed to the stack to form
a pt_regs structure, which is then used to call external_interrupt(). As a
result, IRQ handling no longer re-enters the noinstr code.
Tested-by: Shan Kang <shan.kang@...el.com>
Signed-off-by: Xin Li <xin3.li@...el.com>
---
Changes since v8:
* Add a new macro VMX_DO_FRED_EVENT_IRQOFF for FRED instead of refactoring
VMX_DO_EVENT_IRQOFF (Sean Christopherson).
* Do NOT use a trampoline, just LEA+PUSH the return RIP, PUSH the error code,
and jump to the FRED kernel entry point for NMI or call external_interrupt()
for IRQs (Sean Christopherson).
* Call external_interrupt() only when FRED is enabled, and convert the non-FRED
handling to external_interrupt() after FRED lands (Sean Christopherson).
---
arch/x86/kvm/vmx/vmenter.S | 88 ++++++++++++++++++++++++++++++++++++++
arch/x86/kvm/vmx/vmx.c | 19 ++++++--
2 files changed, 104 insertions(+), 3 deletions(-)
diff --git a/arch/x86/kvm/vmx/vmenter.S b/arch/x86/kvm/vmx/vmenter.S
index 07e927d4d099..5ee6a57b59a5 100644
--- a/arch/x86/kvm/vmx/vmenter.S
+++ b/arch/x86/kvm/vmx/vmenter.S
@@ -2,12 +2,14 @@
#include <linux/linkage.h>
#include <asm/asm.h>
#include <asm/bitsperlong.h>
+#include <asm/fred.h>
#include <asm/kvm_vcpu_regs.h>
#include <asm/nospec-branch.h>
#include <asm/percpu.h>
#include <asm/segment.h>
#include "kvm-asm-offsets.h"
#include "run_flags.h"
+#include "../../entry/calling.h"
#define WORD_SIZE (BITS_PER_LONG / 8)
@@ -31,6 +33,80 @@
#define VCPU_R15 __VCPU_REGS_R15 * WORD_SIZE
#endif
+#ifdef CONFIG_X86_FRED
+.macro VMX_DO_FRED_EVENT_IRQOFF branch_insn branch_target nmi=0
+ /*
+ * Unconditionally create a stack frame, getting the correct RSP on the
+ * stack (for x86-64) would take two instructions anyways, and RBP can
+ * be used to restore RSP to make objtool happy (see below).
+ */
+ push %_ASM_BP
+ mov %_ASM_SP, %_ASM_BP
+
+ /*
+ * Don't check the FRED stack level, the call stack leading to this
+ * helper is effectively constant and shallow (relatively speaking).
+ *
+ * Emulate the FRED-defined redzone and stack alignment.
+ */
+ sub $(FRED_CONFIG_REDZONE_AMOUNT << 6), %rsp
+ and $FRED_STACK_FRAME_RSP_MASK, %rsp
+
+ /*
+ * A FRED stack frame has extra 16 bytes of information pushed at the
+ * regular stack top compared to an IDT stack frame.
+ */
+ push $0 /* Reserved by FRED, must be 0 */
+ push $0 /* FRED event data, 0 for NMI and external interrupts */
+
+ shl $32, %rdi /* FRED event type and vector */
+ .if \nmi
+ bts $FRED_SSX_NMI_BIT, %rdi /* Set the NMI bit */
+ .endif
+ bts $FRED_SSX_64_BIT_MODE_BIT, %rdi /* Set the 64-bit mode */
+ or $__KERNEL_DS, %rdi
+ push %rdi
+ push %rbp
+ pushf
+ mov $__KERNEL_CS, %rax
+ push %rax
+
+ /*
+ * Unlike the IDT event delivery, FRED _always_ pushes an error code
+ * after pushing the return RIP, thus the CALL instruction CANNOT be
+ * used here to push the return RIP, otherwise there is no chance to
+ * push an error code before invoking the IRQ/NMI handler.
+ *
+ * Use LEA to get the return RIP and push it, then push an error code.
+ */
+ lea 1f(%rip), %rax
+ push %rax
+ push $0 /* FRED error code, 0 for NMI and external interrupts */
+
+ .if \nmi == 0
+ PUSH_REGS
+ mov %rsp, %rdi
+ .endif
+
+ \branch_insn \branch_target
+
+ .if \nmi == 0
+ POP_REGS
+ .endif
+
+1:
+ /*
+ * "Restore" RSP from RBP, even though IRET has already unwound RSP to
+ * the correct value. objtool doesn't know the callee will IRET and,
+ * without the explicit restore, thinks the stack is getting walloped.
+ * Using an unwind hint is problematic due to x86-64's dynamic alignment.
+ */
+ mov %_ASM_BP, %_ASM_SP
+ pop %_ASM_BP
+ RET
+.endm
+#endif
+
.macro VMX_DO_EVENT_IRQOFF call_insn call_target
/*
* Unconditionally create a stack frame, getting the correct RSP on the
@@ -299,6 +375,12 @@ SYM_INNER_LABEL_ALIGN(vmx_vmexit, SYM_L_GLOBAL)
SYM_FUNC_END(__vmx_vcpu_run)
+#ifdef CONFIG_X86_FRED
+SYM_FUNC_START(vmx_do_fred_nmi_irqoff)
+ VMX_DO_FRED_EVENT_IRQOFF jmp fred_entrypoint_kernel nmi=1
+SYM_FUNC_END(vmx_do_fred_nmi_irqoff)
+#endif
+
SYM_FUNC_START(vmx_do_nmi_irqoff)
VMX_DO_EVENT_IRQOFF call asm_exc_nmi_kvm_vmx
SYM_FUNC_END(vmx_do_nmi_irqoff)
@@ -357,6 +439,12 @@ SYM_FUNC_START(vmread_error_trampoline)
SYM_FUNC_END(vmread_error_trampoline)
#endif
+#ifdef CONFIG_X86_FRED
+SYM_FUNC_START(vmx_do_fred_interrupt_irqoff)
+ VMX_DO_FRED_EVENT_IRQOFF call external_interrupt
+SYM_FUNC_END(vmx_do_fred_interrupt_irqoff)
+#endif
+
SYM_FUNC_START(vmx_do_interrupt_irqoff)
VMX_DO_EVENT_IRQOFF CALL_NOSPEC _ASM_ARG1
SYM_FUNC_END(vmx_do_interrupt_irqoff)
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 0ecf4be2c6af..4e90c69a92bf 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -6890,6 +6890,14 @@ static void vmx_apicv_post_state_restore(struct kvm_vcpu *vcpu)
memset(vmx->pi_desc.pir, 0, sizeof(vmx->pi_desc.pir));
}
+#ifdef CONFIG_X86_FRED
+void vmx_do_fred_interrupt_irqoff(unsigned int vector);
+void vmx_do_fred_nmi_irqoff(unsigned int vector);
+#else
+#define vmx_do_fred_interrupt_irqoff(x) BUG()
+#define vmx_do_fred_nmi_irqoff(x) BUG()
+#endif
+
void vmx_do_interrupt_irqoff(unsigned long entry);
void vmx_do_nmi_irqoff(void);
@@ -6932,14 +6940,16 @@ static void handle_external_interrupt_irqoff(struct kvm_vcpu *vcpu)
{
u32 intr_info = vmx_get_intr_info(vcpu);
unsigned int vector = intr_info & INTR_INFO_VECTOR_MASK;
- gate_desc *desc = (gate_desc *)host_idt_base + vector;
if (KVM_BUG(!is_external_intr(intr_info), vcpu->kvm,
"unexpected VM-Exit interrupt info: 0x%x", intr_info))
return;
kvm_before_interrupt(vcpu, KVM_HANDLING_IRQ);
- vmx_do_interrupt_irqoff(gate_offset(desc));
+ if (cpu_feature_enabled(X86_FEATURE_FRED))
+ vmx_do_fred_interrupt_irqoff(vector); /* Event type is 0 */
+ else
+ vmx_do_interrupt_irqoff(gate_offset((gate_desc *)host_idt_base + vector));
kvm_after_interrupt(vcpu);
vcpu->arch.at_instruction_boundary = true;
@@ -7225,7 +7235,10 @@ static noinstr void vmx_vcpu_enter_exit(struct kvm_vcpu *vcpu,
if ((u16)vmx->exit_reason.basic == EXIT_REASON_EXCEPTION_NMI &&
is_nmi(vmx_get_intr_info(vcpu))) {
kvm_before_interrupt(vcpu, KVM_HANDLING_NMI);
- vmx_do_nmi_irqoff();
+ if (cpu_feature_enabled(X86_FEATURE_FRED))
+ vmx_do_fred_nmi_irqoff((EVENT_TYPE_NMI << 16) | NMI_VECTOR);
+ else
+ vmx_do_nmi_irqoff();
kvm_after_interrupt(vcpu);
}
--
2.34.1
Powered by blists - more mailing lists