[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <YcunSb52LlGKT7dC@google.com>
Date: Wed, 29 Dec 2021 00:09:45 +0000
From: Sean Christopherson <seanjc@...gle.com>
To: Jing Liu <jing2.liu@...el.com>
Cc: x86@...nel.org, kvm@...r.kernel.org, linux-kernel@...r.kernel.org,
linux-doc@...r.kernel.org, linux-kselftest@...r.kernel.org,
tglx@...utronix.de, mingo@...hat.com, bp@...en8.de,
dave.hansen@...ux.intel.com, pbonzini@...hat.com, corbet@....net,
shuah@...nel.org, jun.nakajima@...el.com, kevin.tian@...el.com,
jing2.liu@...ux.intel.com, guang.zeng@...el.com,
wei.w.wang@...el.com, yang.zhong@...el.com
Subject: Re: [PATCH v3 13/22] kvm: x86: Intercept #NM for saving IA32_XFD_ERR
On Wed, Dec 22, 2021, Jing Liu wrote:
> Guest IA32_XFD_ERR is generally modified in two places:
>
> - Set by CPU when #NM is triggered;
> - Cleared by guest in its #NM handler;
>
> Intercept #NM for the first case, if guest writes XFD as nonzero for
> the first time which indicates guest is possible to use XFD generating
> the exception. #NM is rare if the guest doesn't use dynamic features.
> Otherwise, there is at most one exception per guest task given a
> dynamic feature.
>
> Save the current XFD_ERR value to the guest_fpu container in the #NM
> VM-exit handler. This must be done with interrupt/preemption disabled,
Assuming my below understanding is correct, drop the "preemption" bit, it's
misleading.
> otherwise the unsaved MSR value may be clobbered by host operations.
>
> Inject a virtual #NM to the guest after saving the MSR value.
>
> Restore the host value (always ZERO outside of the host #NM
> handler) before enabling preemption.
AIUI, changelog is wrong, code is right. This must be done before _IRQs_ are
enabled, same as handling TIF_NEED_FPU_LOAD.
> Restore the guest value from the guest_fpu container right before
> entering the guest (with preemption disabled).
Same complaint about preemption.
> Suggested-by: Thomas Gleixner <tglx@...utronix.de>
> Signed-off-by: Jing Liu <jing2.liu@...el.com>
> ---
> arch/x86/include/asm/kvm_host.h | 1 +
> arch/x86/kvm/vmx/vmcs.h | 5 +++++
> arch/x86/kvm/vmx/vmx.c | 22 +++++++++++++++++++++-
> arch/x86/kvm/x86.c | 6 ++++++
> 4 files changed, 33 insertions(+), 1 deletion(-)
>
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index 555f4de47ef2..f7a661f35d1a 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -640,6 +640,7 @@ struct kvm_vcpu_arch {
> u64 smi_count;
> bool tpr_access_reporting;
> bool xsaves_enabled;
> + bool trap_nm;
> u64 ia32_xss;
> u64 microcode_version;
> u64 arch_capabilities;
...
> @@ -763,6 +764,9 @@ void vmx_update_exception_bitmap(struct kvm_vcpu *vcpu)
> vmcs_write32(PAGE_FAULT_ERROR_CODE_MATCH, match);
> }
>
> + if (vcpu->arch.trap_nm)
> + eb |= (1u << NM_VECTOR);
> +
> vmcs_write32(EXCEPTION_BITMAP, eb);
> }
>
> @@ -1960,6 +1964,13 @@ static int vmx_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
> case MSR_KERNEL_GS_BASE:
> vmx_write_guest_kernel_gs_base(vmx, data);
> break;
> + case MSR_IA32_XFD:
> + ret = kvm_set_msr_common(vcpu, msr_info);
> + if (!ret && data) {
> + vcpu->arch.trap_nm = true;
> + vmx_update_exception_bitmap(vcpu);
This is wrong, it fails to clear vcpu->arch.trap_nm and update the bitmap if the
MSR is cleared.
But why even bother with an extra flag? Can't vmx_update_exception_bitmap() get
the guest's MSR_IA32_XFD value and intercept #NM accordingly? Then you could
even handle this fully in kvm_set_msr_common(), e.g.
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 2c9606380bca..c6c936d2b298 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -3704,6 +3704,8 @@ int kvm_set_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
return 1;
fpu_update_guest_xfd(&vcpu->arch.guest_fpu, data);
+ /* Blah blah blah blah */
+ static_call(kvm_x86_update_exception_bitmap)(vcpu);
break;
case MSR_IA32_XFD_ERR:
if (!msr_info->host_initiated &&
> + }
> + break;
> #endif
> case MSR_IA32_SYSENTER_CS:
> if (is_guest_mode(vcpu))
> @@ -4746,7 +4757,7 @@ static int handle_exception_nmi(struct kvm_vcpu *vcpu)
> vect_info = vmx->idt_vectoring_info;
> intr_info = vmx_get_intr_info(vcpu);
>
> - if (is_machine_check(intr_info) || is_nmi(intr_info))
> + if (is_machine_check(intr_info) || is_nmi(intr_info) || is_nm(intr_info))
> return 1; /* handled by handle_exception_nmi_irqoff() */
>
> if (is_invalid_opcode(intr_info))
> @@ -6350,6 +6361,12 @@ static void handle_interrupt_nmi_irqoff(struct kvm_vcpu *vcpu,
> kvm_after_interrupt(vcpu);
> }
>
> +static void handle_exception_nm(struct kvm_vcpu *vcpu)
This needs a different name, it's waaaay too close to the base handle_exception_nmi(),
which runs with IRQs _on_. And please add "_irqoff" at the end. Maybe handle_nm_fault_irqoff()?
> +{
> + rdmsrl(MSR_IA32_XFD_ERR, vcpu->arch.guest_fpu.xfd_err);
> + kvm_queue_exception(vcpu, NM_VECTOR);
> +}
> +
> static void handle_exception_nmi_irqoff(struct vcpu_vmx *vmx)
> {
> const unsigned long nmi_entry = (unsigned long)asm_exc_nmi_noist;
> @@ -6358,6 +6375,9 @@ static void handle_exception_nmi_irqoff(struct vcpu_vmx *vmx)
> /* if exit due to PF check for async PF */
> if (is_page_fault(intr_info))
> vmx->vcpu.arch.apf.host_apf_flags = kvm_read_and_reset_apf_flags();
> + /* if exit due to NM, handle before preemptions are enabled */
> + else if (is_nm(intr_info))
Same naming complaint about this helper, it looks like an is_nmi() typo. is_nm_fault()?
> + handle_exception_nm(&vmx->vcpu);
> /* Handle machine checks before interrupts are enabled */
> else if (is_machine_check(intr_info))
> kvm_machine_check();
Powered by blists - more mailing lists