Message-ID: <db41b66ffe19134ee2c742420b5637a2a531ffc3.camel@redhat.com>
Date: Fri, 06 Sep 2024 14:30:45 -0400
From: Maxim Levitsky <mlevitsk@...hat.com>
To: Sean Christopherson <seanjc@...gle.com>, Paolo Bonzini
<pbonzini@...hat.com>
Cc: kvm@...r.kernel.org, linux-kernel@...r.kernel.org,
syzbot+988d9efcdf137bc05f66@...kaller.appspotmail.com, Zheyu Ma
<zheyuma97@...il.com>, Kishen Maloor <kishen.maloor@...el.com>
Subject: Re: [PATCH] KVM: x86: Forcibly leave nested if RSM to L2 hits
shutdown
On Fri, 2024-09-06 at 09:13 -0700, Sean Christopherson wrote:
> Leave nested mode before synthesizing shutdown (a.k.a. TRIPLE_FAULT) if
> RSM fails when resuming L2 (a.k.a. guest mode). Architecturally, shutdown
> on RSM occurs _before_ the transition back to guest mode on both Intel and
> AMD.
>
> On Intel, per the SDM pseudocode, SMRAM state is loaded before critical
> VMX state:
>
>   restore state normally from SMRAM;
>   ...
>   CR4.VMXE := value stored internally;
>   IF internal storage indicates that the logical processor had been in
>   VMX operation (root or non-root)
>   THEN
>      enter VMX operation (root or non-root);
>      restore VMX-critical state as defined in Section 32.14.1;
>      ...
>      restore current VMCS pointer;
>   FI;
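
To restate the SDM's ordering as C-like pseudocode (a sketch for
illustration only; the helper names below are invented, they are not
SDM or KVM symbols):

	/* Architectural RSM: restore state first, re-enter guest mode last. */
	restore_state_from_smram(vcpu);        /* may find invalid state */
	if (invalid_state_found(vcpu))
		shutdown(vcpu);                /* still outside guest mode, so
						* no VM-Exit is possible */
	else if (was_in_vmx_operation(vcpu))
		enter_vmx_operation(vcpu);     /* root or non-root (L2) */

i.e. any shutdown is delivered before the CPU can possibly be back in
guest mode.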
>
> AMD's APM is less clearcut about the ordering, but more explicit about
> the outcome. Because AMD CPUs save VMCB and guest state in SMRAM
> itself, and because nothing in the APM suggests that a shutdown in
> guest mode is even possible, a straightforward reading of the clause on
> invalid state is that _what_ state is invalid is irrelevant, i.e. all
> roads lead to shutdown.
>
>   An RSM causes a processor shutdown if an invalid-state condition is
>   found in the SMRAM state-save area.
>
> This fixes a bug found by syzkaller where synthesizing shutdown for L2
> led to a nested VM-Exit (if L1 is intercepting shutdown), which in turn
> caused KVM to complain about trying to cancel a nested VM-Enter (see
> commit 759cbd59674a ("KVM: x86: nSVM/nVMX: set nested_run_pending on VM
> entry which is a result of RSM")).
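
To spell out the failing sequence (call chain reconstructed from memory
of the emulator paths, so treat it as a rough sketch rather than
authoritative):

	em_rsm()
	  ctxt->ops->leave_smm()           /* emulator_leave_smm() */
	    kvm_x86_call(leave_smm)()      /* nVMX/nSVM enter guest mode and
					    * set nested_run_pending */
	    rsm_load_state_64/32()         /* fails on invalid SMRAM state */
	  ctxt->ops->triple_fault()        /* requests TRIPLE_FAULT/shutdown */
	...
	nested_ops->triple_fault()         /* L1 intercepts shutdown =>
					    * nested VM-Exit with
					    * nested_run_pending still set */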
>
> Note, Paolo pointed out that KVM shouldn't set nested_run_pending until
> after loading SMRAM state. But as above, that's only half the story: KVM
> shouldn't transition to guest mode either. Unfortunately, fixing that
> mess requires rewriting the nVMX and nSVM RSM flows to not piggyback
> their nested VM-Enter flows, as executing the nested VM-Enter flows after
> loading state from SMRAM would clobber much of said state.
>
> For now, add a FIXME to call out that transitioning to guest mode before
> loading state from SMRAM is wrong.
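
For completeness, the shape the FIXME is asking for would be roughly the
below (hypothetical helpers purely to illustrate the ordering; the hard
part is that today's nested VM-Enter flows can't simply be run after
state has already been loaded):

	ret = rsm_load_state(ctxt, &smram);       /* 1. restore from SMRAM */
	if (ret != X86EMUL_CONTINUE)
		return ret;                       /* 2. shutdown lands in L1 */
	if (smram_indicates_guest_mode(&smram))
		ret = resume_l2(vcpu, &smram);    /* 3. only then enter L2 */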
>
> Link: https://lore.kernel.org/all/CABgObfYaUHXyRmsmg8UjRomnpQ0Jnaog9-L2gMjsjkqChjDYUQ@mail.gmail.com
> Reported-by: syzbot+988d9efcdf137bc05f66@...kaller.appspotmail.com
> Closes: https://lore.kernel.org/all/0000000000007a9acb06151e1670@google.com
> Reported-by: Zheyu Ma <zheyuma97@...il.com>
> Closes: https://lore.kernel.org/all/CAMhUBjmXMYsEoVYw_M8hSZjBMHh24i88QYm-RY6HDta5YZ7Wgw@mail.gmail.com
> Analyzed-by: Michal Wilczynski <michal.wilczynski@...el.com>
> Cc: Kishen Maloor <kishen.maloor@...el.com>
> Signed-off-by: Sean Christopherson <seanjc@...gle.com>
> ---
>  arch/x86/kvm/smm.c | 24 +++++++++++++++++++-----
>  arch/x86/kvm/x86.c |  6 ------
>  arch/x86/kvm/x86.h |  6 ++++++
>  3 files changed, 25 insertions(+), 11 deletions(-)
>
> diff --git a/arch/x86/kvm/smm.c b/arch/x86/kvm/smm.c
> index 00e3c27d2a87..85241c0c7f56 100644
> --- a/arch/x86/kvm/smm.c
> +++ b/arch/x86/kvm/smm.c
> @@ -624,17 +624,31 @@ int emulator_leave_smm(struct x86_emulate_ctxt *ctxt)
>  #endif
>  
>  	/*
> -	 * Give leave_smm() a chance to make ISA-specific changes to the vCPU
> -	 * state (e.g. enter guest mode) before loading state from the SMM
> -	 * state-save area.
> +	 * FIXME: When resuming L2 (a.k.a. guest mode), the transition to guest
> +	 * mode should happen _after_ loading state from SMRAM. However, KVM
> +	 * piggybacks the nested VM-Enter flows (which is wrong for many other
> +	 * reasons), and so nSVM/nVMX would clobber state that is loaded from
> +	 * SMRAM and from the VMCS/VMCB.
>  	 */
>  	if (kvm_x86_call(leave_smm)(vcpu, &smram))
>  		return X86EMUL_UNHANDLEABLE;
>  
>  #ifdef CONFIG_X86_64
>  	if (guest_cpuid_has(vcpu, X86_FEATURE_LM))
> -		return rsm_load_state_64(ctxt, &smram.smram64);
> +		ret = rsm_load_state_64(ctxt, &smram.smram64);
>  	else
>  #endif
> -		return rsm_load_state_32(ctxt, &smram.smram32);
> +		ret = rsm_load_state_32(ctxt, &smram.smram32);
> +
> +	/*
> +	 * If RSM fails and triggers shutdown, architecturally the shutdown
> +	 * occurs *before* the transition to guest mode. But due to KVM's
> +	 * flawed handling of RSM to L2 (see above), the vCPU may already be
> +	 * in_guest_mode(). Force the vCPU out of guest mode before delivering
> +	 * the shutdown, so that L1 enters shutdown instead of seeing a VM-Exit
> +	 * that architecturally shouldn't be possible.
> +	 */
> +	if (ret != X86EMUL_CONTINUE && is_guest_mode(vcpu))
> +		kvm_leave_nested(vcpu);
> +	return ret;
>  }
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 7aecf5b4c148..d00fd0d611bb 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -833,12 +833,6 @@ static void kvm_queue_exception_vmexit(struct kvm_vcpu *vcpu, unsigned int vecto
>  	ex->payload = payload;
>  }
>  
> -/* Forcibly leave the nested mode in cases like a vCPU reset */
> -static void kvm_leave_nested(struct kvm_vcpu *vcpu)
> -{
> -	kvm_x86_ops.nested_ops->leave_nested(vcpu);
> -}
> -
>  static void kvm_multiple_exception(struct kvm_vcpu *vcpu,
>  				   unsigned nr, bool has_error, u32 error_code,
>  				   bool has_payload, unsigned long payload, bool reinject)
> diff --git a/arch/x86/kvm/x86.h b/arch/x86/kvm/x86.h
> index bab6f9c4a790..a84c48ef5278 100644
> --- a/arch/x86/kvm/x86.h
> +++ b/arch/x86/kvm/x86.h
> @@ -109,6 +109,12 @@ static inline unsigned int __shrink_ple_window(unsigned int val,
>  void kvm_service_local_tlb_flush_requests(struct kvm_vcpu *vcpu);
>  int kvm_check_nested_events(struct kvm_vcpu *vcpu);
>  
> +/* Forcibly leave the nested mode in cases like a vCPU reset */
> +static inline void kvm_leave_nested(struct kvm_vcpu *vcpu)
> +{
> +	kvm_x86_ops.nested_ops->leave_nested(vcpu);
> +}
> +
>  static inline bool kvm_vcpu_has_run(struct kvm_vcpu *vcpu)
>  {
>  	return vcpu->arch.last_vmentry_cpu != -1;
>
> base-commit: 12680d7b8ac4db2eba6237a21a93d2b0e78a52a6
Reviewed-by: Maxim Levitsky <mlevitsk@...hat.com>
Best regards,
Maxim Levitsky