Message-ID: <db41b66ffe19134ee2c742420b5637a2a531ffc3.camel@redhat.com>
Date: Fri, 06 Sep 2024 14:30:45 -0400
From: Maxim Levitsky <mlevitsk@...hat.com>
To: Sean Christopherson <seanjc@...gle.com>, Paolo Bonzini
 <pbonzini@...hat.com>
Cc: kvm@...r.kernel.org, linux-kernel@...r.kernel.org, 
	syzbot+988d9efcdf137bc05f66@...kaller.appspotmail.com, Zheyu Ma
	 <zheyuma97@...il.com>, Kishen Maloor <kishen.maloor@...el.com>
Subject: Re: [PATCH] KVM: x86: Forcibly leave nested if RSM to L2 hits
 shutdown

On Fri, 2024-09-06 at 09:13 -0700, Sean Christopherson wrote:
> Leave nested mode before synthesizing shutdown (a.k.a. TRIPLE_FAULT) if
> RSM fails when resuming L2 (a.k.a. guest mode).  Architecturally, shutdown
> on RSM occurs _before_ the transition back to guest mode on both Intel and
> AMD.
> 
> On Intel, per the SDM pseudocode, SMRAM state is loaded before critical
> VMX state:
> 
>   restore state normally from SMRAM;
>   ...
>   CR4.VMXE := value stored internally;
>   IF internal storage indicates that the logical processor had been in
>      VMX operation (root or non-root)
>   THEN
>      enter VMX operation (root or non-root);
>      restore VMX-critical state as defined in Section 32.14.1;
>      ...
>      restore current VMCS pointer;
>   FI;
> 
> AMD's APM is both less clearcut and more explicit.  Because AMD CPUs save
> VMCB and guest state in SMRAM itself, given the lack of anything in the
> APM to indicate a shutdown in guest mode is possible, a straightforward
> reading of the clause on invalid state is that _what_ state is invalid is
> irrelevant, i.e. all roads lead to shutdown.
> 
>   An RSM causes a processor shutdown if an invalid-state condition is
>   found in the SMRAM state-save area.
> 
> This fixes a bug found by syzkaller where synthesizing shutdown for L2
> led to a nested VM-Exit (if L1 is intercepting shutdown), which in turn
> caused KVM to complain about trying to cancel a nested VM-Enter (see
> commit 759cbd59674a ("KVM: x86: nSVM/nVMX: set nested_run_pending on VM
> entry which is a result of RSM")).
> 
> Note, Paolo pointed out that KVM shouldn't set nested_run_pending until
> after loading SMRAM state.  But as above, that's only half the story, KVM
> shouldn't transition to guest mode either.  Unfortunately, fixing that
> mess requires rewriting the nVMX and nSVM RSM flows to not piggyback
> their nested VM-Enter flows, as executing the nested VM-Enter flows after
> loading state from SMRAM would clobber much of said state.
> 
> For now, add a FIXME to call out that transitioning to guest mode before
> loading state from SMRAM is wrong.
> 
> Link: https://lore.kernel.org/all/CABgObfYaUHXyRmsmg8UjRomnpQ0Jnaog9-L2gMjsjkqChjDYUQ@mail.gmail.com
> Reported-by: syzbot+988d9efcdf137bc05f66@...kaller.appspotmail.com
> Closes: https://lore.kernel.org/all/0000000000007a9acb06151e1670@google.com
> Reported-by: Zheyu Ma <zheyuma97@...il.com>
> Closes: https://lore.kernel.org/all/CAMhUBjmXMYsEoVYw_M8hSZjBMHh24i88QYm-RY6HDta5YZ7Wgw@mail.gmail.com
> Analyzed-by: Michal Wilczynski <michal.wilczynski@...el.com>
> Cc: Kishen Maloor <kishen.maloor@...el.com>
> Signed-off-by: Sean Christopherson <seanjc@...gle.com>
> ---
>  arch/x86/kvm/smm.c | 24 +++++++++++++++++++-----
>  arch/x86/kvm/x86.c |  6 ------
>  arch/x86/kvm/x86.h |  6 ++++++
>  3 files changed, 25 insertions(+), 11 deletions(-)
> 
> diff --git a/arch/x86/kvm/smm.c b/arch/x86/kvm/smm.c
> index 00e3c27d2a87..85241c0c7f56 100644
> --- a/arch/x86/kvm/smm.c
> +++ b/arch/x86/kvm/smm.c
> @@ -624,17 +624,31 @@ int emulator_leave_smm(struct x86_emulate_ctxt *ctxt)
>  #endif
>  
>  	/*
> -	 * Give leave_smm() a chance to make ISA-specific changes to the vCPU
> -	 * state (e.g. enter guest mode) before loading state from the SMM
> -	 * state-save area.
> +	 * FIXME: When resuming L2 (a.k.a. guest mode), the transition to guest
> +	 * mode should happen _after_ loading state from SMRAM.  However, KVM
> +	 * piggybacks the nested VM-Enter flows (which is wrong for many other
> +	 * reasons), and so nSVM/nVMX would clobber state that is loaded from
> +	 * SMRAM and from the VMCS/VMCB.
>  	 */
>  	if (kvm_x86_call(leave_smm)(vcpu, &smram))
>  		return X86EMUL_UNHANDLEABLE;
>  
>  #ifdef CONFIG_X86_64
>  	if (guest_cpuid_has(vcpu, X86_FEATURE_LM))
> -		return rsm_load_state_64(ctxt, &smram.smram64);
> +		ret = rsm_load_state_64(ctxt, &smram.smram64);
>  	else
>  #endif
> -		return rsm_load_state_32(ctxt, &smram.smram32);
> +		ret = rsm_load_state_32(ctxt, &smram.smram32);
> +
> +	/*
> +	 * If RSM fails and triggers shutdown, architecturally the shutdown
> +	 * occurs *before* the transition to guest mode.  But due to KVM's
> +	 * flawed handling of RSM to L2 (see above), the vCPU may already be
> +	 * in_guest_mode().  Force the vCPU out of guest mode before delivering
> +	 * the shutdown, so that L1 enters shutdown instead of seeing a VM-Exit
> +	 * that architecturally shouldn't be possible.
> +	 */
> +	if (ret != X86EMUL_CONTINUE && is_guest_mode(vcpu))
> +		kvm_leave_nested(vcpu);
> +	return ret;
>  }
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 7aecf5b4c148..d00fd0d611bb 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -833,12 +833,6 @@ static void kvm_queue_exception_vmexit(struct kvm_vcpu *vcpu, unsigned int vecto
>  	ex->payload = payload;
>  }
>  
> -/* Forcibly leave the nested mode in cases like a vCPU reset */
> -static void kvm_leave_nested(struct kvm_vcpu *vcpu)
> -{
> -	kvm_x86_ops.nested_ops->leave_nested(vcpu);
> -}
> -
>  static void kvm_multiple_exception(struct kvm_vcpu *vcpu,
>  		unsigned nr, bool has_error, u32 error_code,
>  	        bool has_payload, unsigned long payload, bool reinject)
> diff --git a/arch/x86/kvm/x86.h b/arch/x86/kvm/x86.h
> index bab6f9c4a790..a84c48ef5278 100644
> --- a/arch/x86/kvm/x86.h
> +++ b/arch/x86/kvm/x86.h
> @@ -109,6 +109,12 @@ static inline unsigned int __shrink_ple_window(unsigned int val,
>  void kvm_service_local_tlb_flush_requests(struct kvm_vcpu *vcpu);
>  int kvm_check_nested_events(struct kvm_vcpu *vcpu);
>  
> +/* Forcibly leave the nested mode in cases like a vCPU reset */
> +static inline void kvm_leave_nested(struct kvm_vcpu *vcpu)
> +{
> +	kvm_x86_ops.nested_ops->leave_nested(vcpu);
> +}
> +
>  static inline bool kvm_vcpu_has_run(struct kvm_vcpu *vcpu)
>  {
>  	return vcpu->arch.last_vmentry_cpu != -1;
> 
> base-commit: 12680d7b8ac4db2eba6237a21a93d2b0e78a52a6

Reviewed-by: Maxim Levitsky <mlevitsk@...hat.com>

Best regards,
	Maxim Levitsky