linux-kernel - Re: [PATCH v6 3/3] KVM: nVMX: Fix mmu context after VMLAUNCH/VMRESUME failure

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <9f9ad21a-95a4-ddd0-950b-4ec18ddacd18@redhat.com>
Date:   Mon, 6 Nov 2017 13:17:14 +0100
From:   Paolo Bonzini <pbonzini@...hat.com>
To:     Wanpeng Li <kernellwp@...il.com>, linux-kernel@...r.kernel.org,
        kvm@...r.kernel.org
Cc:     Radim Krčmář <rkrcmar@...hat.com>,
        Wanpeng Li <wanpeng.li@...mail.com>,
        Jim Mattson <jmattson@...gle.com>
Subject: Re: [PATCH v6 3/3] KVM: nVMX: Fix mmu context after VMLAUNCH/VMRESUME
 failure

On 06/11/2017 01:54, Wanpeng Li wrote:
> From: Wanpeng Li <wanpeng.li@...mail.com>
> 
> Commit 4f350c6dbcb (kvm: nVMX: Handle deferred early VMLAUNCH/VMRESUME failure 
> properly) can result in L1(run kvm-unit-tests/run_tests.sh vmx_controls in L1) 
> null pointer deference and also L0 calltrace when EPT=0 on both L0 and L1.
> 
> In L1:
> 
> BUG: unable to handle kernel paging request at ffffffffc015bf8f
>  IP: vmx_vcpu_run+0x202/0x510 [kvm_intel]
>  PGD 146e13067 P4D 146e13067 PUD 146e15067 PMD 3d2686067 PTE 3d4af9161
>  Oops: 0003 [#1] PREEMPT SMP
>  CPU: 2 PID: 1798 Comm: qemu-system-x86 Not tainted 4.14.0-rc4+ #6
>  RIP: 0010:vmx_vcpu_run+0x202/0x510 [kvm_intel]
>  Call Trace:
>  WARNING: kernel stack frame pointer at ffffb86f4988bc18 in qemu-system-x86:1798 has bad value 0000000000000002
> 
> In L0:
> 
> -----------[ cut here ]------------
>  WARNING: CPU: 6 PID: 4460 at /home/kernel/linux/arch/x86/kvm//vmx.c:9845 vmx_inject_page_fault_nested+0x130/0x140 [kvm_intel]
>  CPU: 6 PID: 4460 Comm: qemu-system-x86 Tainted: G           OE   4.14.0-rc7+ #25
>  RIP: 0010:vmx_inject_page_fault_nested+0x130/0x140 [kvm_intel]
>  Call Trace:
>   paging64_page_fault+0x500/0xde0 [kvm]
>   ? paging32_gva_to_gpa_nested+0x120/0x120 [kvm]
>   ? nonpaging_page_fault+0x3b0/0x3b0 [kvm]
>   ? __asan_storeN+0x12/0x20
>   ? paging64_gva_to_gpa+0xb0/0x120 [kvm]
>   ? paging64_walk_addr_generic+0x11a0/0x11a0 [kvm]
>   ? lock_acquire+0x2c0/0x2c0
>   ? vmx_read_guest_seg_ar+0x97/0x100 [kvm_intel]
>   ? vmx_get_segment+0x2a6/0x310 [kvm_intel]
>   ? sched_clock+0x1f/0x30
>   ? check_chain_key+0x137/0x1e0
>   ? __lock_acquire+0x83c/0x2420
>   ? kvm_multiple_exception+0xf2/0x220 [kvm]
>   ? debug_check_no_locks_freed+0x240/0x240
>   ? debug_smp_processor_id+0x17/0x20
>   ? __lock_is_held+0x9e/0x100
>   kvm_mmu_page_fault+0x90/0x180 [kvm]
>   kvm_handle_page_fault+0x15c/0x310 [kvm]
>   ? __lock_is_held+0x9e/0x100
>   handle_exception+0x3c7/0x4d0 [kvm_intel]
>   vmx_handle_exit+0x103/0x1010 [kvm_intel]
>   ? kvm_arch_vcpu_ioctl_run+0x1628/0x2e20 [kvm]
> 
> The commit avoids to load host state of vmcs12 as vmcs01's guest state 
> since vmcs12 is not modified (except for the VM-instruction error field)
> if the checking of vmcs control area fails. However, the mmu context is 
> switched to nested mmu in prepare_vmcs02() and it will not be reloaded 
> since load_vmcs12_host_state() is skipped when nested VMLAUNCH/VMRESUME 
> fails. This patch fixes it by reloading mmu context when nested 
> VMLAUNCH/VMRESUME fails.
> 
> Reviewed-by: Jim Mattson <jmattson@...gle.com>
> Reviewed-by: Krish Sadhukhan <krish.sadhukhan@...cle.com>
> Cc: Paolo Bonzini <pbonzini@...hat.com>
> Cc: Radim Krčmář <rkrcmar@...hat.com>
> Cc: Jim Mattson <jmattson@...gle.com>
> Signed-off-by: Wanpeng Li <wanpeng.li@...mail.com>
> ---
> v3 -> v4:
>  * move it to a new function load_vmcs12_mmu_host_state
> 
>  arch/x86/kvm/vmx.c | 34 ++++++++++++++++++++++------------
>  1 file changed, 22 insertions(+), 12 deletions(-)
> 
> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> index 6cf3972..8aefb91 100644
> --- a/arch/x86/kvm/vmx.c
> +++ b/arch/x86/kvm/vmx.c
> @@ -11259,6 +11259,24 @@ static void prepare_vmcs12(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12,
>  	kvm_clear_interrupt_queue(vcpu);
>  }
>  
> +static void load_vmcs12_mmu_host_state(struct kvm_vcpu *vcpu,
> +			struct vmcs12 *vmcs12)
> +{
> +	u32 entry_failure_code;
> +
> +	nested_ept_uninit_mmu_context(vcpu);
> +
> +	/*
> +	 * Only PDPTE load can fail as the value of cr3 was checked on entry and
> +	 * couldn't have changed.
> +	 */
> +	if (nested_vmx_load_cr3(vcpu, vmcs12->host_cr3, false, &entry_failure_code))
> +		nested_vmx_abort(vcpu, VMX_ABORT_LOAD_HOST_PDPTE_FAIL);
> +
> +	if (!enable_ept)
> +		vcpu->arch.walk_mmu->inject_page_fault = kvm_inject_page_fault;
> +}
> +
>  /*
>   * A part of what we need to when the nested L2 guest exits and we want to
>   * run its L1 parent, is to reset L1's guest state to the host state specified
> @@ -11272,7 +11290,6 @@ static void load_vmcs12_host_state(struct kvm_vcpu *vcpu,
>  				   struct vmcs12 *vmcs12)
>  {
>  	struct kvm_segment seg;
> -	u32 entry_failure_code;
>  
>  	if (vmcs12->vm_exit_controls & VM_EXIT_LOAD_IA32_EFER)
>  		vcpu->arch.efer = vmcs12->host_ia32_efer;
> @@ -11299,17 +11316,7 @@ static void load_vmcs12_host_state(struct kvm_vcpu *vcpu,
>  	vcpu->arch.cr4_guest_owned_bits = ~vmcs_readl(CR4_GUEST_HOST_MASK);
>  	vmx_set_cr4(vcpu, vmcs12->host_cr4);
>  
> -	nested_ept_uninit_mmu_context(vcpu);
> -
> -	/*
> -	 * Only PDPTE load can fail as the value of cr3 was checked on entry and
> -	 * couldn't have changed.
> -	 */
> -	if (nested_vmx_load_cr3(vcpu, vmcs12->host_cr3, false, &entry_failure_code))
> -		nested_vmx_abort(vcpu, VMX_ABORT_LOAD_HOST_PDPTE_FAIL);
> -
> -	if (!enable_ept)
> -		vcpu->arch.walk_mmu->inject_page_fault = kvm_inject_page_fault;
> +	load_vmcs12_mmu_host_state(vcpu, vmcs12);
>  
>  	if (enable_vpid) {
>  		/*
> @@ -11539,6 +11546,9 @@ static void nested_vmx_vmexit(struct kvm_vcpu *vcpu, u32 exit_reason,
>  	 * accordingly.
>  	 */
>  	nested_vmx_failValid(vcpu, VMXERR_ENTRY_INVALID_CONTROL_FIELD);
> +
> +	load_vmcs12_mmu_host_state(vcpu, vmcs12);
> +
>  	/*
>  	 * The emulated instruction was already skipped in
>  	 * nested_vmx_run, but the updated RIP was never
> 

Reviewed-by: Paolo Bonzini <pbonzini@...hat.com>