Open Source and information security mailing list archives
 
Date:   Thu, 07 Jan 2021 04:38:11 +0200
From:   Maxim Levitsky <mlevitsk@...hat.com>
To:     Sean Christopherson <seanjc@...gle.com>
Cc:     kvm@...r.kernel.org, Joerg Roedel <joro@...tes.org>,
        Wanpeng Li <wanpengli@...cent.com>,
        "open list:X86 ARCHITECTURE (32-BIT AND 64-BIT)" 
        <linux-kernel@...r.kernel.org>,
        "maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT)" <x86@...nel.org>,
        Vitaly Kuznetsov <vkuznets@...hat.com>,
        "H. Peter Anvin" <hpa@...or.com>,
        Sean Christopherson <sean.j.christopherson@...el.com>,
        Paolo Bonzini <pbonzini@...hat.com>,
        Ingo Molnar <mingo@...hat.com>, Borislav Petkov <bp@...en8.de>,
        Jim Mattson <jmattson@...gle.com>,
        Thomas Gleixner <tglx@...utronix.de>
Subject: Re: [PATCH 2/2] KVM: nVMX: fix for disappearing L1->L2 event
 injection on L1 migration

On Wed, 2021-01-06 at 10:17 -0800, Sean Christopherson wrote:
> On Wed, Jan 06, 2021, Maxim Levitsky wrote:
> > If migration happens while L2 entry with an injected event to L2 is pending,
> > we weren't including the event in the migration state and it would be
> > lost leading to L2 hang.
> 
> But the injected event should still be in vmcs12 and KVM_STATE_NESTED_RUN_PENDING
> should be set in the migration state, i.e. it should naturally be copied to
> vmcs02 and thus (re)injected by vmx_set_nested_state().  Is nested_run_pending
> not set?  Is the info in vmcs12 somehow lost?  Or am I off in left field...


You are completely right.
The injected event can indeed be copied that way, since the vmc(b|s)12 is migrated.

We can safely disregard both of these patches and the parallel two patches for SVM.
I am almost sure that the real root cause of this bug was that we
weren't restoring the nested run pending flag, and I even
happened to fix this in this patch series.

This is the trace of the bug (I removed the timestamps to make it easier to read):


kvm_exit:             vcpu 0 reason vmrun rip 0xffffffffa0688ffa info1 0x0000000000000000 info2 0x0000000000000000 intr_info 0x00000000 error_code 0x00000000
kvm_nested_vmrun:     rip: 0xffffffffa0688ffa vmcb: 0x0000000103594000 nrip: 0xffffffff814b3b01 int_ctl: 0x01000001 event_inj: 0x80000036 npt: on
																^^^ this is the injection
kvm_nested_intercepts: cr_read: 0010 cr_write: 0010 excp: 00060042 intercepts: bc4c8027 00006e7f 00000000
kvm_fpu:              unload
kvm_userspace_exit:   reason KVM_EXIT_INTR (10)

============================================================================
migration happens here
============================================================================

...
kvm_async_pf_ready:   token 0xffffffff gva 0
kvm_apic_accept_irq:  apicid 0 vec 243 (Fixed|edge)

kvm_nested_intr_vmexit: rip: 0x000000000000fff0

^^^^^ this is the nested vmexit that shouldn't have happened, since a nested run is pending,
and it erased the eventinj field, which was migrated correctly just as you say.

kvm_nested_vmexit_inject: reason: interrupt ext_inf1: 0x0000000000000000 ext_inf2: 0x0000000000000000 ext_int: 0x00000000 ext_int_err: 0x00000000
...


We did notice that this vmexit had a weird RIP, and I later explained
it to myself: this is the default RIP that we put in the vmcb,
which had not yet been updated, since it is updated just prior to VM entry.

My test has already survived about 170 iterations (it usually crashes after 20-40 iterations).
I am leaving the stress test running all night; let's see if it survives.

V2 of the patches is on the way.

Thanks again for the help!

Best regards,
	Maxim Levitsky

>  
> > Fix this by queueing the injected event in similar manner to how we queue
> > interrupted injections.
> > 
> > This can be reproduced by running an IO intense task in L2,
> > and repeatedly migrating the L1.
> > 
> > Suggested-by: Paolo Bonzini <pbonzini@...hat.com>
> > Signed-off-by: Maxim Levitsky <mlevitsk@...hat.com>
> > ---
> >  arch/x86/kvm/vmx/nested.c | 12 ++++++------
> >  1 file changed, 6 insertions(+), 6 deletions(-)
> > 
> > diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
> > index e2f26564a12de..2ea0bb14f385f 100644
> > --- a/arch/x86/kvm/vmx/nested.c
> > +++ b/arch/x86/kvm/vmx/nested.c
> > @@ -2355,12 +2355,12 @@ static void prepare_vmcs02_early(struct vcpu_vmx *vmx, struct vmcs12 *vmcs12)
> >  	 * Interrupt/Exception Fields
> >  	 */
> >  	if (vmx->nested.nested_run_pending) {
> > -		vmcs_write32(VM_ENTRY_INTR_INFO_FIELD,
> > -			     vmcs12->vm_entry_intr_info_field);
> > -		vmcs_write32(VM_ENTRY_EXCEPTION_ERROR_CODE,
> > -			     vmcs12->vm_entry_exception_error_code);
> > -		vmcs_write32(VM_ENTRY_INSTRUCTION_LEN,
> > -			     vmcs12->vm_entry_instruction_len);
> > +		if ((vmcs12->vm_entry_intr_info_field & VECTORING_INFO_VALID_MASK))
> > +			vmx_process_injected_event(&vmx->vcpu,
> > +						   vmcs12->vm_entry_intr_info_field,
> > +						   vmcs12->vm_entry_instruction_len,
> > +						   vmcs12->vm_entry_exception_error_code);
> > +
> >  		vmcs_write32(GUEST_INTERRUPTIBILITY_INFO,
> >  			     vmcs12->guest_interruptibility_info);
> >  		vmx->loaded_vmcs->nmi_known_unmasked =
> > -- 
> > 2.26.2
> > 

