Date:   Thu, 07 Jan 2021 11:41:01 +0200
From:   Maxim Levitsky <mlevitsk@...hat.com>
To:     Sean Christopherson <seanjc@...gle.com>
Cc:     kvm@...r.kernel.org, Joerg Roedel <joro@...tes.org>,
        Wanpeng Li <wanpengli@...cent.com>,
        "open list:X86 ARCHITECTURE (32-BIT AND 64-BIT)" 
        <linux-kernel@...r.kernel.org>,
        "maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT)" <x86@...nel.org>,
        Vitaly Kuznetsov <vkuznets@...hat.com>,
        "H. Peter Anvin" <hpa@...or.com>,
        Sean Christopherson <sean.j.christopherson@...el.com>,
        Paolo Bonzini <pbonzini@...hat.com>,
        Ingo Molnar <mingo@...hat.com>, Borislav Petkov <bp@...en8.de>,
        Jim Mattson <jmattson@...gle.com>,
        Thomas Gleixner <tglx@...utronix.de>
Subject: Re: [PATCH 2/2] KVM: nVMX: fix for disappearing L1->L2 event
 injection on L1 migration

On Thu, 2021-01-07 at 04:38 +0200, Maxim Levitsky wrote:
> On Wed, 2021-01-06 at 10:17 -0800, Sean Christopherson wrote:
> > On Wed, Jan 06, 2021, Maxim Levitsky wrote:
> > > If migration happens while an entry into L2 with an injected event is pending,
> > > we weren't including the event in the migration state, so it would be
> > > lost, leading to an L2 hang.
> > 
> > But the injected event should still be in vmcs12 and KVM_STATE_NESTED_RUN_PENDING
> > should be set in the migration state, i.e. it should naturally be copied to
> > vmcs02 and thus (re)injected by vmx_set_nested_state().  Is nested_run_pending
> > not set?  Is the info in vmcs12 somehow lost?  Or am I off in left field...
> 
> You are completely right. 
> The injected event can be copied like that, since the vmc(b|s)12 is migrated.
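> 
> For reference, this is roughly the path you describe (simplified from
> arch/x86/kvm/vmx/nested.c, from memory; not the exact code):
> 
> 	/* vmx_set_nested_state(): the flag travels in the migration state */
> 	vmx->nested.nested_run_pending =
> 		!!(kvm_state->flags & KVM_STATE_NESTED_RUN_PENDING);
> 
> 	/*
> 	 * prepare_vmcs02_early(): with the flag set, the event queued in
> 	 * vmcs12 is copied to vmcs02 and thus (re)injected on the next entry.
> 	 */
> 	if (vmx->nested.nested_run_pending)
> 		vmcs_write32(VM_ENTRY_INTR_INFO_FIELD,
> 			     vmcs12->vm_entry_intr_info_field);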
> 
> We can safely disregard both of these patches and the parallel two patches for SVM.
> I am almost sure that the real root cause of this bug was that we 
> weren't restoring the nested run pending flag, and I even 
> happened to fix this in this patch series.
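> 
> (A sketch of what that fix amounts to, mirroring the VMX naming above;
> the exact code in the series may differ:)
> 
> 	/*
> 	 * svm_set_nested_state(): without this, nested_run_pending is
> 	 * lost across migration, with the consequences traced below.
> 	 */
> 	svm->nested.nested_run_pending =
> 		!!(kvm_state->flags & KVM_STATE_NESTED_RUN_PENDING);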
> 
> This is the trace of the bug (I removed the timestamps to make it easier to read):
> 
> 
> kvm_exit:             vcpu 0 reason vmrun rip 0xffffffffa0688ffa info1 0x0000000000000000 info2 0x0000000000000000 intr_info 0x00000000 error_code 0x00000000
> kvm_nested_vmrun:     rip: 0xffffffffa0688ffa vmcb: 0x0000000103594000 nrip: 0xffffffff814b3b01 int_ctl: 0x01000001 event_inj: 0x80000036 npt: on
> 																^^^ this is the injection
> kvm_nested_intercepts: cr_read: 0010 cr_write: 0010 excp: 00060042 intercepts: bc4c8027 00006e7f 00000000
> kvm_fpu:              unload
> kvm_userspace_exit:   reason KVM_EXIT_INTR (10)
> 
> ============================================================================
> migration happens here
> ============================================================================
> 
> ...
> kvm_async_pf_ready:   token 0xffffffff gva 0
> kvm_apic_accept_irq:  apicid 0 vec 243 (Fixed|edge)
> 
> kvm_nested_intr_vmexit: rip: 0x000000000000fff0
> 
> ^^^^^ this is the nested vmexit that shouldn't have happened, since nested run is pending,
> and which erased the eventinj field that was migrated correctly, just as you say.
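> 
> The check that should have blocked this vmexit looks roughly like the
> following (heavily simplified from svm_check_nested_events(); other
> event types are handled there too):
> 
> 	static int svm_check_nested_events(struct kvm_vcpu *vcpu)
> 	{
> 		struct vcpu_svm *svm = to_svm(vcpu);
> 		/*
> 		 * Don't synthesize a nested vmexit (which clobbers the
> 		 * eventinj field) while the nested VMRUN itself is pending.
> 		 */
> 		bool block_nested_events = svm->nested.nested_run_pending;
> 
> 		if (kvm_cpu_has_interrupt(vcpu) && nested_exit_on_intr(svm)) {
> 			if (block_nested_events)
> 				return -EBUSY;
> 			nested_svm_intr(svm);
> 		}
> 		return 0;
> 	}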
> 
> kvm_nested_vmexit_inject: reason: interrupt ext_inf1: 0x0000000000000000 ext_inf2: 0x0000000000000000 ext_int: 0x00000000 ext_int_err: 0x00000000
> ...
> 
> 
> We did notice that this vmexit had a weird RIP, and I
> even explained it to myself later:
> this is the default RIP that we put into the vmcb,
> and it hadn't been updated yet, since it is updated just prior to VM entry.
> 
> My test has already survived about 170 iterations (usually it crashes after 20-40 iterations).
> I am leaving the stress test running all night; let's see if it survives.

And after leaving it overnight, the test survived about 1000 iterations.

Thanks again!

Best regards,
	Maxim Levitsky


> 
> V2 of the patches is on the way.
> 
> Thanks again for the help!
> 
> Best regards,
> 	Maxim Levitsky
> 
> >  
> > > Fix this by queueing the injected event in a similar manner to how we queue
> > > interrupted injections.
> > > 
> > > This can be reproduced by running an I/O-intensive task in L2
> > > and repeatedly migrating L1.
> > > 
> > > Suggested-by: Paolo Bonzini <pbonzini@...hat.com>
> > > Signed-off-by: Maxim Levitsky <mlevitsk@...hat.com>
> > > ---
> > >  arch/x86/kvm/vmx/nested.c | 12 ++++++------
> > >  1 file changed, 6 insertions(+), 6 deletions(-)
> > > 
> > > diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
> > > index e2f26564a12de..2ea0bb14f385f 100644
> > > --- a/arch/x86/kvm/vmx/nested.c
> > > +++ b/arch/x86/kvm/vmx/nested.c
> > > @@ -2355,12 +2355,12 @@ static void prepare_vmcs02_early(struct vcpu_vmx *vmx, struct vmcs12 *vmcs12)
> > >  	 * Interrupt/Exception Fields
> > >  	 */
> > >  	if (vmx->nested.nested_run_pending) {
> > > -		vmcs_write32(VM_ENTRY_INTR_INFO_FIELD,
> > > -			     vmcs12->vm_entry_intr_info_field);
> > > -		vmcs_write32(VM_ENTRY_EXCEPTION_ERROR_CODE,
> > > -			     vmcs12->vm_entry_exception_error_code);
> > > -		vmcs_write32(VM_ENTRY_INSTRUCTION_LEN,
> > > -			     vmcs12->vm_entry_instruction_len);
> > > +		if ((vmcs12->vm_entry_intr_info_field & VECTORING_INFO_VALID_MASK))
> > > +			vmx_process_injected_event(&vmx->vcpu,
> > > +						   vmcs12->vm_entry_intr_info_field,
> > > +						   vmcs12->vm_entry_instruction_len,
> > > +						   vmcs12->vm_entry_exception_error_code);
> > > +
> > >  		vmcs_write32(GUEST_INTERRUPTIBILITY_INFO,
> > >  			     vmcs12->guest_interruptibility_info);
> > >  		vmx->loaded_vmcs->nmi_known_unmasked =
> > > -- 
> > > 2.26.2
> > > 

