linux-kernel - Re: general protection fault in vmx_vcpu

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <1530732704.23804.8.camel@amazon.de>
Date:   Wed, 4 Jul 2018 19:31:44 +0000
From:   "Raslan, KarimAllah" <karahmed@...zon.de>
To:     "jmattson@...gle.com" <jmattson@...gle.com>,
        "dvyukov@...gle.com" <dvyukov@...gle.com>
CC:     "kvm@...r.kernel.org" <kvm@...r.kernel.org>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
        "tglx@...utronix.de" <tglx@...utronix.de>,
        "syzbot+cc483201a3c6436d3550@...kaller.appspotmail.com" 
        <syzbot+cc483201a3c6436d3550@...kaller.appspotmail.com>,
        "x86@...nel.org" <x86@...nel.org>, "hpa@...or.com" <hpa@...or.com>,
        "mingo@...hat.com" <mingo@...hat.com>,
        "pbonzini@...hat.com" <pbonzini@...hat.com>,
        "syzkaller-bugs@...glegroups.com" <syzkaller-bugs@...glegroups.com>,
        "rkrcmar@...hat.com" <rkrcmar@...hat.com>
Subject: Re: general protection fault in vmx_vcpu_run

Dmitry,

Can you share the host kernel version?

I can not reproduce any of these crash signatures and I think it's 
really a nested virtualization bug. So I will need the exact host 
kernel version as well.

I am currently getting all sorts of:

"KVM: entry failed, hardware error 0x7"

... instead of the crash signatures that you are posting.

Regards.

On Sat, 2018-06-30 at 08:09 +0000, Raslan, KarimAllah wrote:
> Looking also at the other crash [0]:
> 
>         msr_bitmap = to_vmx(vcpu)->loaded_vmcs->msr_bitmap;
> ffffffff811f65b7:       e8 44 cb 57 00          callq  ffffffff81773100
> <__sanitizer_cov_trace_pc>
> ffffffff811f65bc:       48 8b 54 24 08          mov    0x8(%rsp),%rdx
> ffffffff811f65c1:       48 b8 00 00 00 00 00    movabs
> $0xdffffc0000000000,%rax
> ffffffff811f65c8:       fc ff df
> ffffffff811f65cb:       48 c1 ea 03             shr    $0x3,%rdx
> ffffffff811f65cf:       80 3c 02
> 00             cmpb   $0x0,(%rdx,%rax,1)        <- fault here.
> ffffffff811f65d3:       0f 85 36 19 00 00       jne    ffffffff811f7f0f
> <vmx_vcpu_run+0x236f>
> 
> %rdx should contain a pointer to loaded_vmcs. It is directly loaded 
> from the stack [0x8(%rsp)]. This same stack location was just used 
> before the inlined assembly for VMRESUME/VMLAUNCH here:
> 
>         vmx->__launched = vmx->loaded_vmcs->launched;
> ffffffff811f639f:       e8 5c cd 57 00          callq  ffffffff81773100
> <__sanitizer_cov_trace_pc>
> ffffffff811f63a4:       48 8b 54 24 08          mov    0x8(%rsp),%rdx
> ffffffff811f63a9:       48 b8 00 00 00 00 00    movabs
> $0xdffffc0000000000,%rax
> ffffffff811f63b0:       fc ff df
> ffffffff811f63b3:       48 c1 ea 03             shr    $0x3,%rdx
> ffffffff811f63b7:       80 3c 02
> 00             cmpb   $0x0,(%rdx,%rax,1)        <- used here.
> 
> ... and this stack location was never touched by anything in between! 
> So something must have corrupted the stack itself not really the 
> kvm_vc
> pu struct.
> 
> Obviously the inlined assembly block is using the stack as well, but I 
> can not see anything that would cause this corruption there.
> 
> That being said, looking at the %rsp and %rbp values that are dumped
> in the stack trace:
> 
> RSP: ffff8801b7d7f380
> RBP: ffff8801b8260140
> 
> ... they are almost 4.8 MiB apart! Should not these two register be a 
> bit closer to each other? :)
> 
> So 2 possibilities here:
> 
> 1- %rsp is wrong
> 
> That would explain why the loaded_vmcs was NULL. However, it is a bit 
> harder to understand how it became wrong! It should have been restored 
> during the VMEXIT from the HOST_RSP value in the VMCS!
> 
> Is this a nested setup?
> 
> 2- %rbp is wrong
> 
> That would also explain why the loaded_vmcs was NULL. Whatever
> corrupted the stack that caused loaded_vmcs to be NULL could have also
> corrupted the %rbp saved in the stack. That would mean that it happened
> during a function call. All function calls that happened between the
> point when the stack was sane (just before the "asm" block for
> VMLAUNCH) and the crash-site are only kcov related. Looking at kcov, I
> can not see where the stack would get corrupted though! Obviously
> another source of corruption can be a completely unrelated thread
> directly corruption this thread's memory.
> 
> Maybe it would be easier to just try to repro it first and see which 
> one is true (if at all).
> 
> [0] https://syzkaller.appspot.com/bug?extid=cc483201a3c6436d3550
> 
> 
> On Thu, 2018-06-28 at 10:18 -0700, Jim Mattson wrote:
> > 
> >   22: 0f 01 c3              vmresume
> >   25: 48 89 4c 24 08        mov    %rcx,0x8(%rsp)
> >   2a: 59                    pop    %rcx
> > 
> > <rip>:
> >   2b: 0f 96 81 88 56 00 00 setbe  0x5688(%rcx)
> >   32: 48 89 81 00 03 00 00 mov    %rax,0x300(%rcx)
> >   39: 48 89 99 18 03 00 00 mov    %rbx,0x318(%rcx)
> > 
> > %rcx should be pointing to the vcpu_vmx structure, but it's not even
> > canonical: 1ffff10035842e78.
> > 
Amazon Development Center Germany GmbH
Berlin - Dresden - Aachen
main office: Krausenstr. 38, 10117 Berlin
Geschaeftsfuehrer: Dr. Ralf Herbrich, Christian Schlaeger
Ust-ID: DE289237879
Eingetragen am Amtsgericht Charlottenburg HRB 149173 B