Message-ID: <1530732704.23804.8.camel@amazon.de>
Date: Wed, 4 Jul 2018 19:31:44 +0000
From: "Raslan, KarimAllah" <karahmed@...zon.de>
To: "jmattson@...gle.com" <jmattson@...gle.com>,
"dvyukov@...gle.com" <dvyukov@...gle.com>
CC: "kvm@...r.kernel.org" <kvm@...r.kernel.org>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
"tglx@...utronix.de" <tglx@...utronix.de>,
"syzbot+cc483201a3c6436d3550@...kaller.appspotmail.com"
<syzbot+cc483201a3c6436d3550@...kaller.appspotmail.com>,
"x86@...nel.org" <x86@...nel.org>, "hpa@...or.com" <hpa@...or.com>,
"mingo@...hat.com" <mingo@...hat.com>,
"pbonzini@...hat.com" <pbonzini@...hat.com>,
"syzkaller-bugs@...glegroups.com" <syzkaller-bugs@...glegroups.com>,
"rkrcmar@...hat.com" <rkrcmar@...hat.com>
Subject: Re: general protection fault in vmx_vcpu_run
Dmitry,
Can you share the host kernel version?
I cannot reproduce any of these crash signatures, and I think this is
really a nested virtualization bug, so I will need the exact host
kernel version as well.
I am currently getting all sorts of:
"KVM: entry failed, hardware error 0x7"
... instead of the crash signatures that you are posting.
Regards.
On Sat, 2018-06-30 at 08:09 +0000, Raslan, KarimAllah wrote:
> Looking also at the other crash [0]:
>
> msr_bitmap = to_vmx(vcpu)->loaded_vmcs->msr_bitmap;
> ffffffff811f65b7: e8 44 cb 57 00 callq ffffffff81773100 <__sanitizer_cov_trace_pc>
> ffffffff811f65bc: 48 8b 54 24 08 mov 0x8(%rsp),%rdx
> ffffffff811f65c1: 48 b8 00 00 00 00 00 movabs $0xdffffc0000000000,%rax
> ffffffff811f65c8: fc ff df
> ffffffff811f65cb: 48 c1 ea 03 shr $0x3,%rdx
> ffffffff811f65cf: 80 3c 02 00 cmpb $0x0,(%rdx,%rax,1) <- fault here.
> ffffffff811f65d3: 0f 85 36 19 00 00 jne ffffffff811f7f0f <vmx_vcpu_run+0x236f>
>
> %rdx should contain a pointer to loaded_vmcs. It is directly loaded
> from the stack [0x8(%rsp)]. This same stack location was just used
> before the inlined assembly for VMRESUME/VMLAUNCH here:
>
> vmx->__launched = vmx->loaded_vmcs->launched;
> ffffffff811f639f: e8 5c cd 57 00 callq ffffffff81773100 <__sanitizer_cov_trace_pc>
> ffffffff811f63a4: 48 8b 54 24 08 mov 0x8(%rsp),%rdx
> ffffffff811f63a9: 48 b8 00 00 00 00 00 movabs $0xdffffc0000000000,%rax
> ffffffff811f63b0: fc ff df
> ffffffff811f63b3: 48 c1 ea 03 shr $0x3,%rdx
> ffffffff811f63b7: 80 3c 02 00 cmpb $0x0,(%rdx,%rax,1) <- used here.
>
> ... and this stack location was never touched by anything in between!
> So something must have corrupted the stack itself not really the
> kvm_vcpu struct.
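
For reference, the movabs/shr/cmpb sequence in both listings is the KASAN shadow-memory check emitted before the pointer is dereferenced: the shadow byte for an address lives at (addr >> 3) plus the constant loaded into %rax. A minimal user-space sketch of that computation follows. The macro and function names are illustrative, not kernel identifiers; the two constants are read off the disassembly above.

```c
#include <assert.h>
#include <stdint.h>

/* Taken from the listing: movabs $0xdffffc0000000000,%rax and shr $0x3,%rdx. */
#define SHADOW_OFFSET      0xdffffc0000000000ULL
#define SHADOW_SCALE_SHIFT 3

/* Address of the shadow byte that the cmpb instruction reads for 'addr'. */
static uint64_t shadow_addr(uint64_t addr)
{
    return (addr >> SHADOW_SCALE_SHIFT) + SHADOW_OFFSET;
}
```

For a NULL pointer, shadow_addr(0) is 0xdffffc0000000000 itself, which lies in the non-canonical hole of the x86-64 address space, so the cmpb raises a general protection fault instead of a page fault. That would be consistent with loaded_vmcs having been read back from the stack as NULL. For a sane kernel pointer such as the dumped %rsp (0xffff8801b7d7f380), the same formula gives 0xffffed0036fafe70, an ordinary canonical address.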
>
> Obviously the inlined assembly block is using the stack as well, but I
> can not see anything that would cause this corruption there.
>
> That being said, looking at the %rsp and %rbp values that are dumped
> in the stack trace:
>
> RSP: ffff8801b7d7f380
> RBP: ffff8801b8260140
>
> ... they are almost 4.9 MiB apart! Shouldn't these two registers be a
> bit closer to each other? :)
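
The distance can be checked with a few lines of C. This is only a sketch of the arithmetic; the macro and function names are made up for illustration, and the two values are copied from the dump above.

```c
#include <assert.h>
#include <stdint.h>

/* Register values copied from the stack trace. */
#define RSP_DUMPED 0xffff8801b7d7f380ULL
#define RBP_DUMPED 0xffff8801b8260140ULL

/* Absolute distance in bytes between two addresses. */
static uint64_t distance_bytes(uint64_t a, uint64_t b)
{
    return a > b ? a - b : b - a;
}
```

distance_bytes(RBP_DUMPED, RSP_DUMPED) is 0x4e0dc0, i.e. 5115328 bytes or roughly 4.88 MiB. That is far more than a single kernel stack (16 KiB by default on x86_64, larger with KASAN enabled), so the two values cannot both point into the same stack.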
>
> So 2 possibilities here:
>
> 1- %rsp is wrong
>
> That would explain why the loaded_vmcs was NULL. However, it is a bit
> harder to understand how it became wrong! It should have been restored
> during the VMEXIT from the HOST_RSP value in the VMCS!
>
> Is this a nested setup?
>
> 2- %rbp is wrong
>
> That would also explain why the loaded_vmcs was NULL. Whatever
> corrupted the stack that caused loaded_vmcs to be NULL could have also
> corrupted the %rbp saved in the stack. That would mean that it happened
> during a function call. All function calls that happened between the
> point when the stack was sane (just before the "asm" block for
> VMLAUNCH) and the crash-site are only kcov related. Looking at kcov, I
> cannot see where the stack would get corrupted though! Obviously
> another source of corruption can be a completely unrelated thread
> directly corrupting this thread's memory.
>
> Maybe it would be easier to just try to repro it first and see which
> one is true (if at all).
>
> [0] https://syzkaller.appspot.com/bug?extid=cc483201a3c6436d3550
>
>
> On Thu, 2018-06-28 at 10:18 -0700, Jim Mattson wrote:
> >
> > 22: 0f 01 c3 vmresume
> > 25: 48 89 4c 24 08 mov %rcx,0x8(%rsp)
> > 2a: 59 pop %rcx
> >
> > <rip>:
> > 2b: 0f 96 81 88 56 00 00 setbe 0x5688(%rcx)
> > 32: 48 89 81 00 03 00 00 mov %rax,0x300(%rcx)
> > 39: 48 89 99 18 03 00 00 mov %rbx,0x318(%rcx)
> >
> > %rcx should be pointing to the vcpu_vmx structure, but it's not even
> > canonical: 1ffff10035842e78.
> >
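
The "not even canonical" remark can be verified mechanically. The sketch below assumes 48-bit virtual addresses (4-level paging), where bits 63..47 must all be equal; the function name is illustrative.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption: 48-bit virtual addresses, so bits 63..47 are all 0 or all 1. */
static bool is_canonical_48(uint64_t addr)
{
    uint64_t top = addr >> 47;   /* the 17 uppermost bits */
    return top == 0 || top == 0x1ffff;
}
```

is_canonical_48(0x1ffff10035842e78) is false, while it is true for the dumped %rsp value 0xffff8801b7d7f380. As a side observation only, not something established in this thread: shifting the bad %rcx value left by 3 gives 0xffff8801ac2173c0, which is canonical and in the same ffff8801... range as %rsp, so the value may be a pointer that had already gone through a KASAN-style shr $0x3.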
Amazon Development Center Germany GmbH
Berlin - Dresden - Aachen
main office: Krausenstr. 38, 10117 Berlin
Geschaeftsfuehrer: Dr. Ralf Herbrich, Christian Schlaeger
Ust-ID: DE289237879
Eingetragen am Amtsgericht Charlottenburg HRB 149173 B