Message-ID: <CACT4Y+a86Of9E2oUus=pQKJuhjVEjZ7sFGUsJ=Y31zpae8fE-Q@mail.gmail.com>
Date:   Thu, 5 Jul 2018 07:32:55 +0200
From:   Dmitry Vyukov <dvyukov@...gle.com>
To:     "Raslan, KarimAllah" <karahmed@...zon.de>
Cc:     "jmattson@...gle.com" <jmattson@...gle.com>,
        "kvm@...r.kernel.org" <kvm@...r.kernel.org>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
        "tglx@...utronix.de" <tglx@...utronix.de>,
        "syzbot+cc483201a3c6436d3550@...kaller.appspotmail.com" 
        <syzbot+cc483201a3c6436d3550@...kaller.appspotmail.com>,
        "x86@...nel.org" <x86@...nel.org>, "hpa@...or.com" <hpa@...or.com>,
        "mingo@...hat.com" <mingo@...hat.com>,
        "pbonzini@...hat.com" <pbonzini@...hat.com>,
        "syzkaller-bugs@...glegroups.com" <syzkaller-bugs@...glegroups.com>,
        "rkrcmar@...hat.com" <rkrcmar@...hat.com>
Subject: Re: general protection fault in vmx_vcpu_run

On Wed, Jul 4, 2018 at 9:31 PM, Raslan, KarimAllah <karahmed@...zon.de> wrote:
> Dmitry,
>
> Can you share the host kernel version?
>
> I can not reproduce any of these crash signatures and I think it's
> really a nested virtualization bug. So I will need the exact host
> kernel version as well.
>
> I am currently getting all sorts of:
>
> "KVM: entry failed, hardware error 0x7"
>
> ... instead of the crash signatures that you are posting.


Hi Raslan,

The tested kernel runs as a GCE VM.
Jim, how can we describe the host kernel for GCE? Potentially only we
can debug this.


> On Sat, 2018-06-30 at 08:09 +0000, Raslan, KarimAllah wrote:
>> Looking also at the other crash [0]:
>>
>>         msr_bitmap = to_vmx(vcpu)->loaded_vmcs->msr_bitmap;
>> ffffffff811f65b7:       e8 44 cb 57 00          callq  ffffffff81773100 <__sanitizer_cov_trace_pc>
>> ffffffff811f65bc:       48 8b 54 24 08          mov    0x8(%rsp),%rdx
>> ffffffff811f65c1:       48 b8 00 00 00 00 00    movabs $0xdffffc0000000000,%rax
>> ffffffff811f65c8:       fc ff df
>> ffffffff811f65cb:       48 c1 ea 03             shr    $0x3,%rdx
>> ffffffff811f65cf:       80 3c 02 00             cmpb   $0x0,(%rdx,%rax,1)        <- fault here.
>> ffffffff811f65d3:       0f 85 36 19 00 00       jne    ffffffff811f7f0f <vmx_vcpu_run+0x236f>
>>
>> %rdx should contain a pointer to loaded_vmcs. It is directly loaded
>> from the stack [0x8(%rsp)]. This same stack location was just used
>> before the inlined assembly for VMRESUME/VMLAUNCH here:
>>
>>         vmx->__launched = vmx->loaded_vmcs->launched;
>> ffffffff811f639f:       e8 5c cd 57 00          callq  ffffffff81773100 <__sanitizer_cov_trace_pc>
>> ffffffff811f63a4:       48 8b 54 24 08          mov    0x8(%rsp),%rdx
>> ffffffff811f63a9:       48 b8 00 00 00 00 00    movabs $0xdffffc0000000000,%rax
>> ffffffff811f63b0:       fc ff df
>> ffffffff811f63b3:       48 c1 ea 03             shr    $0x3,%rdx
>> ffffffff811f63b7:       80 3c 02 00             cmpb   $0x0,(%rdx,%rax,1)        <- used here.
>>
>> ... and this stack location was never touched by anything in between!
>> So something must have corrupted the stack itself, not really the
>> kvm_vcpu struct.
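
The movabs/shr/cmpb sequence in both listings matches the inline KASAN
shadow-byte check that the compiler emits before a memory access: the
pointer is shifted right by 3 and added to the shadow offset
0xdffffc0000000000. Below is a minimal Python sketch of that address
arithmetic. The helper names are hypothetical, and the canonical-address
test assumes 48-bit virtual addresses (4-level paging). It suggests why a
bad pointer loaded from 0x8(%rsp) would surface as a general protection
fault at the cmpb: a zero pointer maps to a non-canonical shadow address.

```python
# Constant loaded into %rax by the movabs in the listings above.
KASAN_SHADOW_OFFSET = 0xDFFFFC0000000000
MASK64 = (1 << 64) - 1


def kasan_shadow_addr(addr: int) -> int:
    """Mirror `shr $0x3,%rdx` plus the (%rdx,%rax,1) address computation."""
    return ((addr >> 3) + KASAN_SHADOW_OFFSET) & MASK64


def is_canonical(addr: int) -> bool:
    """Assumption: 48-bit virtual addresses, so bits 63..47 must all match."""
    top = addr >> 47
    return top == 0 or top == 0x1FFFF


# A zero pointer versus the RSP value from the crash dump (a valid-looking
# kernel address, used here only as an example input).
for ptr in (0x0, 0xFFFF8801B7D7F380):
    shadow = kasan_shadow_addr(ptr)
    print(hex(ptr), "->", hex(shadow), "canonical:", is_canonical(shadow))
```

For the zero pointer the shadow address is the offset itself, which is not
canonical, so the byte compare would raise #GP rather than a page fault.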
>>
>> Obviously the inlined assembly block is using the stack as well, but I
>> can not see anything that would cause this corruption there.
>>
>> That being said, looking at the %rsp and %rbp values that are dumped
>> in the stack trace:
>>
>> RSP: ffff8801b7d7f380
>> RBP: ffff8801b8260140
>>
>> ... they are almost 4.8 MiB apart! Shouldn't these two registers be a
>> bit closer to each other? :)
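
The distance between the two dumped registers can be checked with a few
lines of Python. This is only a sketch of the arithmetic: the 32 KiB
THREAD_SIZE below is an assumption (x86_64 with KASAN enabled) and is not
stated anywhere in this thread.

```python
# Register values copied from the crash dump quoted above.
rsp = 0xFFFF8801B7D7F380
rbp = 0xFFFF8801B8260140

gap = rbp - rsp
print("gap:", hex(gap), "=", gap, "bytes")
print("gap in MiB:", round(gap / 2**20, 2))

# Assumption: 32 KiB kernel stacks (x86_64 with KASAN). With a sane frame
# pointer, %rbp and %rsp would be expected to lie within one stack.
THREAD_SIZE = 32 * 1024
print("same stack plausible:", gap < THREAD_SIZE)
```

The gap is 0x4e0dc0 bytes, far larger than any plausible kernel stack
size, which is what makes at least one of the two values suspicious.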
>>
>> So 2 possibilities here:
>>
>> 1- %rsp is wrong
>>
>> That would explain why the loaded_vmcs was NULL. However, it is a bit
>> harder to understand how it became wrong! It should have been restored
>> during the VMEXIT from the HOST_RSP value in the VMCS!
>>
>> Is this a nested setup?
>>
>> 2- %rbp is wrong
>>
>> That would also explain why the loaded_vmcs was NULL. Whatever
>> corrupted the stack that caused loaded_vmcs to be NULL could have also
>> corrupted the %rbp saved in the stack. That would mean that it happened
>> during a function call. All function calls that happened between the
>> point when the stack was sane (just before the "asm" block for
>> VMLAUNCH) and the crash-site are only kcov related. Looking at kcov, I
>> can not see where the stack would get corrupted though! Obviously
>> another source of corruption can be a completely unrelated thread
>> directly corrupting this thread's memory.
>>
>> Maybe it would be easier to just try to repro it first and see which
>> one is true (if at all).
>>
>> [0] https://syzkaller.appspot.com/bug?extid=cc483201a3c6436d3550
>>
>>
>> On Thu, 2018-06-28 at 10:18 -0700, Jim Mattson wrote:
>> >
>> >   22: 0f 01 c3              vmresume
>> >   25: 48 89 4c 24 08        mov    %rcx,0x8(%rsp)
>> >   2a: 59                    pop    %rcx
>> >
>> > <rip>:
>> >   2b: 0f 96 81 88 56 00 00 setbe  0x5688(%rcx)
>> >   32: 48 89 81 00 03 00 00 mov    %rax,0x300(%rcx)
>> >   39: 48 89 99 18 03 00 00 mov    %rbx,0x318(%rcx)
>> >
>> > %rcx should be pointing to the vcpu_vmx structure, but it's not even
>> > canonical: 1ffff10035842e78.
>> >
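
The "not even canonical" remark about %rcx can be verified mechanically.
The sketch below assumes 48-bit virtual addresses (4-level paging), where
an address is canonical only if bits 63..48 are copies of bit 47. The
function names are hypothetical helpers, not kernel APIs.

```python
def sign_extend_48(addr: int) -> int:
    """Rebuild a 64-bit value by sign-extending the low 48 bits."""
    low = addr & ((1 << 48) - 1)
    if low & (1 << 47):
        low |= 0xFFFF << 48
    return low


def is_canonical(addr: int) -> bool:
    """Canonical means the address survives 48-bit sign extension unchanged."""
    return sign_extend_48(addr) == addr


rcx = 0x1FFFF10035842E78  # %rcx value reported in the message above
rsp = 0xFFFF8801B7D7F380  # RSP from the crash dump, for comparison

print(hex(rcx), "canonical:", is_canonical(rcx))
print(hex(rsp), "canonical:", is_canonical(rsp))
```

Any access through a non-canonical address raises a general protection
fault, which is consistent with the crash signature in the subject line.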
