linux-kernel - Re: general protection fault in vmx_vcpu_run

Message-ID: <1530346163.13559.75.camel@amazon.de>
Date:   Sat, 30 Jun 2018 08:09:23 +0000
From:   "Raslan, KarimAllah" <karahmed@...zon.de>
To:     "jmattson@...gle.com" <jmattson@...gle.com>,
        "dvyukov@...gle.com" <dvyukov@...gle.com>
CC:     "kvm@...r.kernel.org" <kvm@...r.kernel.org>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
        "tglx@...utronix.de" <tglx@...utronix.de>,
        "syzbot+cc483201a3c6436d3550@...kaller.appspotmail.com" 
        <syzbot+cc483201a3c6436d3550@...kaller.appspotmail.com>,
        "x86@...nel.org" <x86@...nel.org>, "hpa@...or.com" <hpa@...or.com>,
        "mingo@...hat.com" <mingo@...hat.com>,
        "pbonzini@...hat.com" <pbonzini@...hat.com>,
        "syzkaller-bugs@...glegroups.com" <syzkaller-bugs@...glegroups.com>,
        "rkrcmar@...hat.com" <rkrcmar@...hat.com>
Subject: Re: general protection fault in vmx_vcpu_run

Looking also at the other crash [0]:

        msr_bitmap = to_vmx(vcpu)->loaded_vmcs->msr_bitmap;
ffffffff811f65b7:       e8 44 cb 57 00          callq  ffffffff81773100 <__sanitizer_cov_trace_pc>
ffffffff811f65bc:       48 8b 54 24 08          mov    0x8(%rsp),%rdx
ffffffff811f65c1:       48 b8 00 00 00 00 00    movabs $0xdffffc0000000000,%rax
ffffffff811f65c8:       fc ff df
ffffffff811f65cb:       48 c1 ea 03             shr    $0x3,%rdx
ffffffff811f65cf:       80 3c 02 00             cmpb   $0x0,(%rdx,%rax,1)        <- fault here.
ffffffff811f65d3:       0f 85 36 19 00 00       jne    ffffffff811f7f0f <vmx_vcpu_run+0x236f>

%rdx should contain a pointer to loaded_vmcs. It is directly loaded 
from the stack [0x8(%rsp)]. This same stack location was just used 
before the inlined assembly for VMRESUME/VMLAUNCH here:

        vmx->__launched = vmx->loaded_vmcs->launched;
ffffffff811f639f:       e8 5c cd 57 00          callq  ffffffff81773100 <__sanitizer_cov_trace_pc>
ffffffff811f63a4:       48 8b 54 24 08          mov    0x8(%rsp),%rdx
ffffffff811f63a9:       48 b8 00 00 00 00 00    movabs $0xdffffc0000000000,%rax
ffffffff811f63b0:       fc ff df
ffffffff811f63b3:       48 c1 ea 03             shr    $0x3,%rdx
ffffffff811f63b7:       80 3c 02 00             cmpb   $0x0,(%rdx,%rax,1)        <- used here.

... and this stack location was never touched by anything in between! 
So something must have corrupted the stack itself, rather than the 
kvm_vcpu struct.
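
For reference, the mov/shr/movabs/cmpb sequence in both listings looks like the usual KASAN shadow-byte check: the address about to be dereferenced is shifted right by 3 and added to the shadow offset. A minimal sketch of that arithmetic, using only the constants visible in the disassembly (the function name is made up for illustration):

```python
# Constants taken from the listing above:
#   movabs $0xdffffc0000000000,%rax  -> shadow offset
#   shr    $0x3,%rdx                 -> one shadow byte per 8 bytes of memory
KASAN_SHADOW_OFFSET = 0xDFFFFC0000000000
KASAN_SHADOW_SCALE_SHIFT = 3
MASK64 = (1 << 64) - 1


def kasan_shadow_addr(addr: int) -> int:
    """Address of the shadow byte that cmpb $0x0,(%rdx,%rax,1) reads.

    Hypothetical helper; it only mirrors the arithmetic in the listing.
    """
    return ((addr >> KASAN_SHADOW_SCALE_SHIFT) + KASAN_SHADOW_OFFSET) & MASK64


if __name__ == "__main__":
    # If the value loaded from 0x8(%rsp) is NULL, the shadow lookup hits
    # the bare offset.
    print(hex(kasan_shadow_addr(0)))
```

With a NULL value in %rdx the cmpb reads from 0xdffffc0000000000, which is not a canonical address, so the access would raise a general protection fault rather than a page fault. That would be consistent with the report, assuming the slot really held NULL.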

Obviously the inlined assembly block is using the stack as well, but I 
cannot see anything there that would cause this corruption.

That being said, looking at the %rsp and %rbp values that are dumped
in the stack trace:

RSP: ffff8801b7d7f380
RBP: ffff8801b8260140

... they are almost 4.9 MiB apart! Shouldn't these two registers be a 
bit closer to each other? :)
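
To double-check the distance, here is a small sketch that subtracts the two dumped values. The 32 KiB stack size is my assumption (x86_64 with KASAN enabled), not something taken from the report, and the comparison only means something if %rbp is being used as a frame pointer at the crash site:

```python
# Register values copied from the stack trace above.
RSP = 0xFFFF8801B7D7F380
RBP = 0xFFFF8801B8260140

# Assumption: 32 KiB kernel stack (x86_64 with KASAN). Adjust for the
# actual config; this number does not come from the crash report.
ASSUMED_STACK_SIZE = 32 * 1024


def gap_bytes(a: int, b: int) -> int:
    """Absolute distance between two addresses."""
    return abs(a - b)


gap = gap_bytes(RSP, RBP)

if __name__ == "__main__":
    print(f"{gap:#x} bytes = {gap / 2**20:.2f} MiB")
    print("within one stack" if gap <= ASSUMED_STACK_SIZE else "not within one stack")
```

The gap comes out as 0x4e0dc0 bytes, far more than any single kernel stack under that assumption, so at least one of the two registers cannot be pointing into the current stack.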

So there are two possibilities here:

1- %rsp is wrong

That would explain why the loaded_vmcs was NULL. However, it is a bit 
harder to understand how it became wrong! It should have been restored 
during the VMEXIT from the HOST_RSP value in the VMCS!

Is this a nested setup?

2- %rbp is wrong

That would also explain why the loaded_vmcs was NULL. Whatever
corrupted the stack that caused loaded_vmcs to be NULL could have also
corrupted the %rbp saved in the stack. That would mean that it happened
during a function call. All function calls that happened between the
point when the stack was sane (just before the "asm" block for
VMLAUNCH) and the crash site are only kcov related. Looking at kcov, I
cannot see where the stack would get corrupted though! Obviously
another source of corruption could be a completely unrelated thread
directly corrupting this thread's memory.

Maybe it would be easier to just try to reproduce it first and see which 
one is true (if either).

[0] https://syzkaller.appspot.com/bug?extid=cc483201a3c6436d3550

On Thu, 2018-06-28 at 10:18 -0700, Jim Mattson wrote:
>   22: 0f 01 c3              vmresume
>   25: 48 89 4c 24 08        mov    %rcx,0x8(%rsp)
>   2a: 59                    pop    %rcx
> 
> <rip>:
>   2b: 0f 96 81 88 56 00 00 setbe  0x5688(%rcx)
>   32: 48 89 81 00 03 00 00 mov    %rax,0x300(%rcx)
>   39: 48 89 99 18 03 00 00 mov    %rbx,0x318(%rcx)
> 
> %rcx should be pointing to the vcpu_vmx structure, but it's not even
> canonical: 1ffff10035842e78.
> 
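
For completeness, the "not even canonical" observation in the quoted text can be checked mechanically. This is a sketch that assumes 48-bit virtual addresses (4-level paging); with 5-level paging the width would be 57. The helper names are made up for illustration:

```python
MASK64 = (1 << 64) - 1


def sign_extend(value: int, bits: int = 48) -> int:
    """Sign-extend the low `bits` bits of value to 64 bits."""
    low = value & ((1 << bits) - 1)
    if low & (1 << (bits - 1)):
        # Top implemented bit is set: fill all upper bits with ones.
        low |= MASK64 ^ ((1 << bits) - 1)
    return low


def is_canonical(addr: int, bits: int = 48) -> bool:
    """An address is canonical if bits 63..(bits-1) are all copies of
    bit (bits-1). The default of 48 assumes 4-level paging."""
    return sign_extend(addr, bits) == (addr & MASK64)


if __name__ == "__main__":
    for name, value in [("rcx", 0x1FFFF10035842E78),
                        ("rsp", 0xFFFF8801B7D7F380)]:
        print(name, hex(value), "canonical" if is_canonical(value) else "NOT canonical")
```

Under that assumption the %rcx value from the quoted dump fails the check, while the dumped %rsp passes it.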