linux-kernel - Re: How to debug these general protection fault: 0000 [#1] SMP KASAN PTI issues ?

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [day] [month] [year] [list]
Message-ID: <CACT4Y+bNjbhW2vEGA_M_FwDEhXe5ntR7b1rHMe3xf5pz0Oth+A@mail.gmail.com>
Date:   Mon, 29 Apr 2019 16:24:15 +0200
From:   Dmitry Vyukov <dvyukov@...gle.com>
To:     JohnD Oracle <johndonoracle@...il.com>
Cc:     LKML <linux-kernel@...r.kernel.org>,
        syzkaller-bugs <syzkaller-bugs@...glegroups.com>,
        syzkaller <syzkaller@...glegroups.com>
Subject: Re: How to debug these general protection fault: 0000 [#1] SMP KASAN
 PTI issues ?

On Fri, Apr 26, 2019 at 8:40 PM JohnD Oracle <johndonoracle@...il.com> wrote:
>
> Hi
>
>
>  I am seeing a number of miss leading information in these reports , and I don't have an adequate understanding how KASAN
> works in order to know to debug it.
>
> For instance;
>
> Lets look at this event :
>
> kasan: CONFIG_KASAN_INLINE enabled
> kasan: GPF could be caused by NULL-ptr deref or user memory access
>  general protection fault: 0000 [#1] SMP KASAN PTI
>
> CPU: 0 PID: 2823 Comm: test2 Not tainted 4.14.35.jpd-ksan.01.-syzkaller #22
>  Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.11.0-2.el7 04/01/2014
>  task: ffff88805ac89780 task.stack: ffff888054920000
> RIP: 0010:vhost_vsock_dev_release+0x10f/0x450 [vhost_vsock]
> RSP: 0018:ffff888054927be8 EFLAGS: 00010206
> RAX: dffffc0000000000 RBX: 727574616e676973 RCX: 1ffff1100a924f76
> RDX: 0e4eae8c2dcced2e RSI: ffff88805a4a0500 RDI: ffff88807e5e8bb8
> RBP: ffff888054927c38 R08: 0000000000000000 R09: 0000000000000000
> R10: ffffed100a924f3d R11: ffff8880549279ef R12: ffff88807e5e0000
> R13: 00686769685f7265 R14: ffff88807e5e8bc0 R15: ffffffffc04e2a30
> FS:  0000000000000000(0000) GS:ffff88805e400000(0000) knlGS:0000000000000000
> CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 00007fe5af16c6c8 CR3: 0000000003a0e000 CR4: 00000000000006f0
> Call Trace:
>  ? ima_file_free+0xb6/0x316
>  ? vhost_vsock_dev_open+0x2d0/0x2d0 [vhost_vsock]
> __fput+0x25d/0x775
>   ____fput+0x1a/0x1d
>  task_work_run+0x12e/0x18f
>  do_exit+0x6ee/0x2a5e
>
> Issue number 1 :
>
> The static trace shows vhost_vsock_dev_open() ;   but in reality , we are in a system EXIT call closing open file descriptors because the RIP is:
>
> RIP: 0010:vhost_vsock_dev_release+0x10f
>
> Ok ;  So the stack is dirty with old information, but using GDB, I set a breakpoint at ksan die exception handler that generates the kernel trace message
> that include vhost_vsock_dev_open() :
>
>
> #0  kasan_die_handler (self=0xffffffff83aa8420 <kasan_die_notifier>, val=0x9, data=0xffff888054927a60) at arch/x86/mm/kasan_init_64.c:245
> #1  0xffffffff8120363e in notifier_call_chain (nl=<optimized out>, val=<optimized out>, v=<optimized out>, nr_to_call=0xfffffffb,
>     nr_calls=0x0 <irq_stack_union>) at kernel/notifier.c:93
> #2  0xffffffff81203d2b in __atomic_notifier_call_chain (nr_calls=<optimized out>, nr_to_call=<optimized out>, v=<optimized out>,
>     val=<optimized out>, nh=<optimized out>) at kernel/notifier.c:183
> #3  atomic_notifier_call_chain (v=<optimized out>, val=<optimized out>, nh=<optimized out>) at kernel/notifier.c:193
> #4  notify_die (val=<optimized out>, str=<optimized out>, regs=<optimized out>, err=0x0, trap=0xd, sig=0xb) at kernel/notifier.c:549
> #5  0xffffffff8108997f in do_general_protection (regs=0xffff888054927b38, error_code=0x0) at arch/x86/kernel/traps.c:558
> #6  0xffffffff82e037bc in general_protection () at arch/x86/entry/entry_64.S:1275
> #7  0xffffffffc04e2a30 in vhost_vsock_dev_open (inode=<optimized out>, file=0x0 <irq_stack_union>) at drivers/vhost/vsock.c:526
>
> What am I suppose to believe ?  Are we calling vsock_open()  from the exit  or  vsock_dev_release()   ?
>
>
> What really gets confusing is what the dis-assembling shows around the gdb exception "
>
>   #7  0xffffffffc04e2a30 in vhost_vsock_dev_open (inode=<optimized out>, file=0x0 <irq_stack_union>) at drivers/vhost/vsock.c:526
>
>
> Lets look at   failing instructions around  at vsock_open()  0xffffffffc04e2a30 :
>
>    0xffffffffc04e2a19 <vhost_vsock_dev_open+697>:    callq  0xffffffff816a94c0 <__asan_report_store8_noabort>
>    0xffffffffc04e2a1e <vhost_vsock_dev_open+702>:    jmpq   0xffffffffc04e2848 <vhost_vsock_dev_open+232>
>    0xffffffffc04e2a23 <vhost_vsock_dev_open+707>:    mov    %r12,%rdi
>    0xffffffffc04e2a26 <vhost_vsock_dev_open+710>:    callq  0xffffffff816a94c0 <__asan_report_store8_noabort>
>    0xffffffffc04e2a2b <vhost_vsock_dev_open+715>:    jmpq   0xffffffffc04e286f <vhost_vsock_dev_open+271>
>   0xffffffffc04e2a30 <vhost_vsock_dev_release>:    nopl   0x0(%rax,%rax,1)           <<<<<<<<<<  FAILING  ADDRESS
>
>   How can I be getting a fault on a 5 byte NOP instruction : nopl   0x0(%rax,%rax,1)
>
>   ax is 0xdffffc0000000000;  so it should be a   move register ax to ax.
>
>  I have no idea what the inserted functions :  asan_report_store8_noabort()  are;  They don't appear in the kernel source that I can find.
>
> Lets look at  the RIP from ksan die() :
>
>
> 353    void die(const char *str, struct pt_regs *regs, long err)
> 354    {
> 355        unsigned long flags = oops_begin();
> 356        int sig = SIGSEGV;
> 357
> 358        if (__die(str, regs, err))
> (gdb) p *regs
> $1 = {
>   r15 = 0xffffffffc04e2a30,
>   r14 = 0xffff888053a38bc0,
>   r13 = 0x483a750000002825,
>   r12 = 0xffff888053a30000,
>   bp = 0xffff888056347c38,
>   bx = 0x415c415d5b48c483,
>   r11 = 0xffff8880563479ef,
>   r10 = 0xffffed100ac68f3d,
>   r9 = 0x0,
>   r8 = 0x0,
>   ax = 0xdffffc0000000000,
>   cx = 0x1ffff1100ac68f76,
>   dx = 0x82b882bab691890,
>   si = 0xffff88807eef5b40,
>   di = 0xffff888053a38bb8,
>   orig_ax = 0xffffffffffffffff,
>   ip = 0xffffffffc04e2b3f,
>   cs = 0x10,
>   flags = 0x10206,
>   sp = 0xffff888056347be8,
>   ss = 0x18
> }
>
> the regs.ip  ==   0xffffffffc04e2b3f
>    0xffffffffc04e2b3f <vhost_vsock_dev_release+271>:    cmpb   $0x0,(%rdx,%rax,1)
>    0xffffffffc04e2b43 <vhost_vsock_dev_release+275>:    jne    0xffffffffc04e2e50 <vhost_vsock_dev_release+1056>
> (gdb)
>
>
>
> (gdb) x/2i 0xffffffffc04e2b3f
>    0xffffffffc04e2b3f <vhost_vsock_dev_release+271>:    cmpb   $0x0,(%rdx,%rax,1)
>    0xffffffffc04e2b43 <vhost_vsock_dev_release+275>:    jne    0xffffffffc04e2e50 <vhost_vsock_dev_release+1056>
>
>
> I can believe getting a fault doing a cmpb   (dx,ax )
>
> Is this a complex instruction comparing *dx to *ax ?  the the register contents ?
>
>
> ax =
>
> (gdb) x/2b 0xdffffc0000000000
> 0xdffffc0000000000:    Cannot access memory at address 0xdffffc0000000000
>
> dx =
>
> (gdb) x/2b 0x82b882bab691890
> 0x82b882bab691890:    Cannot access memory at address 0x82b882bab691890
>
>
> Debugging suggestions welcome !
>
> JD
>
> ===
>
>
> Attached is the test case.
your info,

Hi JohnD,

Are you debugging some syzbot-reported bug? Which one? It's useful to
keep this in the same email thread as the report in somebody else will
look at it later (or maybe debugging the same at the same time).
syzkaller-bugs@ is generally not read by anyone and is only CCed in
syzbot reports. We should direct people to syzkaller@ mailing list
everywhere.

Looking at the info, it seems that the crash happens in
vhost_vsock_dev_release and gdb improperly unwinds kernel (maybe you
need a latest gdb or something).
__fput generally calls callbacks that close/release/destroy something.
And it seems that vhost_vsock_dev_release perfectly matches this
definition.
Also "cmpb   $0x0,(%rdx,%rax,1)" looks like KASAN shadow check and is
the instruction where NULL derefs usually caught.
This instruction does "if (*(byte*)(rax + rdx) == 0)". Perhaps you
better turn off CONFIG_KASAN to make debugging simpler. Since it's not
KASAN-detected crash, there is no point enabling KASAN.

View attachment "test2.c" of type "text/x-csrc" (4359 bytes)