linux-kernel - Re: [syzbot] WARNING in handle_mm

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CALCETrXUJOHj8PynvZVWgG7jBe6ZtqKpvjhbUM8perbbydRw5Q@mail.gmail.com>
Date:   Thu, 11 Mar 2021 18:29:45 -0800
From:   Andy Lutomirski <luto@...nel.org>
To:     syzbot <syzbot+7d7013084f0a806f3786@...kaller.appspotmail.com>
Cc:     Borislav Petkov <bp@...en8.de>, "H. Peter Anvin" <hpa@...or.com>,
        Jim Mattson <jmattson@...gle.com>,
        Joerg Roedel <joro@...tes.org>, kvm list <kvm@...r.kernel.org>,
        LKML <linux-kernel@...r.kernel.org>,
        Ingo Molnar <mingo@...hat.com>,
        Paolo Bonzini <pbonzini@...hat.com>,
        Sean Christopherson <seanjc@...gle.com>,
        syzkaller-bugs <syzkaller-bugs@...glegroups.com>,
        Thomas Gleixner <tglx@...utronix.de>,
        Vitaly Kuznetsov <vkuznets@...hat.com>,
        Wanpeng Li <wanpengli@...cent.com>, X86 ML <x86@...nel.org>
Subject: Re: [syzbot] WARNING in handle_mm_fault

Your warning is odd, but I see the bug.  It's in KVM.

On Thu, Mar 11, 2021 at 4:37 PM syzbot
<syzbot+7d7013084f0a806f3786@...kaller.appspotmail.com> wrote:
>
> Hello,
>
> syzbot found the following issue on:
>
> HEAD commit:    05a59d79 Merge git://git.kernel.org:/pub/scm/linux/kernel/..
> git tree:       upstream
> console output: https://syzkaller.appspot.com/x/log.txt?x=16f493ead00000
> kernel config:  https://syzkaller.appspot.com/x/.config?x=750735fdbc630971
> dashboard link: https://syzkaller.appspot.com/bug?extid=7d7013084f0a806f3786
>
> Unfortunately, I don't have any reproducer for this issue yet.
>
> IMPORTANT: if you fix the issue, please add the following tag to the commit:
> Reported-by: syzbot+7d7013084f0a806f3786@...kaller.appspotmail.com
>
> ------------[ cut here ]------------
> raw_local_irq_restore() called with IRQs enabled
> WARNING: CPU: 0 PID: 8412 at kernel/locking/irqflag-debug.c:10 warn_bogus_irq_restore+0x1d/0x20 kernel/locking/irqflag-debug.c:10
> Modules linked in:
> CPU: 0 PID: 8412 Comm: syz-fuzzer Not tainted 5.12.0-rc2-syzkaller #0
> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
> RIP: 0010:warn_bogus_irq_restore+0x1d/0x20 kernel/locking/irqflag-debug.c:10

The above makes sense, but WTH is the below:

> Code: be ff cc cc cc cc cc cc cc cc cc cc cc 80 3d 11 d1 ad 04 00 74 01 c3 48 c7 c7 20 79 6b 89 c6 05 00 d1 ad 04 01 e8 75 5b be ff <0f> 0b c3 48 39 77 10 0f 84 97 00 00 00 66 f7 47 22 f0 ff 74 4b 48
> RSP: 0000:ffffc9000185fac8 EFLAGS: 00010282
> RAX: 0000000000000000 RBX: ffff8880194268a0 RCX: 0000000000000000
> RBP: 0000000000000200 R08: 0000000000000000 R09: 0000000000000000
> R13: ffffed1003284d14 R14: 0000000000000001 R15: ffff8880b9c36000
> FS:  000000c00002ec90(0000) GS:ffff8880b9c00000(0000) knlGS:0000000000000000
> Call Trace:
>  handle_mm_fault+0x1bc/0x7e0 mm/memory.c:4549
> Code: 48 8d 05 97 25 3e 00 48 89 44 24 08 e8 6d 54 ea ff 90 e8 07 a1 ed ff eb a5 cc cc cc cc cc 8b 44 24 10 48 8b 4c 24 08 89 41 24 <c3> cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc 48 8b
> RAX: 00000000000047f6 RBX: 00000000000047f6 RCX: 0000000000d60000
> RDX: 0000000000004c00 RSI: 0000000000d60000 RDI: 000000000181cad0
> RBP: 000000c000301890 R08: 00000000000047f5 R09: 000000000059c5a0
> R10: 000000c0004e2000 R11: 0000000000000020 R12: 00000000000000fa
> R13: 00aaaaaaaaaaaaaa R14: 000000000093f064 R15: 0000000000000038
> Kernel panic - not syncing: panic_on_warn set ...
> CPU: 0 PID: 8412 Comm: syz-fuzzer Not tainted 5.12.0-rc2-syzkaller #0
> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
> Call Trace:

Now we start reading here:

>  __dump_stack lib/dump_stack.c:79 [inline]
>  dump_stack+0x141/0x1d7 lib/dump_stack.c:120
>  panic+0x306/0x73d kernel/panic.c:231
>  __warn.cold+0x35/0x44 kernel/panic.c:605
>  report_bug+0x1bd/0x210 lib/bug.c:195
>  handle_bug+0x3c/0x60 arch/x86/kernel/traps.c:239
>  exc_invalid_op+0x14/0x40 arch/x86/kernel/traps.c:259
>  asm_exc_invalid_op+0x12/0x20 arch/x86/include/asm/idtentry.h:575
> RIP: 0010:warn_bogus_irq_restore+0x1d/0x20 kernel/locking/irqflag-debug.c:10
> Code: be ff cc cc cc cc cc cc cc cc cc cc cc 80 3d 11 d1 ad 04 00 74 01 c3 48 c7 c7 20 79 6b 89 c6 05 00 d1 ad 04 01 e8 75 5b be ff <0f> 0b c3 48 39 77 10 0f 84 97 00 00 00 66 f7 47 22 f0 ff 74 4b 48
> RSP: 0000:ffffc9000185fac8 EFLAGS: 00010282
> RAX: 0000000000000000 RBX: ffff8880194268a0 RCX: 0000000000000000
> RDX: ffff88802f7b2400 RSI: ffffffff815b4435 RDI: fffff5200030bf4b
> RBP: 0000000000000200 R08: 0000000000000000 R09: 0000000000000000
> R10: ffffffff815ad19e R11: 0000000000000000 R12: 0000000000000003
> R13: ffffed1003284d14 R14: 0000000000000001 R15: ffff8880b9c36000
>  kvm_wait arch/x86/kernel/kvm.c:860 [inline]

and there's the bug:

        /*
         * halt until it's our turn and kicked. Note that we do safe halt
         * for irq enabled case to avoid hang when lock info is overwritten
         * in irq spinlock slowpath and no spurious interrupt occur to save us.
         */
        if (arch_irqs_disabled_flags(flags))
                halt();
        else
                safe_halt();

out:
        local_irq_restore(flags);
}

The safe_halt path is bogus.  It should just return instead of
restoring the IRQ flags.

--Andy