linux-kernel - Re: unexpected kernel reboot (3)

Message-ID: <CACT4Y+ZCKFXK+9Bw1__ofUBLy6y8mQRoQHm5Qt135mByOrYk8g@mail.gmail.com>
Date:   Wed, 11 Mar 2020 21:17:58 +0100
From:   Dmitry Vyukov <dvyukov@...gle.com>
To:     syzbot <syzbot+cce9ef2dd25246f815ee@...kaller.appspotmail.com>,
        syzkaller-bugs <syzkaller-bugs@...glegroups.com>,
        LKML <linux-kernel@...r.kernel.org>,
        Jim Mattson <jmattson@...gle.com>
Subject: Re: unexpected kernel reboot (3)

> On Monday, July 16, 2018 at 12:10:07 PM UTC+2, Dmitry Vyukov wrote:
>>
>> On Fri, Jul 13, 2018 at 11:58 PM, Andrew Morton
>> <akpm@...ux-foundation.org> wrote:
>> > On Fri, 13 Jul 2018 14:39:02 -0700 syzbot <syzbot+cce9ef2dd25246f815ee@...kaller.appspotmail.com> wrote:
>> >
>> >> Hello,
>> >>
>> >> syzbot found the following crash on:
>> >
>> > hm, I don't think I've seen an "unexpected reboot" report before.
>> >
>> > Can you expand on specifically what happened here?  Did the machine
>> > simply magically reboot itself?  Or did an external monitor whack it,
>> > or...
>>
>> We ran some user-space workload (not involving the reboot syscall), and
>> the machine suddenly rebooted. We don't know what triggered the
>> reboot; we only see the consequences. We've seen a few such bugs before,
>> e.g.:
>> https://syzkaller.appspot.com/bug?id=4f1db8b5e7dfcca55e20931aec0ee707c5cafc99
>> Usually it involves KVM. Potentially it's a bug in the outer
>> kernel/VMM; it may or may not be present in the tip kernel.
>>
>>
>> > Does this test distinguish from a kernel which simply locks up?
>>
>> Yes. If you look at the log:
>>
>> https://syzkaller.appspot.com/x/log.txt?x=17c6a6d0400000
>>
>> We've booted the machine, started running a program, and then boom! It
>> reboots without any other diagnostics. It's not a hang.
>>
>>
>>
>> >> HEAD commit:    1e4b044d2251 Linux 4.18-rc4
>> >> git tree:       upstream
>> >> console output: https://syzkaller.appspot.com/x/log.txt?x=17c6a6d0400000
>> >> kernel config:  https://syzkaller.appspot.com/x/.config?x=25856fac4e580aa7
>> >> dashboard link: https://syzkaller.appspot.com/bug?extid=cce9ef2dd25246f815ee


This happened 10K+ times.
If a GCE VM is rebooted by doing something with the KVM subsystem, I assume
it's a GCE bug (?). +Jim

>> >> compiler:       gcc (GCC) 8.0.1 20180413 (experimental)
>> >> syzkaller repro: https://syzkaller.appspot.com/x/repro.syz?x=165012c2400000
>> >> C reproducer:   https://syzkaller.appspot.com/x/repro.c?x=1571462c400000
>> >
>> > I assume the "C reproducer" is irrelevant here.
>> >
>> > Is it reproducible?
>>
>> Yes, it is reproducible and the C reproducer is relevant.
>> If syzbot provides a reproducer, it means that it booted a clean
>> machine, ran the provided program (nothing else besides typical init
>> code and ssh/scp invocation) and that's the kernel output it observed
>> running this exact program.
>> However, in this case the exact setup can be relevant. syzbot uses GCE
>> VMs; the bug may or may not reproduce with other VMMs/physical hardware,
>> and sometimes such bugs depend on the exact CPU type.
>>
>>
>> >> IMPORTANT: if you fix the bug, please add the following tag to the commit:
>> >> Reported-by: syzbot+cce9ef2dd25246f815ee@...kaller.appspotmail.com
>> >>
>> >> output_len: 0x00000000092459b0
>> >> kernel_total_size: 0x000000000a505000
>> >> trampoline_32bit: 0x000000000009d000
>> >>
>> >> Decompressing Linux... Parsing ELF... done.
>> >> Booting the kernel.
>> >> [    0.000000] Linux version 4.18.0-rc4+ (syzkaller@ci) (gcc version 8.0.1
>> >> 20180413 (experimental) (GCC)) #138 SMP Mon Jul 9 10:45:11 UTC 2018
>> >> [    0.000000] Command line: BOOT_IMAGE=/vmlinuz root=/dev/sda1
>> >> console=ttyS0 earlyprintk=serial vsyscall=native rodata=n
>> >> ftrace_dump_on_oops=orig_cpu oops=panic panic_on_warn=1 nmi_watchdog=panic
>> >> panic=86400 workqueue.watchdog_thresh=140 kvm-intel.nested=1
>> >>
>> >> ...
>> >>
>> >> regulatory database
>> >> [    4.519364] cfg80211: Loaded X.509 cert 'sforshee: 00b28ddf47aef9cea7'
>> >> [    4.520839] platform regulatory.0: Direct firmware load for
>> >> regulatory.db failed with error -2
>> >> [    4.522155] cfg80211: failed to load regulatory.db
>> >> [    4.522185] ALSA device list:
>> >> [    4.523499]   #0: Dummy 1
>> >> [    4.523951]   #1: Loopback 1
>> >> [    4.524389]   #2: Virtual MIDI Card 1
>> >> [    4.825991] input: ImExPS/2 Generic Explorer Mouse as
>> >> /devices/platform/i8042/serio1/input/input4
>> >> [    4.829533] md: Waiting for all devices to be available before autodetect
>> >> [    4.830562] md: If you don't use raid, use raid=noautodetect
>> >> [    4.835237] md: Autodetecting RAID arrays.
>> >> [    4.835882] md: autorun ...
>> >> [    4.836364] md: ... autorun DONE.
>> >
>> > Can we assume that the failure occurred in or immediately after the MD code,
>> > or might some output have been truncated?
>> >
>> > It would be useful to know what the kernel was initializing immediately
>> > after MD.  Do you have a kernel log for the same config when the kernel
>> > didn't fail?  Or maybe enable initcall_debug?
>> >