linux-kernel - Re: upstream boot error: BUG: soft lockup in __do


Message-ID: <CACT4Y+avpJJdHBg2nKJ7CUON-8q9bqSnrAM=gHMJGVhvSrmnDw@mail.gmail.com>
Date:   Fri, 31 Jul 2020 18:21:05 +0200
From:   Dmitry Vyukov <dvyukov@...gle.com>
To:     Randy Dunlap <rdunlap@...radead.org>
Cc:     syzbot <syzbot+8472ea265fe32cc3bf78@...kaller.appspotmail.com>,
        Borislav Petkov <bp@...en8.de>,
        "H. Peter Anvin" <hpa@...or.com>,
        LKML <linux-kernel@...r.kernel.org>,
        Andy Lutomirski <luto@...nel.org>,
        Ingo Molnar <mingo@...hat.com>,
        syzkaller-bugs <syzkaller-bugs@...glegroups.com>,
        Thomas Gleixner <tglx@...utronix.de>,
        "the arch/x86 maintainers" <x86@...nel.org>,
        "Paul E. McKenney" <paulmck@...nel.org>
Subject: Re: upstream boot error: BUG: soft lockup in __do_softirq

On Fri, Jul 31, 2020 at 6:08 PM Randy Dunlap <rdunlap@...radead.org> wrote:
>
> On 7/30/20 11:50 PM, Dmitry Vyukov wrote:
> > On Fri, Jul 31, 2020 at 8:44 AM syzbot
> > <syzbot+8472ea265fe32cc3bf78@...kaller.appspotmail.com> wrote:
> >>
> >> Hello,
> >>
> >> syzbot found the following issue on:
> >>
> >> HEAD commit:    92ed3019 Linux 5.8-rc7
> >> git tree:       upstream
> >> console output: https://syzkaller.appspot.com/x/log.txt?x=10e84cdf100000
> >> kernel config:  https://syzkaller.appspot.com/x/.config?x=b45e47f6d958ae82
> >> dashboard link: https://syzkaller.appspot.com/bug?extid=8472ea265fe32cc3bf78
> >> compiler:       gcc (GCC) 10.1.0-syz 20200507
> >>
> >> IMPORTANT: if you fix the issue, please add the following tag to the commit:
> >> Reported-by: syzbot+8472ea265fe32cc3bf78@...kaller.appspotmail.com
> >
> > This is a qemu-kvm instance somehow killing the host kernel; the host
> > kernel running the qemu instances is itself full of RCU stalls. I think
> > this is not a bug in the tested kernel.
> > We change the RCU stall timeout from the default 21s to 120 seconds, but
> > only after boot, using sysctls. I did not find any way to change the
> > RCU timeout via cmdline/config (that would be useful).
>
> (adding Paul)
>
>
> Documentation/RCU/stallwarn.rst says there is a Kconfig:
>
> CONFIG_RCU_CPU_STALL_TIMEOUT
>
>         This kernel configuration parameter defines the period of time
>         that RCU will wait from the beginning of a grace period until it
>         issues an RCU CPU stall warning.  This time period is normally
>         21 seconds.
>
> and Documentation/admin-guide/kernel-parameters.txt has 2 RCU stall timeouts,
> one for CPU and one for tasks:
>
>         rcupdate.rcu_cpu_stall_timeout= [KNL]
>                         Set timeout for RCU CPU stall warning messages.
>
>         rcupdate.rcu_task_stall_timeout= [KNL]
>                         Set timeout in jiffies for RCU task stall warning
>                         messages.  Disable with a value less than or equal
>                         to zero.
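The precedence Randy describes can be sketched as follows. This is an illustration only, not kernel code. CONFIG_RCU_CPU_STALL_TIMEOUT supplies the built-in default, and an rcupdate.rcu_cpu_stall_timeout= boot parameter overrides it. Several details are assumptions: the helper names, the rule that the last occurrence on the command line wins, and the sysfs path /sys/module/rcupdate/parameters/rcu_cpu_stall_timeout, where the runtime value is assumed to be readable on typical kernels.

```python
from pathlib import Path
from typing import Optional

# Assumed location of the runtime value; rcupdate module parameters are
# normally exposed under /sys/module/rcupdate/parameters/.
SYSFS_STALL_TIMEOUT = Path("/sys/module/rcupdate/parameters/rcu_cpu_stall_timeout")

# Hypothetical default, mirroring CONFIG_RCU_CPU_STALL_TIMEOUT=21.
KCONFIG_DEFAULT = 21


def boot_stall_timeout(cmdline: str, kconfig_default: int = KCONFIG_DEFAULT) -> int:
    """Return the RCU CPU stall timeout (seconds) the kernel starts with.

    The Kconfig value is the default; an rcupdate.rcu_cpu_stall_timeout=
    boot parameter overrides it. If the parameter appears more than once,
    this sketch assumes the last occurrence wins.
    """
    timeout = kconfig_default
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        if sep and key == "rcupdate.rcu_cpu_stall_timeout":
            timeout = int(value)
    return timeout


def runtime_stall_timeout(path: Path = SYSFS_STALL_TIMEOUT) -> Optional[int]:
    """Read the current timeout, or None if the file is missing or unreadable."""
    try:
        return int(path.read_text().strip())
    except (OSError, ValueError):
        return None


if __name__ == "__main__":
    # Example: a guest command line that raises the timeout from boot onward.
    print(boot_stall_timeout("console=ttyS0 rcupdate.rcu_cpu_stall_timeout=120"))
    print(runtime_stall_timeout())
```

Under these assumptions, passing rcupdate.rcu_cpu_stall_timeout=120 on the guest's kernel command line would apply the longer timeout from the start of boot, instead of only after the post-boot sysctls run.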

Hi Randy,

Thanks for looking into this.
But I think I messed things up. The config has
CONFIG_RCU_CPU_STALL_TIMEOUT=100, but this is not an RCU stall:

watchdog: BUG: soft lockup - CPU#3 stuck for 21s! [grep:4749]

This is controlled by the kernel.watchdog_thresh sysctl (?).
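Suppose the guess above is right and this report comes from the soft lockup detector rather than RCU. The detector then fires when a CPU loops in kernel mode without letting other tasks run for longer than twice kernel.watchdog_thresh seconds. With the default of 10, that is 20s, which fits the "stuck for 21s" line. The same threshold can also be set at boot with the watchdog_thresh= parameter, and a value of 0 disables the lockup detectors. The sketch below is a hypothetical offline helper, not kernel code. The function names are illustrative, and the strict comparison in would_report() is a simplification.

```python
import re
from pathlib import Path
from typing import Optional, Tuple

# Runtime knob named in the thread (sysctl kernel.watchdog_thresh).
WATCHDOG_THRESH = Path("/proc/sys/kernel/watchdog_thresh")

# Matches lines like:
#   watchdog: BUG: soft lockup - CPU#3 stuck for 21s! [grep:4749]
LOCKUP_RE = re.compile(r"soft lockup - CPU#(\d+) stuck for (\d+)s! \[(.+):(\d+)\]")


def parse_lockup(line: str) -> Optional[Tuple[int, int, str, int]]:
    """Extract (cpu, seconds, comm, pid) from a soft lockup report line."""
    m = LOCKUP_RE.search(line)
    if m is None:
        return None
    cpu, secs, comm, pid = m.groups()
    return int(cpu), int(secs), comm, int(pid)


def softlockup_threshold(watchdog_thresh: int) -> int:
    """Soft lockup threshold in seconds: twice watchdog_thresh (0 disables)."""
    return 2 * watchdog_thresh


def would_report(stuck_secs: int, watchdog_thresh: int = 10) -> bool:
    """Simplified check: report once the stall exceeds the threshold."""
    limit = softlockup_threshold(watchdog_thresh)
    return limit > 0 and stuck_secs > limit


def current_watchdog_thresh(path: Path = WATCHDOG_THRESH) -> Optional[int]:
    """Read the live sysctl value, or None if it is unavailable."""
    try:
        return int(path.read_text().strip())
    except (OSError, ValueError):
        return None
```

For the report in this thread, parse_lockup() yields CPU 3, 21 seconds, and grep with PID 4749. A 21s stall exceeds the default 20s limit. Raising watchdog_thresh, either at runtime through the sysctl or at boot through the parameter, moves that limit out accordingly.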