Message-ID: <CACT4Y+Y5QSbgiLhPuv+1vKfSjanF0_p80Pr6GVWPuVwwEzSExA@mail.gmail.com>
Date: Fri, 31 Jul 2020 18:23:05 +0200
From: Dmitry Vyukov <dvyukov@...gle.com>
To: Randy Dunlap <rdunlap@...radead.org>
Cc: syzbot <syzbot+8472ea265fe32cc3bf78@...kaller.appspotmail.com>,
Borislav Petkov <bp@...en8.de>,
"H. Peter Anvin" <hpa@...or.com>,
LKML <linux-kernel@...r.kernel.org>,
Andy Lutomirski <luto@...nel.org>,
Ingo Molnar <mingo@...hat.com>,
syzkaller-bugs <syzkaller-bugs@...glegroups.com>,
Thomas Gleixner <tglx@...utronix.de>,
"the arch/x86 maintainers" <x86@...nel.org>,
"Paul E. McKenney" <paulmck@...nel.org>
Subject: Re: upstream boot error: BUG: soft lockup in __do_softirq
On Fri, Jul 31, 2020 at 6:21 PM Dmitry Vyukov <dvyukov@...gle.com> wrote:
>
> On Fri, Jul 31, 2020 at 6:08 PM Randy Dunlap <rdunlap@...radead.org> wrote:
> >
> > On 7/30/20 11:50 PM, Dmitry Vyukov wrote:
> > > On Fri, Jul 31, 2020 at 8:44 AM syzbot
> > > <syzbot+8472ea265fe32cc3bf78@...kaller.appspotmail.com> wrote:
> > >>
> > >> Hello,
> > >>
> > >> syzbot found the following issue on:
> > >>
> > >> HEAD commit: 92ed3019 Linux 5.8-rc7
> > >> git tree: upstream
> > >> console output: https://syzkaller.appspot.com/x/log.txt?x=10e84cdf100000
> > >> kernel config: https://syzkaller.appspot.com/x/.config?x=b45e47f6d958ae82
> > >> dashboard link: https://syzkaller.appspot.com/bug?extid=8472ea265fe32cc3bf78
> > >> compiler: gcc (GCC) 10.1.0-syz 20200507
> > >>
> > >> IMPORTANT: if you fix the issue, please add the following tag to the commit:
> > >> Reported-by: syzbot+8472ea265fe32cc3bf78@...kaller.appspotmail.com
> > >
> > > This is a qemu-kvm instance killing the host kernel somehow, the host
> > > kernel itself running qemu's is full of rcu stalls. I think this is
> > > not a bug in the tested kernel.
> > > We change rcu stall timeout to 120 seconds from the default 21s, but
> > > this happens only after boot using sysctls. I did not find any way to
> > > change the rcu timeout via cmdline/config (would be useful).
> >
> > (adding Paul)
> >
> >
> > Documentation/RCU/stallwarn.rst says there is a Kconfig:
> >
> > CONFIG_RCU_CPU_STALL_TIMEOUT
> >
> > This kernel configuration parameter defines the period of time
> > that RCU will wait from the beginning of a grace period until it
> > issues an RCU CPU stall warning. This time period is normally
> > 21 seconds.
> >
> > and Documentation/admin-guide/kernel-parameters.txt has 2 RCU stall timeouts,
> > one for CPU and one for tasks:
> >
> > rcupdate.rcu_cpu_stall_timeout= [KNL]
> > Set timeout for RCU CPU stall warning messages.
> >
> > rcupdate.rcu_task_stall_timeout= [KNL]
> > Set timeout in jiffies for RCU task stall warning
> > messages. Disable with a value less than or equal
> > to zero.
>
> Hi Randy,
>
> Thanks for looking into this.
> But I think I messed things up. The config has
> CONFIG_RCU_CPU_STALL_TIMEOUT=100, but this is not an RCU stall:
>
> watchdog: BUG: soft lockup - CPU#3 stuck for 21s! [grep:4749]
>
> This is what is controlled by kernel.watchdog_thresh sysctl (?).

And there is actually a cmdline parameter for this:

static int __init watchdog_thresh_setup(char *str)
{
	get_option(&str, &watchdog_thresh);
	return 1;
}
__setup("watchdog_thresh=", watchdog_thresh_setup);

I will write it down somewhere.
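
For reference, a rough userspace sketch of how one might check which value a given command line would set. This is not kernel code: find_watchdog_thresh() and softlockup_thresh() are hypothetical helper names made up for this illustration, and the parsing is only an approximation of what get_option() does. The factor of two reflects kernel/watchdog.c, where the soft lockup threshold is derived as 2 * watchdog_thresh (default 10); treat it as an assumption to re-check against the tree being tested.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper (not a kernel API): look for "watchdog_thresh=N"
 * in a space-separated command-line string. Returns 1 and stores N in
 * *out if found, 0 otherwise.
 */
int find_watchdog_thresh(const char *cmdline, int *out)
{
	const char *key = "watchdog_thresh=";
	size_t klen = strlen(key);
	const char *p = cmdline;

	while ((p = strstr(p, key)) != NULL) {
		/* Accept the key only at the start of a word. */
		if (p == cmdline || p[-1] == ' ') {
			char *end;
			long v = strtol(p + klen, &end, 10);

			/* Require at least one digit after '='. */
			if (end != p + klen) {
				*out = (int)v;
				return 1;
			}
		}
		p += klen;
	}
	return 0;
}

/*
 * Assumption: the soft lockup report fires after twice the
 * watchdog_thresh value, in seconds.
 */
int softlockup_thresh(int watchdog_thresh)
{
	return 2 * watchdog_thresh;
}
```

Feeding it the contents of /proc/cmdline (or the string passed to qemu via -append) shows which threshold a boot would get. For example, "watchdog_thresh=60" would correspond to a soft lockup report after roughly 120 seconds under the assumption above.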