linux-kernel - Re: BUG: workqueue lockup (2)


Message-ID: <alpine.DEB.2.20.1712031547010.2199@nanos>
Date:   Sun, 3 Dec 2017 15:48:14 +0100 (CET)
From:   Thomas Gleixner <tglx@...utronix.de>
To:     Dmitry Vyukov <dvyukov@...gle.com>
cc:     syzbot 
        <bot+e38be687a2450270a3b593bacb6b5795a7a74edb@...kaller.appspotmail.com>,
        Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
        Kate Stewart <kstewart@...uxfoundation.org>,
        LKML <linux-kernel@...r.kernel.org>,
        Linux-MM <linux-mm@...ck.org>,
        Philippe Ombredanne <pombredanne@...b.com>,
        syzkaller-bugs@...glegroups.com
Subject: Re: BUG: workqueue lockup (2)

On Sun, 3 Dec 2017, Dmitry Vyukov wrote:

> On Sun, Dec 3, 2017 at 3:31 PM, syzbot
> <bot+e38be687a2450270a3b593bacb6b5795a7a74edb@...kaller.appspotmail.com>
> wrote:
> > Hello,
> >
> > syzkaller hit the following crash on
> > 2db767d9889cef087149a5eaa35c1497671fa40f
> > git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/master
> > compiler: gcc (GCC) 7.1.1 20170620
> > .config is attached
> > Raw console output is attached.
> >
> > Unfortunately, I don't have any reproducer for this bug yet.
> >
> >
> > BUG: workqueue lockup - pool cpus=0 node=0 flags=0x0 nice=0 stuck for 48s!
> > BUG: workqueue lockup - pool cpus=0-1 flags=0x4 nice=0 stuck for 47s!
> > Showing busy workqueues and worker pools:
> > workqueue events: flags=0x0
> >   pwq 0: cpus=0 node=0 flags=0x0 nice=0 active=4/256
> >     pending: perf_sched_delayed, vmstat_shepherd, jump_label_update_timeout, cache_reap
> > workqueue events_power_efficient: flags=0x80
> >   pwq 0: cpus=0 node=0 flags=0x0 nice=0 active=4/256
> >     pending: neigh_periodic_work, neigh_periodic_work, do_cache_clean, reg_check_chans_work
> > workqueue mm_percpu_wq: flags=0x8
> >   pwq 0: cpus=0 node=0 flags=0x0 nice=0 active=1/256
> >     pending: vmstat_update
> > workqueue writeback: flags=0x4e
> >   pwq 4: cpus=0-1 flags=0x4 nice=0 active=1/256
> >     in-flight: 3401:wb_workfn
> > workqueue kblockd: flags=0x18
> >   pwq 1: cpus=0 node=0 flags=0x0 nice=-20 active=1/256
> >     pending: blk_mq_timeout_work
> > pool 4: cpus=0-1 flags=0x4 nice=0 hung=0s workers=11 idle: 3423 4249 92 21
> 
> 
> This error report does not look actionable. Perhaps if the code that
> detects it also dumped CPU/task stacks, it would be actionable.

That might be related to the RCU stall issue we are chasing, where a timer
does not fire for as yet unknown reasons. We have a reproducer now and
hopefully will have a solution in the next few days.

Thanks,

	tglx