linux-kernel - Re: BUG: workqueue lockup (2)

Message-ID: <CACT4Y+ZbE5=yeb=3hL8KDpPLarHJgihsTb6xX2+4fnoLFuBTow@mail.gmail.com>
Date:   Tue, 19 Dec 2017 15:40:43 +0100
From:   Dmitry Vyukov <dvyukov@...gle.com>
To:     Tetsuo Handa <penguin-kernel@...ove.sakura.ne.jp>
Cc:     syzbot 
        <bot+e38be687a2450270a3b593bacb6b5795a7a74edb@...kaller.appspotmail.com>,
        syzkaller-bugs@...glegroups.com,
        Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
        Kate Stewart <kstewart@...uxfoundation.org>,
        LKML <linux-kernel@...r.kernel.org>,
        Linux-MM <linux-mm@...ck.org>,
        Philippe Ombredanne <pombredanne@...b.com>,
        Thomas Gleixner <tglx@...utronix.de>
Subject: Re: BUG: workqueue lockup (2)

On Tue, Dec 19, 2017 at 3:27 PM, Tetsuo Handa
<penguin-kernel@...ove.sakura.ne.jp> wrote:
> syzbot wrote:
>>
>> syzkaller has found reproducer for the following crash on
>> f3b5ad89de16f5d42e8ad36fbdf85f705c1ae051
>
> "BUG: workqueue lockup" is not a crash.

Hi Tetsuo,

What is the proper name for all of these collectively?


>> git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/master
>> compiler: gcc (GCC) 7.1.1 20170620
>> .config is attached
>> Raw console output is attached.
>> C reproducer is attached
>> syzkaller reproducer is attached. See https://goo.gl/kgGztJ
>> for information about syzkaller reproducers
>>
>>
>> BUG: workqueue lockup - pool cpus=1 node=0 flags=0x0 nice=0 stuck for 37s!
>> BUG: workqueue lockup - pool cpus=1 node=0 flags=0x0 nice=-20 stuck for 32s!
>> Showing busy workqueues and worker pools:
>> workqueue events: flags=0x0
>>    pwq 2: cpus=1 node=0 flags=0x0 nice=0 active=1/256
>>      pending: cache_reap
>> workqueue events_power_efficient: flags=0x80
>>    pwq 2: cpus=1 node=0 flags=0x0 nice=0 active=2/256
>>      pending: neigh_periodic_work, do_cache_clean
>> workqueue mm_percpu_wq: flags=0x8
>>    pwq 2: cpus=1 node=0 flags=0x0 nice=0 active=1/256
>>      pending: vmstat_update
>> workqueue kblockd: flags=0x18
>>    pwq 3: cpus=1 node=0 flags=0x0 nice=-20 active=1/256
>>      pending: blk_timeout_work
>
> You gave up too early. There is no hint for understanding what was going on.
> While we can observe "BUG: workqueue lockup" under memory pressure, there is
> no hint like SysRq-t and SysRq-m. Thus, I can't tell something is wrong.

Do you know how to send them programmatically? I tried to find a way
several times, but failed. Articles that I've found talk about
pressing some keys that don't translate directly to us-ascii.
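
One route that avoids the keyboard entirely is /proc/sysrq-trigger:
writing a single command character there ('t' for the task dump, 'm'
for memory info) has the same effect as the key combination. Below is a
minimal sketch, assuming a kernel built with CONFIG_MAGIC_SYSRQ and a
caller running as root. The helper name sysrq_send and its path
parameter are hypothetical, introduced only for illustration.

```c
#include <fcntl.h>
#include <unistd.h>

/*
 * Hypothetical helper: write one SysRq command character to `path`.
 * On a real system `path` would be "/proc/sysrq-trigger"; it is a
 * parameter here so the function can be tried on a regular file.
 * Returns 0 on success, -1 on failure.
 */
static int sysrq_send(const char *path, char cmd)
{
    int fd = open(path, O_WRONLY);
    if (fd < 0)
        return -1;

    /* The kernel acts on the first byte written. */
    ssize_t n = write(fd, &cmd, 1);
    close(fd);
    return n == 1 ? 0 : -1;
}
```

A harness could call sysrq_send("/proc/sysrq-trigger", 't') followed by
sysrq_send("/proc/sysrq-trigger", 'm') once the lockup message shows up
on the console, so that the dumps land in the same log.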

But you can also run the reproducer. No report can possibly provide
all the useful information; sometimes debugging boils down to
manually adding printfs. That's why syzbot aims at providing a
reproducer as the ultimate source of details. Also since a developer
needs to test a proposed fix, it's easier to start with the reproducer
right away.


> At least you need to confirm that lockup lasts for a few minutes. Otherwise,

Is it possible to increase the timeout? How? We could bump it up to 2 minutes.
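
For reference, the stall threshold is the workqueue.watchdog_thresh
parameter (in seconds, default 30, 0 disables detection) when the
kernel is built with CONFIG_WQ_WATCHDOG. It can be set on the kernel
command line or at runtime through sysfs. A minimal sketch of the
runtime variant follows. The helper name wq_set_watchdog_thresh is
hypothetical, and the path is passed in so the code can be tried on an
ordinary file.

```c
#include <stdio.h>

/*
 * Hypothetical helper: write a new threshold (in seconds) to `path`.
 * On a real system `path` would be
 * "/sys/module/workqueue/parameters/watchdog_thresh" and the caller
 * would need root. Returns 0 on success, -1 on failure.
 */
static int wq_set_watchdog_thresh(const char *path, unsigned long secs)
{
    FILE *f = fopen(path, "w");
    if (!f)
        return -1;

    int ok = fprintf(f, "%lu\n", secs) > 0;

    /* fclose flushes the buffer, so a write error may only show up here. */
    if (fclose(f) != 0)
        ok = 0;
    return ok ? 0 : -1;
}
```

Calling it with 120 would match the 2 minutes suggested above. The
boot-time equivalent is workqueue.watchdog_thresh=120 on the kernel
command line.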


> this might be just overstressing. (According to repro.c , 12 threads are
> created and soon SEGV follows? According to above message, only 2 CPUs?
> Triggering SEGV suggests memory was low due to saving coredump?)