Message-ID: <CACT4Y+bwnyFmgTNMTa1p8WKecH=OU5Za_hboY7Q=V2Aq+DOsKQ@mail.gmail.com>
Date: Thu, 8 Feb 2018 15:18:11 +0100
From: Dmitry Vyukov <dvyukov@...gle.com>
To: Jan Kara <jack@...e.cz>
Cc: Andi Kleen <ak@...ux.intel.com>,
syzbot <syzbot+283c3c447181741aea28@...kaller.appspotmail.com>,
Andrew Morton <akpm@...ux-foundation.org>,
Andrey Ryabinin <aryabinin@...tuozzo.com>, jlayton@...hat.com,
LKML <linux-kernel@...r.kernel.org>,
Linux-MM <linux-mm@...ck.org>,
Mel Gorman <mgorman@...hsingularity.net>,
Ingo Molnar <mingo@...nel.org>, rgoldwyn@...e.com,
syzkaller-bugs@...glegroups.com, linux-fsdevel@...r.kernel.org
Subject: Re: INFO: task hung in sync_blockdev
On Thu, Feb 8, 2018 at 3:08 PM, Jan Kara <jack@...e.cz> wrote:
> On Thu 08-02-18 14:28:08, Dmitry Vyukov wrote:
>> On Thu, Feb 8, 2018 at 10:28 AM, Jan Kara <jack@...e.cz> wrote:
>> > On Wed 07-02-18 07:52:29, Andi Kleen wrote:
>> >> > #0: (&bdev->bd_mutex){+.+.}, at: [<0000000040269370>]
>> >> > __blkdev_put+0xbc/0x7f0 fs/block_dev.c:1757
>> >> > 2 locks held by blkid/19199:
>> >> > #0: (&bdev->bd_mutex){+.+.}, at: [<00000000b4dcaa18>]
>> >> > __blkdev_get+0x158/0x10e0 fs/block_dev.c:1439
>> >> > #1: (&ldata->atomic_read_lock){+.+.}, at: [<0000000033edf9f2>]
>> >> > n_tty_read+0x2ef/0x1a00 drivers/tty/n_tty.c:2131
>> >> > 1 lock held by syz-executor5/19330:
>> >> > #0: (&bdev->bd_mutex){+.+.}, at: [<00000000b4dcaa18>]
>> >> > __blkdev_get+0x158/0x10e0 fs/block_dev.c:1439
>> >> > 1 lock held by syz-executor5/19331:
>> >> > #0: (&bdev->bd_mutex){+.+.}, at: [<00000000b4dcaa18>]
>> >> > __blkdev_get+0x158/0x10e0 fs/block_dev.c:1439
>> >>
>> >> It seems multiple processes are deadlocked on bd_mutex.
>> >> Unfortunately there's no backtrace for the lock acquisitions,
>> >> so it's hard to see the exact sequence.
>> >
>> > Well, everything in the report points to a situation where some IO was
>> > submitted to the block device and never completed (more exactly, it took
>> > longer than those 120s for that IO to complete). It would need more digging
>> > into the syzkaller program to find out what kind of device that was, and
>> > possibly why the IO took so long to complete...
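
(Those 120s come from the hung task watchdog, kernel.hung_task_timeout_secs,
which defaults to 120 and is what generates these reports in the first place.
A minimal userspace sketch for checking the current value - assuming the
standard procfs sysctl path:

    #include <stdio.h>

    int main(void)
    {
            /* hung task watchdog timeout; the "120s" mentioned above */
            FILE *f = fopen("/proc/sys/kernel/hung_task_timeout_secs", "r");
            unsigned long secs;

            if (!f || fscanf(f, "%lu", &secs) != 1) {
                    perror("hung_task_timeout_secs");
                    return 1;
            }
            printf("khungtaskd fires after %lu seconds\n", secs);
            fclose(f);
            return 0;
    }

Writing 0 to the same file disables the watchdog entirely.)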
>>
>>
>> Would a traceback of all task stacks help in this case?
>> What I've seen in several "task hung" reports is that the CPU
>> traceback does not show anything useful. So perhaps it should be
>> changed to a task traceback? Or would that not help either?
>
> A task stack traceback for all tasks (usually only the tasks in D state -
> i.e. sysrq-w - are actually enough) would definitely help for debugging
> deadlocks on sleeping locks. For this particular case I'm not sure whether
> it would help or not, since it is quite possible the IO is just sitting in
> some queue, never getting processed
That's what I was afraid of.
> due to some racing syzkaller process tearing
> down the device at the wrong moment, or something like that... Such a case
> is very difficult to debug without a full kernel crashdump of the hung
> kernel (or a reproducer, for that matter), and even with that it is usually
> rather time consuming. But for the deadlocks which do occur more frequently
> it would probably be worth the time, so it would be nice if such an option
> was eventually available.
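
FWIW, triggering that sysrq-w dump does not need console access, so a test
harness could do it right after a hang is detected. A minimal sketch,
assuming CONFIG_MAGIC_SYSRQ=y and root privileges:

    #include <stdio.h>

    int main(void)
    {
            /* emulate sysrq-w: dump stacks of all blocked (D state) tasks */
            FILE *f = fopen("/proc/sysrq-trigger", "w");

            if (!f) {
                    perror("/proc/sysrq-trigger");
                    return 1;
            }
            fputc('w', f);
            fclose(f);
            /* the stack dumps end up in the kernel log (dmesg) */
            return 0;
    }

Writing 't' instead dumps all tasks regardless of state, which is noisier
but closer to the "traceback of all task stacks" above.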
By "full kernel crashdump" you mean kdump thing, or something else?