linux-kernel - Re: hung task when USB storage probe occurs during suspend

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [day] [month] [year] [list]
Message-ID: <CAC5F_1kp-GiDO=gLv6gHCDrc512=XDgPU4dHvLP3Xz_iHGSggQ@mail.gmail.com>
Date:	Mon, 15 Oct 2012 22:59:58 -0400
From:	Michael Spang <spang@...gle.com>
To:	linux-kernel@...r.kernel.org
Cc:	Jens Axboe <axboe@...nel.dk>, "Rafael J. Wysocki" <rjw@...k.pl>,
	James Bottomley <James.Bottomley@...senpartnership.com>,
	Tejun Heo <tejun@...gle.com>,
	Derek Basehore <dbasehore@...gle.com>
Subject: Re: hung task when USB storage probe occurs during suspend

[oops, fixing LKML address]

If you plug in a USB storage device and then suspend, resume, and
quickly suspend again, the system may freeze. 2 minutes later you'll
get the following message.

I believe this is a regression introduced in 62d3c543 ("Block: use a
freezable workqueue for disk-event polling"). Reverting that patch
prevents the deadlock.

<3>[  240.107877] INFO: task kworker/u:2:64 blocked for more than 120 seconds.
<3>[  240.107888] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
disables this message.
<6>[  240.107899] kworker/u:2     D ffff880149f61908     0    64
2 0x00000000
<5>[  240.107914]  ffff880149f719f0 0000000000000046 ffff8801495e3250
ffff8801495e3250
<5>[  240.107923]  ffff880149f615d8 ffff880149f61590 ffff880149f71fd8
ffff880149f71fd8
<5>[  240.107931]  0000000000012180 ffff880149f61590 ffff88014fa92208
7fffffffffffffff
<5>[  240.107939] Call Trace:
<5>[  240.107955]  [<ffffffff8145d819>] schedule+0x64/0x66
<5>[  240.107966]  [<ffffffff8145c45d>] schedule_timeout+0x34/0xde
<5>[  240.107973]  [<ffffffff8145d681>] wait_for_common+0xcd/0x14b
<5>[  240.107983]  [<ffffffff810524c0>] ? try_to_wake_up+0x1e0/0x1e0
<5>[  240.107990]  [<ffffffff8145d7b3>] wait_for_completion+0x1d/0x1f
<5>[  240.107998]  [<ffffffff81040b67>] flush_work+0x2e/0x34
<5>[  240.108004]  [<ffffffff8103f726>] ? do_work_for_cpu+0x27/0x27
<5>[  240.108011]  [<ffffffff81040f09>] flush_delayed_work+0x49/0x4e
<5>[  240.108019]  [<ffffffff811cc420>] disk_clear_events+0x97/0xfb
<5>[  240.108029]  [<ffffffff810fe7c5>] check_disk_change+0x2d/0x5f
<5>[  240.108039]  [<ffffffff812e5673>] sd_open+0xb1/0x160
<5>[  240.108047]  [<ffffffff810fefc5>] __blkdev_get+0xbf/0x3b0
<5>[  240.108054]  [<ffffffff810ff495>] blkdev_get+0x1df/0x2d8
<5>[  240.108064]  [<ffffffff810e7245>] ? unlock_new_inode+0x5c/0x61
<5>[  240.108074]  [<ffffffff812c268f>] ? put_device+0x17/0x19
<5>[  240.108083]  [<ffffffff811cae3b>] ? disk_put_part+0x12/0x14
<5>[  240.108089]  [<ffffffff811cbcd1>] add_disk+0x29f/0x3e6
<5>[  240.108096]  [<ffffffff812e7e66>] sd_probe_async+0x124/0x1c4
<5>[  240.108103]  [<ffffffff8104b7a9>] ? async_schedule+0x17/0x17
<5>[  240.108108]  [<ffffffff8104b84b>] async_run_entry_fn+0xa2/0x153
<5>[  240.108115]  [<ffffffff81041300>] process_one_work+0x199/0x2b8
<5>[  240.108123]  [<ffffffff81041d80>] worker_thread+0x13c/0x222
<5>[  240.108130]  [<ffffffff81041c44>] ? manage_workers.isra.26+0x171/0x171
<5>[  240.108138]  [<ffffffff81045709>] kthread+0x8b/0x93
<5>[  240.108147]  [<ffffffff8145fcd4>] kernel_thread_helper+0x4/0x10
<5>[  240.108155]  [<ffffffff8104567e>] ? __init_kthread_worker+0x39/0x39
<5>[  240.108163]  [<ffffffff8145fcd0>] ? gs_change+0xb/0xb
<0>[  240.108169] Kernel panic - not syncing: hung_task: blocked tasks

This async SCSI probe is stuck trying to flush the
system_nrt_freezable_wq workqueue, which is frozen.

<6>[  169.464976] powerd_suspend  D ffff880101d2e818     0  5687
5686 0x00000000
<5>[  169.464981]  ffff880101f2bcd8 0000000000000082 0000000000000246
ffffffff81813020
<5>[  169.464988]  0000000000000246 ffff880101d2e4a0 ffff880101f2bfd8
ffff880101f2bfd8
<5>[  169.464996]  0000000000012180 ffff880101d2e4a0 ffff880101f2bcc8
0000000000000217
<5>[  169.465002] Call Trace:
<5>[  169.465009]  [<ffffffff812e1507>] ? scsi_bus_resume_common+0x8d/0x8d
<5>[  169.465015]  [<ffffffff8145d819>] schedule+0x64/0x66
<5>[  169.465023]  [<ffffffff8104b598>]
async_synchronize_cookie_domain+0xb6/0x112
<5>[  169.465029]  [<ffffffff81045df3>] ? __init_waitqueue_head+0x32/0x32
<5>[  169.465038]  [<ffffffff8104b609>] async_synchronize_cookie+0x15/0x17
<5>[  169.465046]  [<ffffffff8104b620>] async_synchronize_full+0x15/0x31
<5>[  169.465052]  [<ffffffff812e1524>] scsi_bus_prepare+0x1d/0x36
<5>[  169.465059]  [<ffffffff812cc804>] dpm_prepare+0xdd/0x18d
<5>[  169.465065]  [<ffffffff812cc8c9>] dpm_suspend_start+0x15/0x40
<5>[  169.465073]  [<ffffffff8105c1a0>] suspend_devices_and_enter+0x78/0x27f
<5>[  169.465081]  [<ffffffff8105c4db>] pm_suspend+0x134/0x1a9
<5>[  169.465088]  [<ffffffff8105b7b8>] state_store+0x9c/0xc5
<5>[  169.465098]  [<ffffffff811d740f>] kobj_attr_store+0x17/0x19
<5>[  169.465105]  [<ffffffff8112419e>] sysfs_write_file+0x104/0x140
<5>[  169.465111]  [<ffffffff810d3b2f>] vfs_write+0xa8/0xcf
<5>[  169.465117]  [<ffffffff810d3cdd>] sys_write+0x4a/0x71
<5>[  169.465124]  [<ffffffff8145ec52>] system_call_fastpath+0x16/0x1b

The powerd_suspend task is blocked waiting for SCSI probes to complete.

We've worked around this issue in the chromium tree by partially
reverting 62d3c543 (see
https://gerrit.chromium.org/gerrit/#/c/35324/).

Thanks,
Michael
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/