lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:	Fri, 28 Oct 2011 15:47:43 +0200
From:	Matthijs Kooijman <matthijs@...in.nl>
To:	Tejun Heo <tj@...nel.org>
Cc:	Gaudenz Steinlin <gaudenz@...ian.org>,
	linux-kernel@...r.kernel.org,
	Jameson Graef Rollins <jrollins@...estructure.net>,
	Jonathan Nieder <jrnieder@...il.com>,
	Jens Axboe <axboe@...nel.dk>, Amit Shah <amit.shah@...hat.com>,
	David Zeuthen <zeuthen@...il.com>,
	Martin Pitt <martin.pitt@...ntu.com>
Subject: Re: [regression] CD-ROM polling blocks suspend on some machines
	(Re: [PATCH 1/2] cdrom: always check_disk_change() on open)

Hi Tejun,

> > it seems this same problem also occurs with udisks-daemon, though the
> > behaviour is slightly different (it seems to hang even before
> > undocking, I _think_).
> > 
> > Anyway, here's the dmesg output when trying to suspend:
>
> Can you please turn on CONFIG_FRAME_POINTER, trigger the problem, wait
> for some minutes and then do sysrq-t and report the dmesg output?
> This could be from DEAD state handling race currently being addressed
> in upstream.

I compiled a clean 3.1.0 kernel with FRAME_POINTER support, but haven't
been able to reproduce the problem with udisks-daemon so far. It
actually seems the problem occurs less often with 3.1.0, I initially
thought the problem was fixed.

I'm not sure if the FRAME_POINTER option actually works, though, since I
couldn't find any obvious differences in the format of the stack traces?

I have managed to trigger the problem again in scsi_id (by undocking)
and then suspending. I've attached the full output of sysrq-t (though I
think it's incomplete, syslogd didn't manage to capture it all?). The
part about scsi_id is copied below.

Here's the dmesg output during the suspend:

Oct 28 15:15:55 grubby kernel: [  168.864019] Freezing of tasks failed after 20.01 seconds (1 tasks refusing to freeze, wq_busy=0):
Oct 28 15:15:55 grubby kernel: [  168.864069] scsi_id         D 0000000000000000     0 11343   8992 0x00800004
Oct 28 15:15:55 grubby kernel: [  168.864077]  ffff880225e6f8d8 0000000000000082 00ff880200000000 ffff880200000000
Oct 28 15:15:55 grubby kernel: [  168.864085]  ffff88022d7ed610 ffff880225e6ffd8 ffff880225e6ffd8 0000000000012f40
Oct 28 15:15:55 grubby kernel: [  168.864095]  ffff88022ed59610 ffff88022d7ed610 ffff88020439e440 0000000100000002
Oct 28 15:15:55 grubby kernel: [  168.864098] Call Trace:
Oct 28 15:15:55 grubby kernel: [  168.864107]  [<ffffffff8134c996>] schedule+0x55/0x57
Oct 28 15:15:55 grubby kernel: [  168.864110]  [<ffffffff8134cc2c>] schedule_timeout+0x2f/0xd9
Oct 28 15:15:55 grubby kernel: [  168.864117]  [<ffffffff8104302a>] ? __cond_resched+0x23/0x2f
Oct 28 15:15:55 grubby kernel: [  168.864120]  [<ffffffff8134c832>] wait_for_common+0x9e/0x115
Oct 28 15:15:55 grubby kernel: [  168.864122]  [<ffffffff8104075b>] ? try_to_wake_up+0x1a7/0x1a7
Oct 28 15:15:55 grubby kernel: [  168.864125]  [<ffffffff8134c93f>] wait_for_completion+0x18/0x1a
Oct 28 15:15:55 grubby kernel: [  168.864130]  [<ffffffff8105d659>] flush_work+0x29/0x2f
Oct 28 15:15:55 grubby kernel: [  168.864133]  [<ffffffff8105c8ff>] ? do_work_for_cpu+0x24/0x24
Oct 28 15:15:55 grubby kernel: [  168.864136]  [<ffffffff8105d9e5>] flush_delayed_work+0x3a/0x3e
Oct 28 15:15:55 grubby kernel: [  168.864142]  [<ffffffff811a0156>] disk_clear_events+0x8f/0xf5
Oct 28 15:15:55 grubby kernel: [  168.864169]  [<ffffffff81125280>] check_disk_change+0x29/0x5c
Oct 28 15:15:55 grubby kernel: [  168.864176]  [<ffffffffa009feb1>] cdrom_open+0x44/0x4b2 [cdrom]
Oct 28 15:15:55 grubby kernel: [  168.864181]  [<ffffffff811aba78>] ? kobject_get+0x18/0x20
Oct 28 15:15:55 grubby kernel: [  168.864186]  [<ffffffff8124a895>] ? get_device+0x14/0x1a
Oct 28 15:15:55 grubby kernel: [  168.864191]  [<ffffffffa00bb86b>] sr_block_open+0x90/0xb7 [sr_mod]
Oct 28 15:15:55 grubby kernel: [  168.864194]  [<ffffffff8112610b>] __blkdev_get+0xe2/0x398
Oct 28 15:15:55 grubby kernel: [  168.864197]  [<ffffffff8112667c>] ? blkdev_get+0x2bb/0x2bb
Oct 28 15:15:55 grubby kernel: [  168.864199]  [<ffffffff8112658d>] blkdev_get+0x1cc/0x2bb
Oct 28 15:15:55 grubby kernel: [  168.864201]  [<ffffffff8112667c>] ? blkdev_get+0x2bb/0x2bb
Oct 28 15:15:55 grubby kernel: [  168.864203]  [<ffffffff811266de>] blkdev_open+0x62/0x66
Oct 28 15:15:55 grubby kernel: [  168.864209]  [<ffffffff810fbf12>] __dentry_open+0x182/0x299
Oct 28 15:15:55 grubby kernel: [  168.864212]  [<ffffffff8134db64>] ? _raw_spin_lock+0x9/0xb
Oct 28 15:15:55 grubby kernel: [  168.864215]  [<ffffffff810fcde9>] nameidata_to_filp+0x5b/0x62
Oct 28 15:15:55 grubby kernel: [  168.864219]  [<ffffffff81108e4a>] do_last+0x454/0x56c
Oct 28 15:15:55 grubby kernel: [  168.864221]  [<ffffffff8110905a>] path_openat+0xca/0x30b
Oct 28 15:15:55 grubby kernel: [  168.864226]  [<ffffffff810d4744>] ? handle_mm_fault+0x1c3/0x1d6
Oct 28 15:15:55 grubby kernel: [  168.864228]  [<ffffffff811092e9>] do_filp_open+0x33/0x81
Oct 28 15:15:55 grubby kernel: [  168.864233]  [<ffffffff81112584>] ? alloc_fd+0x6d/0x118
Oct 28 15:15:55 grubby kernel: [  168.864235]  [<ffffffff810fce59>] do_sys_open+0x69/0xfb
Oct 28 15:15:55 grubby kernel: [  168.864238]  [<ffffffff810fcf06>] sys_open+0x1b/0x1d
Oct 28 15:15:55 grubby kernel: [  168.864241]  [<ffffffff81352ed2>] system_call_fastpath+0x16/0x1b
Oct 28 15:15:55 grubby kernel: [  168.864244] 

Here's the sysrq-t output about scsi_id from a few minutes after the
suspend:
Oct 28 15:29:00 grubby kernel: [  951.018877] scsi_id         D 0000000000000000     0 11343   8992 0x00000004
Oct 28 15:29:00 grubby kernel: [  951.018882]  ffff880225e6f8d8 0000000000000082 00ff880200000000 ffff880200000000
Oct 28 15:29:00 grubby kernel: [  951.018889]  ffff88022d7ed610 ffff880225e6ffd8 ffff880225e6ffd8 0000000000012f40
Oct 28 15:29:00 grubby kernel: [  951.018896]  ffff88022ed59610 ffff88022d7ed610 ffff88020439e440 0000000100000002
Oct 28 15:29:00 grubby kernel: [  951.018903] Call Trace:
Oct 28 15:29:00 grubby kernel: [  951.018908]  [<ffffffff8134c996>] schedule+0x55/0x57
Oct 28 15:29:00 grubby kernel: [  951.018913]  [<ffffffff8134cc2c>] schedule_timeout+0x2f/0xd9
Oct 28 15:29:00 grubby kernel: [  951.018919]  [<ffffffff8104302a>] ? __cond_resched+0x23/0x2f
Oct 28 15:29:00 grubby kernel: [  951.018925]  [<ffffffff8134c832>] wait_for_common+0x9e/0x115
Oct 28 15:29:00 grubby kernel: [  951.018930]  [<ffffffff8104075b>] ? try_to_wake_up+0x1a7/0x1a7
Oct 28 15:29:00 grubby kernel: [  951.018936]  [<ffffffff8134c93f>] wait_for_completion+0x18/0x1a
Oct 28 15:29:00 grubby kernel: [  951.018942]  [<ffffffff8105d659>] flush_work+0x29/0x2f
Oct 28 15:29:00 grubby kernel: [  951.018947]  [<ffffffff8105c8ff>] ? do_work_for_cpu+0x24/0x24
Oct 28 15:29:00 grubby kernel: [  951.018953]  [<ffffffff8105d9e5>] flush_delayed_work+0x3a/0x3e
Oct 28 15:29:00 grubby kernel: [  951.018959]  [<ffffffff811a0156>] disk_clear_events+0x8f/0xf5
Oct 28 15:29:00 grubby kernel: [  951.018970]  [<ffffffff81125280>] check_disk_change+0x29/0x5c
Oct 28 15:29:00 grubby kernel: [  951.018982]  [<ffffffffa009feb1>] cdrom_open+0x44/0x4b2 [cdrom]
Oct 28 15:29:00 grubby kernel: [  951.018988]  [<ffffffff811aba78>] ? kobject_get+0x18/0x20
Oct 28 15:29:00 grubby kernel: [  951.018994]  [<ffffffff8124a895>] ? get_device+0x14/0x1a
Oct 28 15:29:00 grubby kernel: [  951.019005]  [<ffffffffa00bb86b>] sr_block_open+0x90/0xb7 [sr_mod]
Oct 28 15:29:00 grubby kernel: [  951.019011]  [<ffffffff8112610b>] __blkdev_get+0xe2/0x398
Oct 28 15:29:00 grubby kernel: [  951.019016]  [<ffffffff8112667c>] ? blkdev_get+0x2bb/0x2bb
Oct 28 15:29:00 grubby kernel: [  951.019021]  [<ffffffff8112658d>] blkdev_get+0x1cc/0x2bb
Oct 28 15:29:00 grubby kernel: [  951.019026]  [<ffffffff8112667c>] ? blkdev_get+0x2bb/0x2bb
Oct 28 15:29:00 grubby kernel: [  951.019031]  [<ffffffff811266de>] blkdev_open+0x62/0x66
Oct 28 15:29:00 grubby kernel: [  951.019036]  [<ffffffff810fbf12>] __dentry_open+0x182/0x299
Oct 28 15:29:00 grubby kernel: [  951.019042]  [<ffffffff8134db64>] ? _raw_spin_lock+0x9/0xb
Oct 28 15:29:00 grubby kernel: [  951.019048]  [<ffffffff810fcde9>] nameidata_to_filp+0x5b/0x62
Oct 28 15:29:00 grubby kernel: [  951.019054]  [<ffffffff81108e4a>] do_last+0x454/0x56c
Oct 28 15:29:00 grubby kernel: [  951.019060]  [<ffffffff8110905a>] path_openat+0xca/0x30b
Oct 28 15:29:00 grubby kernel: [  951.019066]  [<ffffffff810d4744>] ? handle_mm_fault+0x1c3/0x1d6
Oct 28 15:29:00 grubby kernel: [  951.019072]  [<ffffffff811092e9>] do_filp_open+0x33/0x81
Oct 28 15:29:00 grubby kernel: [  951.019078]  [<ffffffff81112584>] ? alloc_fd+0x6d/0x118
Oct 28 15:29:00 grubby kernel: [  951.019084]  [<ffffffff810fce59>] do_sys_open+0x69/0xfb
Oct 28 15:29:00 grubby kernel: [  951.019089]  [<ffffffff810fcf06>] sys_open+0x1b/0x1d
Oct 28 15:29:00 grubby kernel: [  951.019095]  [<ffffffff81352ed2>] system_call_fastpath+0x16/0x1b

It seems these two traces are identical. There are a number of "INFO:
task scsi_id:11343 blocked for more than 120 seconds" traces in the log
as well, which also look identical.

I hope this helps... If you need anything more, just ask.

Gr.

Matthijs

View attachment "sysreq-t.txt" of type "text/plain" (185200 bytes)

Download attachment "signature.asc" of type "application/pgp-signature" (198 bytes)

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ