linux-kernel - [BUG] Deadlock between cpufreq kondemand and process cpuspeed

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [day] [month] [year] [list]

Message-Id: <1246867583.2560.526.camel@ymzhang>
Date:	Mon, 06 Jul 2009 16:06:23 +0800
From:	"Zhang, Yanmin" <yanmin_zhang@...ux.intel.com>
To:	Mathieu Desnoyers <mathieu.desnoyers@...ymtl.ca>
Cc:	Pallipadi@...r.kernel.org,
	Venkatesh <venkatesh.pallipadi@...el.com>,
	LKML <linux-kernel@...r.kernel.org>, cpufreq@...r.kernel.org,
	Dave Jones <davej@...hat.com>,
	Andrew Morton <akpm@...ux-foundation.org>
Subject: [BUG] Deadlock between cpufreq kondemand and process cpuspeed

When I run a series of testing against 2.6.31-rc2 on my Tulsa x8664 machine,
kernel reports a block issue. Below is the log.


kondemand/3   D ffffffff81029ff9     0  1474      2 0x00000000
 ffff88027ccb8650 0000000000000046 0000000000000000 0000000000000000
 ffff88027ed51310 ffff88027ccb88e0 00000000ffff0b68 0000000000000000
 0000000000000000 ffffffff81029b34 0000000000000000 0000000100000000
Call Trace:
 [<ffffffff81029b34>] ? clear_buddies+0x15/0x23
 [<ffffffff8130734c>] ? do_dbs_timer+0x0/0x272
 [<ffffffff813cfa94>] ? __down_write_nested+0x79/0x93
 [<ffffffff81305ab7>] ? lock_policy_rwsem_write+0x3e/0x6a
 [<ffffffff813073a9>] ? do_dbs_timer+0x5d/0x272
 [<ffffffff8130734c>] ? do_dbs_timer+0x0/0x272
 [<ffffffff81049c77>] ? worker_thread+0x15b/0x1f5
 [<ffffffff8104cd0a>] ? autoremove_wake_function+0x0/0x2e
 [<ffffffff81049b1c>] ? worker_thread+0x0/0x1f5
 [<ffffffff8104cbfd>] ? kthread+0x84/0x8f
 [<ffffffff81048178>] ? __call_usermodehelper+0x0/0x66
 [<ffffffff8100bcfa>] ? child_rip+0xa/0x20
 [<ffffffff8104cb79>] ? kthread+0x0/0x8f
 [<ffffffff8100bcf0>] ? child_rip+0x0/0x20
INFO: task cpuspeed:4445 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
cpuspeed      D ffffffff81029ff9     0  4445   4370 0x00000000
 ffff88027ceff910 0000000000000086 0000000000000000 000280da00000010
 ffff88027eea3850 ffff88027ceffba0 00000000ffff0b49 00000000280c3300
 0000000000000005 ffffffff8102f3b8 0000000000000000 0000000100000000
Call Trace:
 [<ffffffff8102f3b8>] ? find_busiest_group+0x1e6/0x831
 [<ffffffff813ce952>] ? schedule_timeout+0x23/0x161
 [<ffffffff8107ebbc>] ? get_page_from_freelist+0x414/0x58b
 [<ffffffff8118d7f8>] ? number+0x118/0x204
 [<ffffffff813ce804>] ? wait_for_common+0xb8/0x11e
 [<ffffffff81031ebe>] ? default_wake_function+0x0/0x9
 [<ffffffff8102ba4d>] ? __wake_up+0x30/0x44
 [<ffffffff810491de>] ? __cancel_work_timer+0x107/0x160
 [<ffffffff8118d9d2>] ? vsscanf+0xee/0x657
 [<ffffffff81048e80>] ? wq_barrier_func+0x0/0x9
 [<ffffffff813077ef>] ? cpufreq_governor_dbs+0x231/0x29d
 [<ffffffff8118cc23>] ? __up_read+0x13/0x8a
 [<ffffffff81304f0c>] ? __cpufreq_governor+0x98/0xcf
 [<ffffffff813050af>] ? __cpufreq_set_policy+0x16c/0x1fb
 [<ffffffff81305638>] ? store_scaling_governor+0x170/0x1b9
 [<ffffffff81305bdf>] ? handle_update+0x0/0x28
 [<ffffffff813062b5>] ? store+0x4d/0x6c
 [<ffffffff810ec748>] ? sysfs_write_file+0xd2/0x110
 [<ffffffff810a84ce>] ? vfs_write+0xad/0x149
 [<ffffffff810a8a16>] ? sys_write+0x45/0x6e
 [<ffffffff8100ad6b>] ? system_call_fastpath+0x16/0x1b


As a matter of fact, it's triggerred by "/bin/sh /etc/init.d/cpuspeed stop".

Consider below call chain of process cpuspeed:
store (get lock by lock_policy_rwsem_write) =>
	store_scaling_governor =>
		__cpufreq_set_policy =>
			__cpufreq_governor =>
				cpufreq_governor_dbs =>
					dbs_timer_exit =>
						cancel_delayed_work_sync =>
							wait_on_cpu_work
go to sleep.

While thread kondemand is trying to execute the work at do_dbs_timer and
sleep at lock_policy_rwsem_write, because cpuspeed got the lock and is
sleeping now.

The race is cpuspeed get the lock firstly, and kondemand sleeps on the lock
when trying to process the work. But cpuspeed waits for the work completion.

It seems b14893a62c73af0eca414cfed505b8c09efc613c causes it.

yanmin


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/