Message-ID: <4E648DDA.8080605@linux.vnet.ibm.com>
Date: Mon, 05 Sep 2011 14:22:42 +0530
From: "Srivatsa S. Bhat" <srivatsa.bhat@...ux.vnet.ibm.com>
To: "Rafael J. Wysocki" <rjw@...k.pl>
CC: Tejun Heo <tj@...nel.org>, linux-kernel@...r.kernel.org,
Linux PM mailing list <linux-pm@...ts.linux-foundation.org>,
oleg@...hat.com, arnd@...db.de
Subject: [BUG] CPU hotplug, freezer: Freezing of tasks failed after 20.00 seconds
On 08/20/2011 02:30 AM, Rafael J. Wysocki wrote:
> On Friday, August 19, 2011, Tejun Heo wrote:
>> Hello,
>>
>> The freezer code has developed a number of convolutions and bugs.
>> It's now using six per-task flags - TIF_FREEZE, PF_FREEZING,
>> PF_NOFREEZE, PF_FROZEN, PF_FREEZER_SKIP and PF_FREEZER_NOSIG, and at
>> the same time has quite a few race conditions. PF_NOFREEZE
>> modifications can race against PM freezer, cgroup_freezer can race
>> against PM freezer, and so on.
>>
>> This patchset tries to simplify the freezer implementation and fix the
>> various bugs. It makes the synchronization more straightforward and
>> replaces TIF_FREEZE with directly checking freeze conditions which are
>> in effect, which makes the whole thing much saner.
>>
>> This patchset removes TIF_FREEZE and PF_FREEZING. Also,
>> PF_FREEZER_SKIP users are planned to move away from the flag and will
>> be removed. It contains the following 16 patches.
>>
>> 0001-freezer-fix-current-state-restoration-race-in-refrig.patch
>> 0002-freezer-don-t-unnecessarily-set-PF_NOFREEZE-explicit.patch
>> 0003-freezer-unexport-refrigerator-and-update-try_to_free.patch
>> 0004-freezer-implement-and-use-kthread_freezable_should_s.patch
>> 0005-freezer-rename-thaw_process-to-__thaw_task-and-simpl.patch
>> 0006-freezer-make-exiting-tasks-properly-unfreezable.patch
>> 0007-freezer-don-t-distinguish-nosig-tasks-on-thaw.patch
>> 0008-freezer-use-dedicated-lock-instead-of-task_lock-memo.patch
>> 0009-freezer-make-freezing-indicate-freeze-condition-in-e.patch
>> 0010-freezer-fix-set_freezable-_with_signal-race.patch
>> 0011-freezer-kill-PF_FREEZING.patch
>> 0012-freezer-clean-up-freeze_processes-failure-path.patch
>> 0013-cgroup_freezer-prepare-for-removal-of-TIF_FREEZE.patch
>> 0014-freezer-make-freezing-test-freeze-conditions-in-effe.patch
>> 0015-freezer-remove-now-unused-TIF_FREEZE.patch
>> 0016-freezer-remove-should_send_signal-and-update-frozen.patch
>>
>> This patchset is on top of the current linus#master (01b883358b "Merge
>> git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc") and
>> available in the following git branch.
>>
>> git://git.kernel.org/pub/scm/linux/kernel/git/tj/misc.git freezer
>
Hi,
I was testing Tejun's above-mentioned freezer patchset in different scenarios.
While running a CPU hotplug stress test and a kernel compilation in the background,
and simultaneously testing the suspend infrastructure using the pm_test framework
(at the freezer level), I found that after a few minutes the freezer reported a
failure to freeze tasks within 20 seconds.
This could also be a CPU hotplug issue, since a "possible circular locking dependency
detected" warning was encountered some time before the task freezing failure was hit.
Here is an excerpt of the log:
Jun 9 11:29:40 istl-vmc-blade9 kernel: [ 6561.058681] Freezing of tasks failed after 20.01 seconds (2 tasks refusing to freeze, wq_busy=0):
Jun 9 11:29:40 istl-vmc-blade9 kernel: [ 6561.067717] invert_cpu_stat D 0000000000000000 5304 20435 17329 0x00000084
Jun 9 11:29:40 istl-vmc-blade9 kernel: [ 6561.074901] ffff8801f367bab8 0000000000000046 ffff8801f367bfd8 00000000001d3a00
Jun 9 11:29:40 istl-vmc-blade9 kernel: [ 6561.082551] ffff8801f367a010 00000000001d3a00 00000000001d3a00 00000000001d3a00
Jun 9 11:29:40 istl-vmc-blade9 kernel: [ 6561.090199] ffff8801f367bfd8 00000000001d3a00 ffff880414cc6840 ffff8801f36783c0
Jun 9 11:29:40 istl-vmc-blade9 kernel: [ 6561.097847] Call Trace:
Jun 9 11:29:40 istl-vmc-blade9 kernel: [ 6561.100383] [<ffffffff81532de5>] schedule_timeout+0x235/0x320
Jun 9 11:29:40 istl-vmc-blade9 kernel: [ 6561.106308] [<ffffffff810a8630>] ? __lock_acquired+0x280/0x2f0
Jun 9 11:29:40 istl-vmc-blade9 kernel: [ 6561.112315] [<ffffffff8153292c>] ? wait_for_common+0x3c/0x170
Jun 9 11:29:40 istl-vmc-blade9 kernel: [ 6561.118227] [<ffffffff81532a03>] ? wait_for_common+0x113/0x170
Jun 9 11:29:40 istl-vmc-blade9 kernel: [ 6561.124224] [<ffffffff81532a0b>] wait_for_common+0x11b/0x170
Jun 9 11:29:40 istl-vmc-blade9 kernel: [ 6561.130052] [<ffffffff81064de0>] ? try_to_wake_up+0x300/0x300
Jun 9 11:29:40 istl-vmc-blade9 kernel: [ 6561.135969] [<ffffffff8107d64a>] ? mod_timer+0x15a/0x2c0
Jun 9 11:29:40 istl-vmc-blade9 kernel: [ 6561.141445] [<ffffffff81532b3d>] wait_for_completion+0x1d/0x20
Jun 9 11:29:40 istl-vmc-blade9 kernel: [ 6561.147445] [<ffffffff81364486>] _request_firmware+0x156/0x2c0
Jun 9 11:29:40 istl-vmc-blade9 kernel: [ 6561.153447] [<ffffffff81364686>] request_firmware+0x16/0x20
Jun 9 11:29:40 istl-vmc-blade9 kernel: [ 6561.159190] [<ffffffffa01f0da0>] request_microcode_fw+0x70/0xf0 [microcode]
Jun 9 11:29:40 istl-vmc-blade9 kernel: [ 6561.166318] [<ffffffffa01f0390>] microcode_init_cpu+0xc0/0x100 [microcode]
Jun 9 11:29:40 istl-vmc-blade9 kernel: [ 6561.173354] [<ffffffffa01f14b4>] mc_cpu_callback+0x7c/0x11f [microcode]
Jun 9 11:29:40 istl-vmc-blade9 kernel: [ 6561.180137] [<ffffffff815393a4>] notifier_call_chain+0x94/0xd0
Jun 9 11:29:40 istl-vmc-blade9 kernel: [ 6561.186135] [<ffffffff8109770e>] __raw_notifier_call_chain+0xe/0x10
Jun 9 11:29:40 istl-vmc-blade9 kernel: [ 6561.192572] [<ffffffff8106d000>] __cpu_notify+0x20/0x40
Jun 9 11:29:40 istl-vmc-blade9 kernel: [ 6561.197965] [<ffffffff8152cf5b>] _cpu_up+0xc7/0x10e
Jun 9 11:29:40 istl-vmc-blade9 kernel: [ 6561.203011] [<ffffffff8152d07b>] cpu_up+0xd9/0xec
Jun 9 11:29:40 istl-vmc-blade9 kernel: [ 6561.207884] [<ffffffff8151e599>] store_online+0x99/0xd0
Jun 9 11:29:40 istl-vmc-blade9 kernel: [ 6561.213273] [<ffffffff81355eb0>] sysdev_store+0x20/0x30
Jun 9 11:29:40 istl-vmc-blade9 kernel: [ 6561.218666] [<ffffffff811f3096>] sysfs_write_file+0xe6/0x170
Jun 9 11:29:40 istl-vmc-blade9 kernel: [ 6561.224495] [<ffffffff8117ee50>] vfs_write+0xd0/0x1a0
Jun 9 11:29:40 istl-vmc-blade9 kernel: [ 6561.229713] [<ffffffff8117f024>] sys_write+0x54/0xa0
Jun 9 11:29:40 istl-vmc-blade9 kernel: [ 6561.234846] [<ffffffff8153df02>] system_call_fastpath+0x16/0x1b
Jun 9 11:29:40 istl-vmc-blade9 kernel: [ 6561.240957] bash D 0000000000000000 5784 23638 17550 0x00000084
Jun 9 11:29:40 istl-vmc-blade9 kernel: [ 6561.248136] ffff88046068bd88 0000000000000046 ffff88046068bfd8 00000000001d3a00
Jun 9 11:29:40 istl-vmc-blade9 kernel: [ 6561.255780] ffff88046068a010 00000000001d3a00 00000000001d3a00 00000000001d3a00
Jun 9 11:29:40 istl-vmc-blade9 kernel: [ 6561.263423] ffff88046068bfd8 00000000001d3a00 ffff8801f1592180 ffff88046d59a4c0
Jun 9 11:29:40 istl-vmc-blade9 kernel: [ 6561.271072] Call Trace:
Jun 9 11:29:40 istl-vmc-blade9 kernel: [ 6561.273601] [<ffffffff81533653>] __mutex_lock_common+0x193/0x3f0
Jun 9 11:29:40 istl-vmc-blade9 kernel: [ 6561.279778] [<ffffffff810315f7>] ? cpu_hotplug_driver_lock+0x17/0x20
Jun 9 11:29:40 istl-vmc-blade9 kernel: [ 6561.286292] [<ffffffff810315f7>] ? cpu_hotplug_driver_lock+0x17/0x20
Jun 9 11:29:40 istl-vmc-blade9 kernel: [ 6561.292812] [<ffffffff815339d7>] mutex_lock_nested+0x37/0x50
Jun 9 11:29:40 istl-vmc-blade9 kernel: [ 6561.298634] [<ffffffff810315f7>] cpu_hotplug_driver_lock+0x17/0x20
Jun 9 11:29:40 istl-vmc-blade9 kernel: [ 6561.304980] [<ffffffff8151e532>] store_online+0x32/0xd0
Jun 9 11:29:40 istl-vmc-blade9 kernel: [ 6561.310371] [<ffffffff81355eb0>] sysdev_store+0x20/0x30
Jun 9 11:29:40 istl-vmc-blade9 kernel: [ 6561.315766] [<ffffffff811f3096>] sysfs_write_file+0xe6/0x170
Jun 9 11:29:40 istl-vmc-blade9 kernel: [ 6561.321591] [<ffffffff8117ee50>] vfs_write+0xd0/0x1a0
Jun 9 11:29:40 istl-vmc-blade9 kernel: [ 6561.326808] [<ffffffff8117f024>] sys_write+0x54/0xa0
Jun 9 11:29:40 istl-vmc-blade9 kernel: [ 6561.331938] [<ffffffff8153df02>] system_call_fastpath+0x16/0x1b
Jun 9 11:29:40 istl-vmc-blade9 firmware.sh[23918]: Cannot find firmware file 'intel-ucode/06-2c-02'
Jun 9 11:29:40 istl-vmc-blade9 kernel: [ 6561.338039] Restarting tasks ...
I have attached the config file and the log with this mail.
--
Regards,
Srivatsa S. Bhat <srivatsa.bhat@...ux.vnet.ibm.com>
Linux Technology Center,
IBM India Systems and Technology Lab
View attachment "messages" of type "text/plain" (24640 bytes)
View attachment "config" of type "text/plain" (116051 bytes)