Message-ID: <4E648DDA.8080605@linux.vnet.ibm.com>
Date:	Mon, 05 Sep 2011 14:22:42 +0530
From:	"Srivatsa S. Bhat" <srivatsa.bhat@...ux.vnet.ibm.com>
To:	"Rafael J. Wysocki" <rjw@...k.pl>
CC:	Tejun Heo <tj@...nel.org>, linux-kernel@...r.kernel.org,
	Linux PM mailing list <linux-pm@...ts.linux-foundation.org>,
	oleg@...hat.com, arnd@...db.de
Subject: [BUG] CPU hotplug, freezer: Freezing of tasks failed after 20.00
 seconds

On 08/20/2011 02:30 AM, Rafael J. Wysocki wrote:
> On Friday, August 19, 2011, Tejun Heo wrote:
>> Hello,
>>
>> The freezer code has developed a number of convolutions and bugs.
>> It's now using five per-task flags - TIF_FREEZE, PF_FREEZING,
>> PF_NOFREEZE, PF_FROZEN, PF_FREEZER_SKIP and PF_FREEZER_NOSIG, and at
>> the same time has quite a few race conditions.  PF_NOFREEZE
>> modifications can race against PM freezer, cgroup_freezer can race
>> against PM freezer, and so on.
>>
>> This patchset tries to simplify the freezer implementation and fix the
>> various bugs.  It makes the synchronization more straight forward and
>> replaces TIF_FREEZE with directly checking freeze conditions which are
>> in effect, which makes the whole thing much saner.
>>
>> This patchset removes TIF_FREEZE and PF_FREEZING.  Also,
>> PF_FREEZER_SKIP users are planned to move away from the flag and will
>> be removed.  It contains the following 16 patches.
>>
>>  0001-freezer-fix-current-state-restoration-race-in-refrig.patch
>>  0002-freezer-don-t-unnecessarily-set-PF_NOFREEZE-explicit.patch
>>  0003-freezer-unexport-refrigerator-and-update-try_to_free.patch
>>  0004-freezer-implement-and-use-kthread_freezable_should_s.patch
>>  0005-freezer-rename-thaw_process-to-__thaw_task-and-simpl.patch
>>  0006-freezer-make-exiting-tasks-properly-unfreezable.patch
>>  0007-freezer-don-t-distinguish-nosig-tasks-on-thaw.patch
>>  0008-freezer-use-dedicated-lock-instead-of-task_lock-memo.patch
>>  0009-freezer-make-freezing-indicate-freeze-condition-in-e.patch
>>  0010-freezer-fix-set_freezable-_with_signal-race.patch
>>  0011-freezer-kill-PF_FREEZING.patch
>>  0012-freezer-clean-up-freeze_processes-failure-path.patch
>>  0013-cgroup_freezer-prepare-for-removal-of-TIF_FREEZE.patch
>>  0014-freezer-make-freezing-test-freeze-conditions-in-effe.patch
>>  0015-freezer-remove-now-unused-TIF_FREEZE.patch
>>  0016-freezer-remove-should_send_signal-and-update-frozen.patch
>>
>> This patchset is on top of the current linus#master (01b883358b "Merge
>> git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc") and
>> available in the following git branch.
>>
>>  git://git.kernel.org/pub/scm/linux/kernel/git/tj/misc.git freezer
> 

Hi,

I was testing Tejun's above-mentioned freezer patchset in different scenarios.
While running a CPU hotplug stress test and a kernel compilation in the background,
and simultaneously testing the suspend infrastructure using the pm_test framework
(at the freezer level), the kernel reported, after a few minutes, that it failed to
freeze tasks within 20 seconds.
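A minimal sketch of the kind of workload described above, assuming the standard sysfs interfaces `devices/system/cpu/cpuN/online`, `power/pm_test` and `power/state`. The actual test scripts were not posted, so the loop structure, the CPU number and the hypothetical `sysfs_root` parameter (which lets the functions be pointed at a scratch directory instead of `/sys`) are illustrative assumptions; the background kernel build is not shown.

```python
import os
import time


def write_sysfs(sysfs_root: str, rel_path: str, value: str) -> None:
    """Write a value to a sysfs attribute located below sysfs_root."""
    with open(os.path.join(sysfs_root, rel_path), "w") as f:
        f.write(value)


def hotplug_cycle(sysfs_root: str, cpu: int, iterations: int,
                  delay: float = 0.0) -> int:
    """Offline and re-online one CPU repeatedly; return the cycles completed."""
    online = f"devices/system/cpu/cpu{cpu}/online"
    done = 0
    for _ in range(iterations):
        write_sysfs(sysfs_root, online, "0")  # take the CPU offline
        time.sleep(delay)
        # Bringing the CPU back runs the hotplug notifiers, e.g. the
        # microcode driver's mc_cpu_callback seen in the first trace.
        write_sysfs(sysfs_root, online, "1")
        time.sleep(delay)
        done += 1
    return done


def pm_test_freezer(sysfs_root: str, state: str = "mem") -> None:
    """Start a suspend attempt that pm_test stops after the freezer stage."""
    write_sysfs(sysfs_root, "power/pm_test", "freezer")
    write_sysfs(sysfs_root, "power/state", state)
```

On a real system `sysfs_root` would be `/sys`, the script would need root privileges, and `hotplug_cycle` and `pm_test_freezer` would have to run concurrently (for example in separate processes) to reproduce the overlap described above.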

This could also be a CPU hotplug issue, since a "possible circular locking dependency
detected" warning was encountered some time before the task freezing failure was hit.

Here is an excerpt of the log:

Jun  9 11:29:40 istl-vmc-blade9 kernel: [ 6561.058681] Freezing of tasks failed after 20.01 seconds (2 tasks refusing to freeze, wq_busy=0):
Jun  9 11:29:40 istl-vmc-blade9 kernel: [ 6561.067717] invert_cpu_stat D 0000000000000000  5304 20435  17329 0x00000084
Jun  9 11:29:40 istl-vmc-blade9 kernel: [ 6561.074901]  ffff8801f367bab8 0000000000000046 ffff8801f367bfd8 00000000001d3a00
Jun  9 11:29:40 istl-vmc-blade9 kernel: [ 6561.082551]  ffff8801f367a010 00000000001d3a00 00000000001d3a00 00000000001d3a00
Jun  9 11:29:40 istl-vmc-blade9 kernel: [ 6561.090199]  ffff8801f367bfd8 00000000001d3a00 ffff880414cc6840 ffff8801f36783c0
Jun  9 11:29:40 istl-vmc-blade9 kernel: [ 6561.097847] Call Trace:
Jun  9 11:29:40 istl-vmc-blade9 kernel: [ 6561.100383]  [<ffffffff81532de5>] schedule_timeout+0x235/0x320
Jun  9 11:29:40 istl-vmc-blade9 kernel: [ 6561.106308]  [<ffffffff810a8630>] ? __lock_acquired+0x280/0x2f0
Jun  9 11:29:40 istl-vmc-blade9 kernel: [ 6561.112315]  [<ffffffff8153292c>] ? wait_for_common+0x3c/0x170
Jun  9 11:29:40 istl-vmc-blade9 kernel: [ 6561.118227]  [<ffffffff81532a03>] ? wait_for_common+0x113/0x170
Jun  9 11:29:40 istl-vmc-blade9 kernel: [ 6561.124224]  [<ffffffff81532a0b>] wait_for_common+0x11b/0x170
Jun  9 11:29:40 istl-vmc-blade9 kernel: [ 6561.130052]  [<ffffffff81064de0>] ? try_to_wake_up+0x300/0x300
Jun  9 11:29:40 istl-vmc-blade9 kernel: [ 6561.135969]  [<ffffffff8107d64a>] ? mod_timer+0x15a/0x2c0
Jun  9 11:29:40 istl-vmc-blade9 kernel: [ 6561.141445]  [<ffffffff81532b3d>] wait_for_completion+0x1d/0x20
Jun  9 11:29:40 istl-vmc-blade9 kernel: [ 6561.147445]  [<ffffffff81364486>] _request_firmware+0x156/0x2c0
Jun  9 11:29:40 istl-vmc-blade9 kernel: [ 6561.153447]  [<ffffffff81364686>] request_firmware+0x16/0x20
Jun  9 11:29:40 istl-vmc-blade9 kernel: [ 6561.159190]  [<ffffffffa01f0da0>] request_microcode_fw+0x70/0xf0 [microcode]
Jun  9 11:29:40 istl-vmc-blade9 kernel: [ 6561.166318]  [<ffffffffa01f0390>] microcode_init_cpu+0xc0/0x100 [microcode]
Jun  9 11:29:40 istl-vmc-blade9 kernel: [ 6561.173354]  [<ffffffffa01f14b4>] mc_cpu_callback+0x7c/0x11f [microcode]
Jun  9 11:29:40 istl-vmc-blade9 kernel: [ 6561.180137]  [<ffffffff815393a4>] notifier_call_chain+0x94/0xd0
Jun  9 11:29:40 istl-vmc-blade9 kernel: [ 6561.186135]  [<ffffffff8109770e>] __raw_notifier_call_chain+0xe/0x10
Jun  9 11:29:40 istl-vmc-blade9 kernel: [ 6561.192572]  [<ffffffff8106d000>] __cpu_notify+0x20/0x40
Jun  9 11:29:40 istl-vmc-blade9 kernel: [ 6561.197965]  [<ffffffff8152cf5b>] _cpu_up+0xc7/0x10e
Jun  9 11:29:40 istl-vmc-blade9 kernel: [ 6561.203011]  [<ffffffff8152d07b>] cpu_up+0xd9/0xec
Jun  9 11:29:40 istl-vmc-blade9 kernel: [ 6561.207884]  [<ffffffff8151e599>] store_online+0x99/0xd0
Jun  9 11:29:40 istl-vmc-blade9 kernel: [ 6561.213273]  [<ffffffff81355eb0>] sysdev_store+0x20/0x30
Jun  9 11:29:40 istl-vmc-blade9 kernel: [ 6561.218666]  [<ffffffff811f3096>] sysfs_write_file+0xe6/0x170
Jun  9 11:29:40 istl-vmc-blade9 kernel: [ 6561.224495]  [<ffffffff8117ee50>] vfs_write+0xd0/0x1a0
Jun  9 11:29:40 istl-vmc-blade9 kernel: [ 6561.229713]  [<ffffffff8117f024>] sys_write+0x54/0xa0
Jun  9 11:29:40 istl-vmc-blade9 kernel: [ 6561.234846]  [<ffffffff8153df02>] system_call_fastpath+0x16/0x1b
Jun  9 11:29:40 istl-vmc-blade9 kernel: [ 6561.240957] bash            D 0000000000000000  5784 23638  17550 0x00000084
Jun  9 11:29:40 istl-vmc-blade9 kernel: [ 6561.248136]  ffff88046068bd88 0000000000000046 ffff88046068bfd8 00000000001d3a00
Jun  9 11:29:40 istl-vmc-blade9 kernel: [ 6561.255780]  ffff88046068a010 00000000001d3a00 00000000001d3a00 00000000001d3a00
Jun  9 11:29:40 istl-vmc-blade9 kernel: [ 6561.263423]  ffff88046068bfd8 00000000001d3a00 ffff8801f1592180 ffff88046d59a4c0
Jun  9 11:29:40 istl-vmc-blade9 kernel: [ 6561.271072] Call Trace:
Jun  9 11:29:40 istl-vmc-blade9 kernel: [ 6561.273601]  [<ffffffff81533653>] __mutex_lock_common+0x193/0x3f0
Jun  9 11:29:40 istl-vmc-blade9 kernel: [ 6561.279778]  [<ffffffff810315f7>] ? cpu_hotplug_driver_lock+0x17/0x20
Jun  9 11:29:40 istl-vmc-blade9 kernel: [ 6561.286292]  [<ffffffff810315f7>] ? cpu_hotplug_driver_lock+0x17/0x20
Jun  9 11:29:40 istl-vmc-blade9 kernel: [ 6561.292812]  [<ffffffff815339d7>] mutex_lock_nested+0x37/0x50
Jun  9 11:29:40 istl-vmc-blade9 kernel: [ 6561.298634]  [<ffffffff810315f7>] cpu_hotplug_driver_lock+0x17/0x20
Jun  9 11:29:40 istl-vmc-blade9 kernel: [ 6561.304980]  [<ffffffff8151e532>] store_online+0x32/0xd0
Jun  9 11:29:40 istl-vmc-blade9 kernel: [ 6561.310371]  [<ffffffff81355eb0>] sysdev_store+0x20/0x30
Jun  9 11:29:40 istl-vmc-blade9 kernel: [ 6561.315766]  [<ffffffff811f3096>] sysfs_write_file+0xe6/0x170
Jun  9 11:29:40 istl-vmc-blade9 kernel: [ 6561.321591]  [<ffffffff8117ee50>] vfs_write+0xd0/0x1a0
Jun  9 11:29:40 istl-vmc-blade9 kernel: [ 6561.326808]  [<ffffffff8117f024>] sys_write+0x54/0xa0
Jun  9 11:29:40 istl-vmc-blade9 kernel: [ 6561.331938]  [<ffffffff8153df02>] system_call_fastpath+0x16/0x1b
Jun  9 11:29:40 istl-vmc-blade9 firmware.sh[23918]: Cannot find  firmware file 'intel-ucode/06-2c-02'
Jun  9 11:29:40 istl-vmc-blade9 kernel: [ 6561.338039] Restarting tasks ... 
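The excerpt follows a fixed pattern: a one-line summary, then a dump of each task that refused to freeze (here both in D state). For triaging a longer log such as the attached "messages" file, a small parser can pull those out. This is a minimal sketch; the helper names `parse_freeze_failure` and `blocked_tasks` are hypothetical, and the regular expressions assume exactly the line formats shown in this excerpt.

```python
import re
from typing import Iterable, List, NamedTuple, Optional


class FreezeFailure(NamedTuple):
    seconds: float   # time after which the freezer gave up
    refusing: int    # number of tasks reported as refusing to freeze
    wq_busy: int     # value of the wq_busy field in the summary line


# Assumes the exact summary wording shown in the excerpt.
_FREEZE_RE = re.compile(
    r"Freezing of tasks failed after (\d+\.\d+) seconds "
    r"\((\d+) tasks refusing to freeze, wq_busy=(\d+)\)"
)

# Assumes the task-dump header layout of this (64-bit) log:
# "] <comm> D <16 hex digits> ...", where D is the task state.
_DSTATE_RE = re.compile(r"\]\s(\S+)\s+D\s[0-9a-f]{16}\s")


def parse_freeze_failure(line: str) -> Optional[FreezeFailure]:
    """Return the parsed summary line, or None if the line is not one."""
    m = _FREEZE_RE.search(line)
    if m is None:
        return None
    return FreezeFailure(float(m.group(1)), int(m.group(2)), int(m.group(3)))


def blocked_tasks(lines: Iterable[str]) -> List[str]:
    """Collect the names of tasks dumped in uninterruptible (D) state."""
    names = []
    for line in lines:
        m = _DSTATE_RE.search(line)
        if m:
            names.append(m.group(1))
    return names
```

Applied to the excerpt above, `blocked_tasks` would return `["invert_cpu_stat", "bash"]`, matching the "2 tasks refusing to freeze" in the summary line.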


I have attached the config file and the log with this mail.

-- 
Regards,
Srivatsa S. Bhat  <srivatsa.bhat@...ux.vnet.ibm.com>
Linux Technology Center,
IBM India Systems and Technology Lab

Attachment "messages" (text/plain, 24640 bytes)

Attachment "config" (text/plain, 116051 bytes)