Date:	Thu, 05 Nov 2009 12:47:48 -0500
From:	Valdis.Kletnieks@...edu
To:	Andrew Morton <akpm@...ux-foundation.org>,
	Thomas Gleixner <tglx@...utronix.de>,
	Darren Hart <dvhltc@...ibm.com>
Cc:	linux-kernel@...r.kernel.org
Subject: 2.6.32-rc5-mmotm1101 - unkillable processes stuck in futex.

(Hmm.. I seem to be on a roll on this -mmotm, breaking all sorts of stuff.. :)

Am cc'ing Thomas and Darren because their names were attached to commits in
the origin.patch that touched futex.c.

It looks like pulseaudio clients with multiple threads manage to hose up
the futex code to the point they're not kill -9'able.  Semi-reproducible,
as I've hit it twice by accident. No recipe for triggering it yet.

Did it once to gyachi (a Yahoo Messenger client) and twice to pidgin (an
everything-else IM client). 'top' would report 100% CPU usage, all of it kernel
mode, and it was confirmed by the CPU going to top GHz and warming up some 6-7
degrees (so we were spinning on something rather than sitting in a wait/deadlock).
In both cases, I tried to kill -9 the process, and the process didn't go away.
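
For anyone trying to confirm the same symptom without eyeballing 'top', the
user/kernel CPU split for a task can be read from /proc/<pid>/stat. A minimal
sketch in Python, assuming the field layout documented in proc(5) (state is
field 3, utime is field 14, stime is field 15, the latter two in clock ticks).
The function names are made up for illustration and are not part of any tool
used in this report:

```python
def parse_cpu_times(stat_line):
    """Return (state, utime_ticks, stime_ticks) from a /proc/<pid>/stat line."""
    # The comm field is wrapped in parentheses and may itself contain spaces
    # or parentheses, so split on the LAST ')' rather than on whitespace.
    rest = stat_line[stat_line.rindex(")") + 1:].split()
    # rest[0] is field 3 (state), so field N lives at rest[N - 3].
    state = rest[0]
    utime = int(rest[14 - 3])
    stime = int(rest[15 - 3])
    return state, utime, stime


def kernel_share(stat_line):
    """Fraction of the task's CPU time spent in kernel mode (0.0 if none used)."""
    _, utime, stime = parse_cpu_times(stat_line)
    total = utime + stime
    return stime / total if total else 0.0
```

Feeding it the contents of /proc/<pid>/stat for the wedged task a few seconds
apart should show stime climbing while utime stays flat if the task really is
spinning in the kernel. Per-thread numbers live under /proc/<pid>/task/<tid>/stat.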

Here's the 'alt-sysrq-t' for both cases.  I started a second pidgin the second
time around; that one wedged real fast (on the first thread it created) and
didn't get kill -9'ed (if that makes a difference in the stack trace...)

gyachi wedged up - main thread kept going, subthread hung.

[44347.339018] gyachi        ? ffff88000260e010  3856  3183   2393 0x00000080
[44347.339018]  ffff88006c3cfeb8 0000000000000046 ffff88006c3cfe80 ffff88006c3cfe7c
[44347.339018]  ffff88006c3cfe28 0000000000000000 0000000000000155 ffff88006c0dabc0
[44347.339018]  ffff88006c3ce000 000000000000e010 ffff88006c0dabc0 00000001029f3766
[44347.339018] Call Trace:
[44347.339018]  [<ffffffff8103ed89>] do_exit+0x8f7/0x906
[44347.339018]  [<ffffffff814bb838>] ? preempt_schedule+0x5e/0x67
[44347.339018]  [<ffffffff8103ee27>] do_group_exit+0x8f/0xb8
[44347.339018]  [<ffffffff8103ee62>] sys_exit_group+0x12/0x16
[44347.339018]  [<ffffffff8100246b>] system_call_fastpath+0x16/0x1b
[44347.339018] gyachi        R  running task     5344  3187   2393 0x00000084
[44347.339018]  ffff88006c2c6b40 0000000000000002 ffff88007967f988 ffffffff81066193
[44347.339018]  ffff88007967f998 ffffffff81066193 ffffffff823ceab0 0000000000000000
[44347.339018]  000000007967fab8 ffffffff814bd184 0000000000000000 ffff88007f8b0000
[44347.339018] Call Trace:
[44347.339018]  [<ffffffff81066193>] ? trace_hardirqs_on_caller+0x16/0x13c
[44347.339018]  [<ffffffff81066193>] ? trace_hardirqs_on_caller+0x16/0x13c
[44347.339018]  [<ffffffff814bd184>] ? trace_hardirqs_on_thunk+0x3a/0x3f
[44347.339018]  [<ffffffff814be2c0>] ? restore_args+0x0/0x30
[44347.339018]  [<ffffffff81069189>] ? queue_lock+0x50/0x5b
[44347.339018]  [<ffffffff81069189>] ? queue_lock+0x50/0x5b
[44347.339018]  [<ffffffff811caa4c>] ? _raw_spin_lock+0xe9/0x1ab
[44347.339018]  [<ffffffff81030429>] ? get_parent_ip+0x11/0x41
[44347.339018]  [<ffffffff814c0df1>] ? sub_preempt_count+0x35/0x48
[44347.339018]  [<ffffffff81069189>] ? queue_lock+0x50/0x5b
[44347.339018]  [<ffffffff810692d2>] ? queue_unlock+0x1d/0x21
[44347.339018]  [<ffffffff8106939f>] ? futex_wait_setup+0xc9/0xeb
[44347.339018]  [<ffffffff8106ae9d>] ? futex_wait_requeue_pi+0x190/0x3d4
[44347.339018]  [<ffffffff814bdae1>] ? _spin_unlock_irq+0x62/0x6f
[44347.339018]  [<ffffffff814bda7a>] ? _spin_unlock_irqrestore+0x7b/0x80
[44347.339018]  [<ffffffff8102843e>] ? need_resched+0x3a/0x40
[44347.339018]  [<ffffffff814bb838>] ? preempt_schedule+0x5e/0x67
[44347.339018]  [<ffffffff814bd9fa>] ? _spin_unlock+0x65/0x6a
[44347.339018]  [<ffffffff81069afc>] ? futex_wake+0x108/0x11a
[44347.339018]  [<ffffffff8106c11f>] ? do_futex+0x95d/0x9cb
[44347.339018]  [<ffffffff8106c2d9>] ? sys_futex+0x14c/0x164
[44347.339018]  [<ffffffff810e2328>] ? path_put+0x1d/0x22
[44347.339018]  [<ffffffff8100246b>] ? system_call_fastpath+0x16/0x1b

After the reboot, it bit again, pidgin this time.  Since the main thread
is the one that wedged, it locked up hard.

[ 1730.490005] pidgin        R  running task     4112  4195   2312 0x00000084
[ 1730.490005]  ffff880068889a08 ffffffff81066193 ffff880068889b54 0000000000000000
[ 1730.490005]  ffff880068889ae8 ffff880068aa8c80 0000000000000002 0000000000000000
[ 1730.490005]  ffffffff81069189 0000000000000000 ffff880068889ab8 0000000000000246
[ 1730.490005] Call Trace:
[ 1730.490005]  [<ffffffff81066193>] ? trace_hardirqs_on_caller+0x16/0x13c
[ 1730.490005]  [<ffffffff81069189>] ? queue_lock+0x50/0x5b
[ 1730.490005]  [<ffffffff81069189>] ? queue_lock+0x50/0x5b
[ 1730.490005]  [<ffffffff814be2c0>] ? restore_args+0x0/0x30
[ 1730.490005]  [<ffffffff811caa4c>] ? _raw_spin_lock+0xe9/0x1ab
[ 1730.490005]  [<ffffffff814bd506>] ? _spin_lock+0x36/0x45
[ 1730.490005]  [<ffffffff81069189>] ? queue_lock+0x50/0x5b
[ 1730.490005]  [<ffffffff814bd9bb>] ? _spin_unlock+0x26/0x6a
[ 1730.490005]  [<ffffffff810691bf>] ? get_futex_value_locked+0x2b/0x49
[ 1730.490005]  [<ffffffff810692c9>] ? queue_unlock+0x14/0x21
[ 1730.490005]  [<ffffffff8106939f>] ? futex_wait_setup+0xc9/0xeb
[ 1730.490005]  [<ffffffff81097e34>] ? ftrace_likely_update+0xc/0x14
[ 1730.490005]  [<ffffffff8106ae9d>] ? futex_wait_requeue_pi+0x190/0x3d4
[ 1730.490005]  [<ffffffff814bdae1>] ? _spin_unlock_irq+0x62/0x6f
[ 1730.490005]  [<ffffffff814bda7a>] ? _spin_unlock_irqrestore+0x7b/0x80
[ 1730.490005]  [<ffffffff8102843e>] ? need_resched+0x3a/0x40
[ 1730.490005]  [<ffffffff814bb838>] ? preempt_schedule+0x5e/0x67
[ 1730.490005]  [<ffffffff814bd9fa>] ? _spin_unlock+0x65/0x6a
[ 1730.490005]  [<ffffffff81069afc>] ? futex_wake+0x108/0x11a
[ 1730.490005]  [<ffffffff8106c11f>] ? do_futex+0x95d/0x9cb
[ 1730.490005]  [<ffffffff8106c2d9>] ? sys_futex+0x14c/0x164
[ 1730.490005]  [<ffffffff810e2328>] ? path_put+0x1d/0x22
[ 1730.490005]  [<ffffffff8100246b>] ? system_call_fastpath+0x16/0x1b

(This is me starting another one because the first one wedged. It wedged too, but
I don't remember kill -9'ing this one...)

[ 1730.490005] pidgin        R  running task     5672  4220   2312 0x00000084
[ 1730.490005]  ffff880057ce7a18 0000000000000046 ffff8800026133c0 ffff88005410a380
[ 1730.490005]  ffff880057ce7978 ffff8800026133c0 ffff880057ce7998 ffff880057f3c4c0
[ 1730.490005]  ffff880057ce6000 000000000000e010 ffff880057f3c4c8 ffffffff81030d7e
[ 1730.490005] Call Trace:
[ 1730.490005]  [<ffffffff81030d7e>] ? finish_task_switch+0x95/0xb8
[ 1730.490005]  [<ffffffff814bd184>] ? trace_hardirqs_on_thunk+0x3a/0x3f
[ 1730.490005]  [<ffffffff814bb7bd>] preempt_schedule_irq+0x56/0x73
[ 1730.490005]  [<ffffffff81069189>] ? queue_lock+0x50/0x5b
[ 1730.490005]  [<ffffffff814be3d6>] retint_kernel+0x26/0x30
[ 1730.490005]  [<ffffffff81069128>] ? get_futex_key+0x24e/0x25f
[ 1730.490005]  [<ffffffff81068fa7>] ? get_futex_key+0xcd/0x25f
[ 1730.490005]  [<ffffffff814bd9f1>] ? _spin_unlock+0x5c/0x6a
[ 1730.490005]  [<ffffffff81069319>] futex_wait_setup+0x43/0xeb
[ 1730.490005]  [<ffffffff81068fa7>] ? get_futex_key+0xcd/0x25f
[ 1730.490005]  [<ffffffff8106ae9d>] futex_wait_requeue_pi+0x190/0x3d4
[ 1730.490005]  [<ffffffff814bdae1>] ? _spin_unlock_irq+0x62/0x6f
[ 1730.490005]  [<ffffffff814bda7a>] ? _spin_unlock_irqrestore+0x7b/0x80
[ 1730.490005]  [<ffffffff8102843e>] ? need_resched+0x3a/0x40
[ 1730.490005]  [<ffffffff814bb838>] ? preempt_schedule+0x5e/0x67
[ 1730.490005]  [<ffffffff814bd9fa>] ? _spin_unlock+0x65/0x6a
[ 1730.490005]  [<ffffffff81069afc>] ? futex_wake+0x108/0x11a
[ 1730.490005]  [<ffffffff8106c11f>] do_futex+0x95d/0x9cb
[ 1730.490005]  [<ffffffff811c2bd6>] ? __up_read+0x1a/0x7f
[ 1730.490005]  [<ffffffff811c2bd6>] ? __up_read+0x1a/0x7f
[ 1730.490005]  [<ffffffff811caa4c>] ? _raw_spin_lock+0xe9/0x1ab
[ 1730.490005]  [<ffffffff814c0df1>] ? sub_preempt_count+0x35/0x48
[ 1730.490005]  [<ffffffff814bda71>] ? _spin_unlock_irqrestore+0x72/0x80
[ 1730.490005]  [<ffffffff811c2c32>] ? __up_read+0x76/0x7f
[ 1730.490005]  [<ffffffff8106c2d9>] sys_futex+0x14c/0x164
[ 1730.490005]  [<ffffffff810e2328>] ? path_put+0x1d/0x22
[ 1730.490005]  [<ffffffff8100246b>] system_call_fastpath+0x16/0x1b

And the rest of the threads from the first one I started. They're all
packed up and ready to leave Dodge on the first stagecoach, but the one
thread is still stuck in the saloon and unable to find its way out...

[ 1730.490005] pidgin        ? ffff88007f872040  5568  4214   4195 0x00000080
[ 1730.490005]  ffff880054033eb8 0000000000000046 ffffffff8103ec0b ffff88007e968400
[ 1730.490005]  0000000000000011 0000000000040001 000003c700001076 ffff880057c4b280
[ 1730.490005]  ffff880054032000 000000000000e010 ffff880057c4b280 0000000000000000
[ 1730.490005] Call Trace:
[ 1730.490005]  [<ffffffff8103ec0b>] ? do_exit+0x779/0x906
[ 1730.490005]  [<ffffffff814c0df1>] ? sub_preempt_count+0x35/0x48
[ 1730.490005]  [<ffffffff8103ed89>] do_exit+0x8f7/0x906
[ 1730.490005]  [<ffffffff814bd1fa>] ? lockdep_sys_exit_thunk+0x35/0x67
[ 1730.490005]  [<ffffffff8103ee27>] do_group_exit+0x8f/0xb8
[ 1730.490005]  [<ffffffff8103ee62>] sys_exit_group+0x12/0x16
[ 1730.490005]  [<ffffffff8100246b>] system_call_fastpath+0x16/0x1b
[ 1730.490005] pidgin        ? ffff88007f872040  5568  4215   4195 0x00000080
[ 1730.490005]  ffff880054059eb8 0000000000000046 ffffffff8103ec0b ffff88007e968400
[ 1730.490005]  0000000000000011 0000000000040001 000003c700001077 ffff8800689d0f00
[ 1730.490005]  ffff880054058000 000000000000e010 ffff8800689d0f00 0000000000000000
[ 1730.490005] Call Trace:
[ 1730.490005]  [<ffffffff8103ec0b>] ? do_exit+0x779/0x906
[ 1730.490005]  [<ffffffff814c0df1>] ? sub_preempt_count+0x35/0x48
[ 1730.490005]  [<ffffffff8103ed89>] do_exit+0x8f7/0x906
[ 1730.490005]  [<ffffffff814bd1fa>] ? lockdep_sys_exit_thunk+0x35/0x67
[ 1730.490005]  [<ffffffff8103ee27>] do_group_exit+0x8f/0xb8
[ 1730.490005]  [<ffffffff8103ee62>] sys_exit_group+0x12/0x16
[ 1730.490005]  [<ffffffff8100246b>] system_call_fastpath+0x16/0x1b
[ 1730.490005] pidgin        ? ffff88007f872040  5568  4216   4195 0x00000080
[ 1730.490005]  ffff8800542e7eb8 0000000000000046 ffffffff8103ec0b ffff88007e968400
[ 1730.490005]  0000000000000011 0000000000040001 000003c700001078 ffff88006887ad40
[ 1730.490005]  ffff8800542e6000 000000000000e010 ffff88006887ad40 0000000000000000
[ 1730.490005] Call Trace:
[ 1730.490005]  [<ffffffff8103ec0b>] ? do_exit+0x779/0x906
[ 1730.490005]  [<ffffffff814c0df1>] ? sub_preempt_count+0x35/0x48
[ 1730.490005]  [<ffffffff8103ed89>] do_exit+0x8f7/0x906
[ 1730.490005]  [<ffffffff814bd1fa>] ? lockdep_sys_exit_thunk+0x35/0x67
[ 1730.490005]  [<ffffffff8103ee27>] do_group_exit+0x8f/0xb8
[ 1730.490005]  [<ffffffff8103ee62>] sys_exit_group+0x12/0x16
[ 1730.490005]  [<ffffffff8100246b>] system_call_fastpath+0x16/0x1b
[ 1730.490005] pidgin        ? ffff88007f872040  5440  4217   4195 0x00000080
[ 1730.490005]  ffff880053c7deb8 0000000000000046 ffffffff8103ec0b ffff88007e968400
[ 1730.490005]  0000000000000011 0000000000040001 000003c700001079 ffff880053c7a440
[ 1730.490005]  ffff880053c7c000 000000000000e010 ffff880053c7a440 0000000000000000
[ 1730.490005] Call Trace:
[ 1730.490005]  [<ffffffff8103ec0b>] ? do_exit+0x779/0x906
[ 1730.490005]  [<ffffffff814c0df1>] ? sub_preempt_count+0x35/0x48
[ 1730.490005]  [<ffffffff8103ed89>] do_exit+0x8f7/0x906
[ 1730.490005]  [<ffffffff814bd1fa>] ? lockdep_sys_exit_thunk+0x35/0x67
[ 1730.490005]  [<ffffffff8103ee27>] do_group_exit+0x8f/0xb8
[ 1730.490005]  [<ffffffff8103ee62>] sys_exit_group+0x12/0x16
[ 1730.490005]  [<ffffffff8100246b>] system_call_fastpath+0x16/0x1b
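
A note on reading the traces above: entries marked with '?' are addresses the
stack dumper found on the stack but could not confirm as part of the live call
chain, while unmarked entries are the frames it considers reliable. A small
Python sketch for pulling out just the unmarked frames from a pasted trace.
The regex and the function name are assumptions written for this log format,
not an existing tool:

```python
import re

# Matches e.g. "[<ffffffff8106c11f>] do_futex+0x95d/0x9cb", with an optional
# "? " marker between the address and the symbol name.
FRAME_RE = re.compile(
    r"\[<(?P<addr>[0-9a-f]+)>\]\s+(?P<guess>\?\s+)?"
    r"(?P<sym>[^\s+]+)\+0x(?P<off>[0-9a-f]+)/0x(?P<size>[0-9a-f]+)"
)


def reliable_frames(trace_text):
    """Return symbol names of frames not marked '?', in the order printed."""
    frames = []
    for line in trace_text.splitlines():
        m = FRAME_RE.search(line)
        # Skip lines that are not frames, and frames flagged as guesses.
        if m and not m.group("guess"):
            frames.append(m.group("sym"))
    return frames
```

Run over the trace for pid 4220, this leaves the chain system_call_fastpath,
sys_futex, do_futex, futex_wait_requeue_pi, futex_wait_setup, then
retint_kernel and preempt_schedule_irq (innermost last). The traces for pids
3187 and 4195 have every entry marked '?', so nothing is left after filtering
and those have to be read by hand.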




