lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <51A918B7.3030302@candelatech.com>
Date:	Fri, 31 May 2013 14:40:07 -0700
From:	Ben Greear <greearb@...delatech.com>
To:	Linux Kernel Mailing List <linux-kernel@...r.kernel.org>
Subject: Re: 3.9.4+ watchdog overflow and migration lockup soon after boot
 (cpu_stopper_thread related?)

On 05/31/2013 12:22 PM, Ben Greear wrote:
> While trying to verify that the kobject patch
> (see "Please add to stable:  module: don't unlink the module until we've removed all exposure." email)
> fixed the problems I was seeing, I hit what I believe is a different problem.  Much harder to reproduce
> (maybe 15-20 reboots before I saw this..haven't seen it since).
>
> This was on a kernel with additional non-upstream patches, but it's not obviously related to anything
> I've been hacking on.  I'm trying to reproduce now on standard 3.9.4 plus the kobj patch,
> but this seems to be quite rare, so not sure how long it will take.

I reproduced this with only stock 3.9.4 and the kobj patch, so looks like it's
a real bug.  The info looks basically identical to what I posted previously.

I have logs if someone wants to see it.

It took around 20-30 reboots to hit the problem.

I'm thinking the cpu_stopper_thread is stuck in a loop..here are some of
the stack points decoded:

(gdb) l *(stop_machine_cpu_stop+0x68)
0xffffffff81103875 is in stop_machine_cpu_stop (/home/greearb/git/linux-2.6.linus/kernel/stop_machine.c:427).
422	
423		/* Simple state machine */
424		do {
425			/* Chill out and ensure we re-read stopmachine_state. */
426			cpu_relax();
427			if (smdata->state != curstate) {
428				curstate = smdata->state;
429				switch (curstate) {
430				case STOPMACHINE_DISABLE_IRQ:
431					local_irq_disable();
(gdb) l *(cpu_stopper_thread+0xbd)
0xffffffff81103574 is in cpu_stopper_thread (/home/greearb/git/linux-2.6.linus/kernel/stop_machine.c:286).
281	
282			/* cpu stop callbacks are not allowed to sleep */
283			preempt_disable();
284	
285			ret = fn(arg);
286			if (ret)
287				done->ret = ret;
288	
289			/* restore preemption and check it's still balanced */
290			preempt_enable();
(gdb)


SysRq : Show backtrace of all active CPUs
sending NMI to all CPUs:
NMI backtrace for cpu 3
CPU 3
Pid: 29, comm: migration/3 Tainted: G        WC   3.9.4+ #6 To be filled by O.E.M. To be filled by O.E.M./To be filled by O.E.M.
RIP: 0010:[<ffffffff81103875>]  [<ffffffff81103875>] stop_machine_cpu_stop+0x68/0xfb
RSP: 0018:ffff8802223f3cc8  EFLAGS: 00000093
RAX: 0000000000000002 RBX: ffff880219ac3e18 RCX: 0000000000000000
RDX: ffff8802223ec2c0 RSI: 0000000000000040 RDI: ffffffff81103898
RBP: ffff8802223f3d08 R08: 0000000000000000 R09: ffff8802223f3b98
R10: ffffffff81cfc860 R11: 0000000000000002 R12: 0000000000000292
R13: 0000000000000002 R14: ffff880219ac3e3c R15: 0000000000000000
FS:  0000000000000000(0000) GS:ffff88022bd80000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000003397a07cc0 CR3: 000000021b7ce000 CR4: 00000000000007e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process migration/3 (pid: 29, threadinfo ffff8802223f2000, task ffff8802223ec2c0)
Stack:
  ffff88022bd8e5b0 00ff88022bd8e610 ffff8802223f3ce8 ffff88022bd8e5b0
  ffff880219ac3d58 ffff8802223f2000 ffff8802223f2000 ffffffff8110380d
  ffff8802223f3de8 ffffffff81103574 ffff8802223f3fd8 ffff8802223f2010
Call Trace:
  [<ffffffff8110380d>] ? stop_one_cpu_nowait+0x30/0x30
  [<ffffffff81103574>] cpu_stopper_thread+0xbd/0x176
  [<ffffffff815e8db7>] ? __schedule+0x5e7/0x62f
  [<ffffffff815ea223>] ? _raw_spin_unlock_irqrestore+0x47/0x77
  [<ffffffff810e7e6a>] ? trace_hardirqs_on_caller+0x123/0x15a
  [<ffffffff810e7eae>] ? trace_hardirqs_on+0xd/0xf
  [<ffffffff815ea245>] ? _raw_spin_unlock_irqrestore+0x69/0x77
  [<ffffffff810bdcb2>] smpboot_thread_fn+0x217/0x21d
  [<ffffffff810bda9b>] ? test_ti_thread_flag.clone.0+0x11/0x11
  [<ffffffff810b6a86>] kthread+0xc7/0xcf
  [<ffffffff810b69bf>] ? __init_kthread_worker+0x5b/0x5b
  [<ffffffff815f0cac>] ret_from_fork+0x7c/0xb0
  [<ffffffff810b69bf>] ? __init_kthread_worker+0x5b/0x5b
Code: 40 00 00 00 e8 f5 ad 20 00 41 39 c5 0f 94 45 cf eb 0e 44 0f a3 28 45 19 ed 45 85 ed 0f 95 45 cf 4c 8d 73 24 45 31 ff 31 c0 f3 90 <44> 8b 6b 20 41 39 c5 74 
44 41 83 fd 02 74 08 41 83 fd 03 75 1f
NMI backtrace for cpu 1
CPU 1
Pid: 17, comm: migration/1 Tainted: G        WC   3.9.4+ #6 To be filled by O.E.M. To be filled by O.E.M./To be filled by O.E.M.
RIP: 0010:[<ffffffff81103875>]  [<ffffffff81103875>] stop_machine_cpu_stop+0x68/0xfb
RSP: 0018:ffff88022230fcc8  EFLAGS: 00000093
RAX: 0000000000000002 RBX: ffff880219ac3e18 RCX: 0000000000000000
RDX: ffff8802223142c0 RSI: 0000000000000040 RDI: ffffffff81103898
RBP: ffff88022230fd08 R08: 0000000000000000 R09: ffff88022230fb98
R10: ffff880219ac3cf8 R11: ffff8802223ec2c0 R12: 0000000000000292
R13: 0000000000000002 R14: ffff880219ac3e3c R15: 0000000000000000
FS:  0000000000000000(0000) GS:ffff88022bc80000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000000000440b70 CR3: 000000021a03b000 CR4: 00000000000007e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process migration/1 (pid: 17, threadinfo ffff88022230e000, task ffff8802223142c0)
Stack:
  ffff88022bc8e5b0 00ff88022bc8e610 ffff88022230fce8 ffff88022bc8e5b0
  ffff880219ac3d58 ffff88022230e000 ffff88022230e000 ffffffff8110380d
  ffff88022230fde8 ffffffff81103574 ffff88022230ffd8 ffff88022230e010
Call Trace:
  [<ffffffff8110380d>] ? stop_one_cpu_nowait+0x30/0x30
  [<ffffffff81103574>] cpu_stopper_thread+0xbd/0x176
  [<ffffffff815e8db7>] ? __schedule+0x5e7/0x62f
  [<ffffffff815ea223>] ? _raw_spin_unlock_irqrestore+0x47/0x77
  [<ffffffff810e7e6a>] ? trace_hardirqs_on_caller+0x123/0x15a
  [<ffffffff810e7eae>] ? trace_hardirqs_on+0xd/0xf
  [<ffffffff815ea245>] ? _raw_spin_unlock_irqrestore+0x69/0x77
  [<ffffffff810bdcb2>] smpboot_thread_fn+0x217/0x21d
  [<ffffffff810bda9b>] ? test_ti_thread_flag.clone.0+0x11/0x11
  [<ffffffff810b6a86>] kthread+0xc7/0xcf
  [<ffffffff810b69bf>] ? __init_kthread_worker+0x5b/0x5b
  [<ffffffff815f0cac>] ret_from_fork+0x7c/0xb0
  [<ffffffff810b69bf>] ? __init_kthread_worker+0x5b/0x5b
Code: 40 00 00 00 e8 f5 ad 20 00 41 39 c5 0f 94 45 cf eb 0e 44 0f a3 28 45 19 ed 45 85 ed 0f 95 45 cf 4c 8d 73 24 45 31 ff 31 c0 f3 90 <44> 8b 6b 20 41 39 c5 74 
44 41 83 fd 02 74 08 41 83 fd 03 75 1f
NMI backtrace for cpu 2
CPU 2
Pid: 23, comm: migration/2 Tainted: G        WC   3.9.4+ #6 To be filled by O.E.M. To be filled by O.E.M./To be filled by O.E.M.
RIP: 0010:[<ffffffff81016549>]  [<ffffffff81016549>] native_read_tsc+0x14/0x16
RSP: 0000:ffff88022bd03b40  EFLAGS: 00000006
RAX: 00000057e22024ac RBX: ffff8802223b4000 RCX: 00000000e22024ac
RDX: 0000000000000057 RSI: 0000000000000002 RDI: 0000000000289789
RBP: ffff88022bd03b40 R08: 0000000000000002 R09: 0000000000000000
R10: ffff88022bd03a90 R11: 0000000000000006 R12: 000000000000006c
R13: 0000000000000002 R14: 0000000000000086 R15: 0000000000000001
FS:  0000000000000000(0000) GS:ffff88022bd00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000000000574fe0 CR3: 0000000214ca8000 CR4: 00000000000007e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process migration/2 (pid: 23, threadinfo ffff8802223b4000, task ffff8802223bc2c0)
Stack:
  ffff88022bd03ba0 ffffffff81309ea2 ffff88022bd03b70 0000000000000046
  ffff88022bd03b70 00289789810e5676 ffff88022bd03ba0 0000000000000001
  000000000000006c 0000000000000007 0000000000000086 0000000000000001
Call Trace:
  <IRQ>
  [<ffffffff81309ea2>] delay_tsc+0x40/0xee
  [<ffffffff81309e35>] __const_udelay+0x28/0x2a
  [<ffffffff8103053d>] arch_trigger_all_cpu_backtrace+0x66/0x7d
  [<ffffffff813c467a>] sysrq_handle_showallcpus+0xe/0x10
  [<ffffffff813c49ab>] __handle_sysrq+0xbf/0x15b
  [<ffffffff813c4d68>] handle_sysrq+0x2c/0x2e
  [<ffffffff813d8fcc>] serial8250_rx_chars+0x145/0x1c5
  [<ffffffff813d90be>] serial8250_handle_irq+0x72/0xa8
  [<ffffffff813d917f>] serial8250_default_handle_irq+0x23/0x28
  [<ffffffff813d7e14>] serial8250_interrupt+0x4d/0xc6
  [<ffffffff8110e760>] handle_irq_event_percpu+0x7a/0x1ea
  [<ffffffff8110e911>] handle_irq_event+0x41/0x61
  [<ffffffff8111113c>] handle_edge_irq+0xa6/0xcb
  [<ffffffff81011dd7>] handle_irq+0x24/0x2d
  [<ffffffff815f2a4d>] do_IRQ+0x4d/0xb4
  [<ffffffff815ea872>] common_interrupt+0x72/0x72
  [<ffffffff815ea937>] ? retint_restore_args+0x13/0x13
  [<ffffffff811146d2>] ? rcu_sched_qs+0x22/0x22
  [<ffffffff8109e4f1>] ? __do_softirq+0x185/0x25a
  [<ffffffff8109e68f>] irq_exit+0x5f/0xbc
  [<ffffffff815f2a9d>] do_IRQ+0x9d/0xb4
  [<ffffffff815ea872>] common_interrupt+0x72/0x72
  <EOI>
  [<ffffffff815ea937>] ? retint_restore_args+0x13/0x13
  [<ffffffff81103875>] ? stop_machine_cpu_stop+0x68/0xfb
  [<ffffffff81103853>] ? stop_machine_cpu_stop+0x46/0xfb
  [<ffffffff8110380d>] ? stop_one_cpu_nowait+0x30/0x30
  [<ffffffff81103574>] cpu_stopper_thread+0xbd/0x176
  [<ffffffff815e8db7>] ? __schedule+0x5e7/0x62f
  [<ffffffff815ea223>] ? _raw_spin_unlock_irqrestore+0x47/0x77
  [<ffffffff810e7e6a>] ? trace_hardirqs_on_caller+0x123/0x15a
  [<ffffffff810e7eae>] ? trace_hardirqs_on+0xd/0xf
  [<ffffffff815ea245>] ? _raw_spin_unlock_irqrestore+0x69/0x77
  [<ffffffff810bdcb2>] smpboot_thread_fn+0x217/0x21d
  [<ffffffff810bda9b>] ? test_ti_thread_flag.clone.0+0x11/0x11
  [<ffffffff810b6a86>] kthread+0xc7/0xcf
  [<ffffffff810b69bf>] ? __init_kthread_worker+0x5b/0x5b
  [<ffffffff815f0cac>] ret_from_fork+0x7c/0xb0
  [<ffffffff810b69bf>] ? __init_kthread_worker+0x5b/0x5b
Code: 31 c9 48 c1 ef 0a 48 01 f8 c3 55 48 89 e5 e8 33 0a 00 00 66 90 c9 c3 55 48 89 e5 0f 31 89 c1 48 89 d0 48 c1 e0 20 89 c9 48 09 c8 <c9> c3 55 8b 05 12 c3 ad 
00 48 89 e5 c9 c3 55 b8 ed ff ff ff 48
NMI backtrace for cpu 0
CPU 0
Pid: 8, comm: migration/0 Tainted: G        WC   3.9.4+ #6 To be filled by O.E.M. To be filled by O.E.M./To be filled by O.E.M.
RIP: 0010:[<ffffffff81103879>]  [<ffffffff81103879>] stop_machine_cpu_stop+0x6c/0xfb
RSP: 0018:ffff8802222abcc8  EFLAGS: 00000093
RAX: 0000000000000002 RBX: ffff880219ac3e18 RCX: 0000000000000000
RDX: ffff8802222a42c0 RSI: 0000000000000040 RDI: ffffffff81103898
RBP: ffff8802222abd08 R08: 0000000000000000 R09: ffff8802222abb98
R10: ffffffff81a01ec8 R11: 0000000000000000 R12: 0000000000000292
R13: 0000000000000002 R14: ffff880219ac3e3c R15: 0000000000000000
FS:  0000000000000000(0000) GS:ffff88022bc00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 00007fff998bbff0 CR3: 0000000001a0c000 CR4: 00000000000007f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process migration/0 (pid: 8, threadinfo ffff8802222aa000, task ffff8802222a42c0)
Stack:
  ffff88022bc0e5b0 01ff88022bc0e610 ffff8802222abce8 ffff88022bc0e5b0
  ffff880219ac3d58 ffff8802222aa000 ffff8802222aa000 ffffffff8110380d
  ffff8802222abde8 ffffffff81103574 ffff8802222abfd8 ffff8802222aa010
Call Trace:
  [<ffffffff8110380d>] ? stop_one_cpu_nowait+0x30/0x30
  [<ffffffff81103574>] cpu_stopper_thread+0xbd/0x176
  [<ffffffff815e8db7>] ? __schedule+0x5e7/0x62f
  [<ffffffff815ea223>] ? _raw_spin_unlock_irqrestore+0x47/0x77
  [<ffffffff810e7e6a>] ? trace_hardirqs_on_caller+0x123/0x15a
  [<ffffffff810e7eae>] ? trace_hardirqs_on+0xd/0xf
  [<ffffffff815ea245>] ? _raw_spin_unlock_irqrestore+0x69/0x77
  [<ffffffff810bdcb2>] smpboot_thread_fn+0x217/0x21d
  [<ffffffff810bda9b>] ? test_ti_thread_flag.clone.0+0x11/0x11
  [<ffffffff810b6a86>] kthread+0xc7/0xcf
  [<ffffffff810b69bf>] ? __init_kthread_worker+0x5b/0x5b
  [<ffffffff815f0cac>] ret_from_fork+0x7c/0xb0
  [<ffffffff810b69bf>] ? __init_kthread_worker+0x5b/0x5b
Code: e8 f5 ad 20 00 41 39 c5 0f 94 45 cf eb 0e 44 0f a3 28 45 19 ed 45 85 ed 0f 95 45 cf 4c 8d 73 24 45 31 ff 31 c0 f3 90 44 8b 6b 20 <41> 39 c5 74 44 41 83 fd 
02 74 08 41 83 fd 03 75 1f eb 0e fa 66

Thanks,
Ben

-- 
Ben Greear <greearb@...delatech.com>
Candela Technologies Inc  http://www.candelatech.com

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ