Message-ID: <20110119120200.GA1057@elte.hu>
Date: Wed, 19 Jan 2011 13:02:00 +0100
From: Ingo Molnar <mingo@...e.hu>
To: Linus Torvalds <torvalds@...ux-foundation.org>,
Tejun Heo <tj@...nel.org>
Cc: Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
Thomas Gleixner <tglx@...utronix.de>,
"H. Peter Anvin" <hpa@...or.com>,
Peter Zijlstra <a.p.zijlstra@...llo.nl>,
Andrew Morton <akpm@...ux-foundation.org>,
Pekka Enberg <penberg@...helsinki.fi>
Subject: percpu related boot crash on x86 (was: Linux 2.6.38-rc1)
There's a rather frequent, percpu-related boot crash that I can see with .38-rc1:
[ 0.000000] NR_IRQS:4352
[ 0.000000] ------------[ cut here ]------------
[ 0.000000] WARNING: at kernel/smp.c:433 smp_call_function_many+0x90/0x209()
[ 0.000000] Hardware name: System Product Name
[ 0.000000] Modules linked in:
[ 0.000000] Pid: 0, comm: swapper Not tainted 2.6.38-rc1 #86551
[ 0.000000] Call Trace:
[ 0.000000] [<ffffffff8103f544>] ? warn_slowpath_common+0x85/0x9d
[ 0.000000] [<ffffffff81027218>] ? do_flush_tlb_all+0x0/0x4d
[ 0.000000] [<ffffffff81027218>] ? do_flush_tlb_all+0x0/0x4d
[ 0.000000] [<ffffffff8103f576>] ? warn_slowpath_null+0x1a/0x1c
[ 0.000000] [<ffffffff810760df>] ? smp_call_function_many+0x90/0x209
[ 0.000000] [<ffffffff810cc7ca>] ? pcpu_mem_alloc+0x65/0x67
[ 0.000000] [<ffffffff81027218>] ? do_flush_tlb_all+0x0/0x4d
[ 0.000000] [<ffffffff8107627a>] ? smp_call_function+0x22/0x26
[ 0.000000] [<ffffffff81076299>] ? on_each_cpu+0x1b/0x39
[ 0.000000] [<ffffffff810274e6>] ? flush_tlb_all+0x1c/0x1e
[ 0.000000] [<ffffffff810dc7d7>] ? remove_vm_area+0x71/0x96
[ 0.000000] [<ffffffff810dc868>] ? __vunmap+0x3f/0xcf
[ 0.000000] [<ffffffff810dc9db>] ? vfree+0x2c/0x2e
[ 0.000000] [<ffffffff810ccca6>] ? pcpu_mem_free+0x1e/0x20
[ 0.000000] [<ffffffff810ccd75>] ? pcpu_extend_area_map+0x9a/0xb6
[ 0.000000] [<ffffffff810cd452>] ? pcpu_alloc+0x17e/0x916
[ 0.000000] [<ffffffff8106bb00>] ? trace_hardirqs_off+0xd/0xf
[ 0.000000] [<ffffffff810e5bed>] ? kmem_cache_alloc_trace+0xab/0x120
[ 0.000000] [<ffffffff810cdbfa>] ? __alloc_percpu+0x10/0x12
[ 0.000000] [<ffffffff8180afd4>] ? early_irq_init+0xb2/0x13d
[ 0.000000] [<ffffffff817f4a06>] ? start_kernel+0x1fa/0x3a4
[ 0.000000] [<ffffffff817f42a6>] ? x86_64_start_reservations+0xb6/0xba
[ 0.000000] [<ffffffff817f43a1>] ? x86_64_start_kernel+0xf7/0xfe
[ 0.000000] ---[ end trace 4eaa2a86a8e2da22 ]---
[ 0.000000] ------------[ cut here ]------------
followed shortly afterwards by a nasty #GPF crash:
[ 0.000000] general protection fault: 01bb [#1] SMP DEBUG_PAGEALLOC
[ 0.000000] last sysfs file:
[ 0.000000] CPU 0
[ 0.000000] Modules linked in:
[ 0.000000]
[ 0.000000] Pid: 0, comm: swapper Tainted: G W 2.6.38-rc1 #86551 A8N-E/System Product Name
[ 0.000000] RIP: 0010:[<ffffffff8138fb5c>] [<ffffffff8138fb5c>] _raw_spin_unlock_irqrestore+0x41/0x46
[ 0.000000] RSP: 0000:ffffffff81601ec8 EFLAGS: 00010246
[ 0.000000] RAX: 0000000000000000 RBX: 0000000000000246 RCX: 0000000000000000
[ 0.000000] RDX: 0000000000010000 RSI: 0000000000000001 RDI: ffffffff8138fb5a
[ 0.000000] RBP: ffffffff81601ed8 R08: ffffffffffffffff R09: 0000000000000001
[ 0.000000] R10: ffffffff81601e78 R11: 0000000000000000 R12: ffffffff81762690
[ 0.000000] R13: ffffffff8153b5ce R14: ffffffffffffffff R15: 0000000000000000
[ 0.000000] FS: 0000000000000000(0000) GS:ffff88003fc00000(0000) knlGS:0000000000000000
[ 0.000000] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 0.000000] CR2: 0000000000f06f53 CR3: 0000000001757000 CR4: 00000000000006b0
[ 0.000000] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 0.000000] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[ 0.000000] Process swapper (pid: 0, threadinfo ffffffff81600000, task ffffffff8175f020)
[ 0.000000] Stack:
[ 0.000000] 0000000000000000 0000000000000246 ffffffff81601ef8 ffffffff810068a4
[ 0.000000] 0000000000000000 ffffffff817626d0 ffffffff81601f28 ffffffff817f7d71
[ 0.000000] 0000000000001100 ffffffff81828760 ffffffff8182a2c0 ffff88003ffdc380
[ 0.000000] Call Trace:
[ 0.000000] [<ffffffff810068a4>] init_8259A+0xe3/0xe8
[ 0.000000] [<ffffffff817f7d71>] init_ISA_irqs+0x2f/0x5a
[ 0.000000] [<ffffffff817f7de1>] native_init_IRQ+0xe/0xa2
[ 0.000000] [<ffffffff817f7dd1>] init_IRQ+0x35/0x37
[ 0.000000] [<ffffffff817f4a0b>] start_kernel+0x1ff/0x3a4
[ 0.000000] [<ffffffff817f42a6>] x86_64_start_reservations+0xb6/0xba
[ 0.000000] [<ffffffff817f43a1>] x86_64_start_kernel+0xf7/0xfe
[ 0.000000] Code: 18 48 89 f3 be 01 00 00 00 e8 33 fe cd ff 4c 89 e7 e8 77 1f e2 ff f6 c7 02 75 09 53 9d e8 a0 bf cd ff eb 07 e8 74 08 ce ff 53 9d <5b> 41 5c c9 c3 55 48 89 e5 53 48 83 ec 08 e8 91 2c c7 ff 48 8b
[ 0.000000] RIP [<ffffffff8138fb5c>] _raw_spin_unlock_irqrestore+0x41/0x46
Full bootlog and config attached.
One thing I've noticed is that CONFIG_MAXSMP=y is set, so this config is pushing our
various data structure limits. The crash signature itself seems to implicate
NR_IRQS-related data structures.
I'm quite certain that this is not related to x86 or irq changes in .38-rc1 - those
were tested on this box rather well and this crash never triggered.
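For anyone comparing the two traces above, here is a minimal Python sketch that reduces oops output to its chain of symbols. It is an illustration only: the `parse_frames` helper and its regular expression are hypothetical and not part of any kernel tooling, and it assumes frame lines in the `[<addr>] ? symbol+0xoff/0xsize` form shown in this report. A leading `?` marks a stack entry the unwinder could not verify, so those frames are flagged rather than dropped.

```python
import re

# Hypothetical helper, assuming frame lines shaped like:
#   [ 0.000000] [<ffffffff810dc9db>] ? vfree+0x2c/0x2e
FRAME_RE = re.compile(
    r"\[<([0-9a-f]+)>\]\s+(\?\s+)?([A-Za-z_][\w.]*)\+0x([0-9a-f]+)/0x([0-9a-f]+)"
)

def parse_frames(log_text):
    """Return one dict per stack frame found in the oops text."""
    frames = []
    for line in log_text.splitlines():
        m = FRAME_RE.search(line)
        if m is None:
            continue  # register dumps, headers, and other non-frame lines
        frames.append({
            "addr": int(m.group(1), 16),
            "unreliable": m.group(2) is not None,  # frame was printed with '?'
            "symbol": m.group(3),
            "offset": int(m.group(4), 16),
            "size": int(m.group(5), 16),
        })
    return frames

if __name__ == "__main__":
    sample = (
        "[ 0.000000] [<ffffffff810dc9db>] ? vfree+0x2c/0x2e\n"
        "[ 0.000000] [<ffffffff810ccca6>] ? pcpu_mem_free+0x1e/0x20\n"
        "[ 0.000000] [<ffffffff810068a4>] init_8259A+0xe3/0xe8\n"
    )
    for frame in parse_frames(sample):
        flag = "?" if frame["unreliable"] else " "
        print(flag, frame["symbol"])
```

Run against the first trace, this makes the path from early_irq_init through __alloc_percpu, pcpu_extend_area_map, pcpu_mem_free and vfree down to flush_tlb_all and smp_call_function_many easier to see at a glance.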
Thanks,
Ingo
View attachment "config" of type "text/plain" (79219 bytes)
View attachment "crash.log" of type "text/plain" (14244 bytes)