linux-kernel - percpu related boot crash on x86 (was: Linux 2.6.38-rc1)

Message-ID: <20110119120200.GA1057@elte.hu>
Date:	Wed, 19 Jan 2011 13:02:00 +0100
From:	Ingo Molnar <mingo@...e.hu>
To:	Linus Torvalds <torvalds@...ux-foundation.org>,
	Tejun Heo <tj@...nel.org>
Cc:	Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
	Thomas Gleixner <tglx@...utronix.de>,
	"H. Peter Anvin" <hpa@...or.com>,
	Peter Zijlstra <a.p.zijlstra@...llo.nl>,
	Andrew Morton <akpm@...ux-foundation.org>,
	Pekka Enberg <penberg@...helsinki.fi>
Subject: percpu related boot crash on x86 (was: Linux 2.6.38-rc1)


There's a rather frequent, percpu-related boot crash that I can see with .38-rc1:

[    0.000000] NR_IRQS:4352
[    0.000000] ------------[ cut here ]------------
[    0.000000] WARNING: at kernel/smp.c:433 smp_call_function_many+0x90/0x209()
[    0.000000] Hardware name: System Product Name
[    0.000000] Modules linked in:
[    0.000000] Pid: 0, comm: swapper Not tainted 2.6.38-rc1 #86551
[    0.000000] Call Trace:
[    0.000000]  [<ffffffff8103f544>] ? warn_slowpath_common+0x85/0x9d
[    0.000000]  [<ffffffff81027218>] ? do_flush_tlb_all+0x0/0x4d
[    0.000000]  [<ffffffff81027218>] ? do_flush_tlb_all+0x0/0x4d
[    0.000000]  [<ffffffff8103f576>] ? warn_slowpath_null+0x1a/0x1c
[    0.000000]  [<ffffffff810760df>] ? smp_call_function_many+0x90/0x209
[    0.000000]  [<ffffffff810cc7ca>] ? pcpu_mem_alloc+0x65/0x67
[    0.000000]  [<ffffffff81027218>] ? do_flush_tlb_all+0x0/0x4d
[    0.000000]  [<ffffffff8107627a>] ? smp_call_function+0x22/0x26
[    0.000000]  [<ffffffff81076299>] ? on_each_cpu+0x1b/0x39
[    0.000000]  [<ffffffff810274e6>] ? flush_tlb_all+0x1c/0x1e
[    0.000000]  [<ffffffff810dc7d7>] ? remove_vm_area+0x71/0x96
[    0.000000]  [<ffffffff810dc868>] ? __vunmap+0x3f/0xcf
[    0.000000]  [<ffffffff810dc9db>] ? vfree+0x2c/0x2e
[    0.000000]  [<ffffffff810ccca6>] ? pcpu_mem_free+0x1e/0x20
[    0.000000]  [<ffffffff810ccd75>] ? pcpu_extend_area_map+0x9a/0xb6
[    0.000000]  [<ffffffff810cd452>] ? pcpu_alloc+0x17e/0x916
[    0.000000]  [<ffffffff8106bb00>] ? trace_hardirqs_off+0xd/0xf
[    0.000000]  [<ffffffff810e5bed>] ? kmem_cache_alloc_trace+0xab/0x120
[    0.000000]  [<ffffffff810cdbfa>] ? __alloc_percpu+0x10/0x12
[    0.000000]  [<ffffffff8180afd4>] ? early_irq_init+0xb2/0x13d
[    0.000000]  [<ffffffff817f4a06>] ? start_kernel+0x1fa/0x3a4
[    0.000000]  [<ffffffff817f42a6>] ? x86_64_start_reservations+0xb6/0xba
[    0.000000]  [<ffffffff817f43a1>] ? x86_64_start_kernel+0xf7/0xfe
[    0.000000] ---[ end trace 4eaa2a86a8e2da22 ]---
[    0.000000] ------------[ cut here ]------------
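To read the warning above as a call chain, a few of its frames can be pulled apart with a small script. This is only a sketch for illustration: the `FRAME` pattern and the `parse_frames` helper are hypothetical names, not part of any kernel tooling, and the sample is a hand-picked subset of the trace. A "?" in front of a symbol marks a frame the unwinder could not confirm as reliable.

```python
import re

# One oops frame looks like: [<ffffffff810dc9db>] ? vfree+0x2c/0x2e
# Groups: address, optional "?" marker, symbol, offset, function size.
FRAME = re.compile(
    r"\[<([0-9a-f]+)>\]\s+(\?\s+)?([\w.]+)\+0x([0-9a-f]+)/0x([0-9a-f]+)"
)


def parse_frames(log):
    """Return one dict per stack frame found in an oops call trace."""
    frames = []
    for line in log.splitlines():
        m = FRAME.search(line)
        if m is None:
            continue  # not a frame line (header, separator, ...)
        frames.append({
            "addr": int(m.group(1), 16),
            "reliable": m.group(2) is None,  # "?" means unverified frame
            "symbol": m.group(3),
            "offset": int(m.group(4), 16),
            "size": int(m.group(5), 16),
        })
    return frames


# A subset of the frames from the warning above, innermost first.
SAMPLE = """\
[    0.000000]  [<ffffffff810274e6>] ? flush_tlb_all+0x1c/0x1e
[    0.000000]  [<ffffffff810dc9db>] ? vfree+0x2c/0x2e
[    0.000000]  [<ffffffff810ccca6>] ? pcpu_mem_free+0x1e/0x20
[    0.000000]  [<ffffffff810cdbfa>] ? __alloc_percpu+0x10/0x12
[    0.000000]  [<ffffffff8180afd4>] ? early_irq_init+0xb2/0x13d
"""

frames = parse_frames(SAMPLE)
# Oops traces list the innermost frame first; reverse for caller -> callee.
chain = [f["symbol"] for f in reversed(frames)]
print(" -> ".join(chain))
```

Read this way, the sampled frames suggest that a percpu allocation made from early_irq_init() ended up in vfree() and from there in flush_tlb_all(). Because every frame carries the "?" marker, that ordering is a plausible reading rather than a verified one.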

followed shortly afterwards by a nasty #GPF crash:

[    0.000000] general protection fault: 01bb [#1] SMP DEBUG_PAGEALLOC
[    0.000000] last sysfs file: 
[    0.000000] CPU 0 
[    0.000000] Modules linked in:
[    0.000000] 
[    0.000000] Pid: 0, comm: swapper Tainted: G        W   2.6.38-rc1 #86551 A8N-E/System Product Name
[    0.000000] RIP: 0010:[<ffffffff8138fb5c>]  [<ffffffff8138fb5c>] _raw_spin_unlock_irqrestore+0x41/0x46
[    0.000000] RSP: 0000:ffffffff81601ec8  EFLAGS: 00010246
[    0.000000] RAX: 0000000000000000 RBX: 0000000000000246 RCX: 0000000000000000
[    0.000000] RDX: 0000000000010000 RSI: 0000000000000001 RDI: ffffffff8138fb5a
[    0.000000] RBP: ffffffff81601ed8 R08: ffffffffffffffff R09: 0000000000000001
[    0.000000] R10: ffffffff81601e78 R11: 0000000000000000 R12: ffffffff81762690
[    0.000000] R13: ffffffff8153b5ce R14: ffffffffffffffff R15: 0000000000000000
[    0.000000] FS:  0000000000000000(0000) GS:ffff88003fc00000(0000) knlGS:0000000000000000
[    0.000000] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[    0.000000] CR2: 0000000000f06f53 CR3: 0000000001757000 CR4: 00000000000006b0
[    0.000000] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[    0.000000] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[    0.000000] Process swapper (pid: 0, threadinfo ffffffff81600000, task ffffffff8175f020)
[    0.000000] Stack:
[    0.000000]  0000000000000000 0000000000000246 ffffffff81601ef8 ffffffff810068a4
[    0.000000]  0000000000000000 ffffffff817626d0 ffffffff81601f28 ffffffff817f7d71
[    0.000000]  0000000000001100 ffffffff81828760 ffffffff8182a2c0 ffff88003ffdc380
[    0.000000] Call Trace:
[    0.000000]  [<ffffffff810068a4>] init_8259A+0xe3/0xe8
[    0.000000]  [<ffffffff817f7d71>] init_ISA_irqs+0x2f/0x5a
[    0.000000]  [<ffffffff817f7de1>] native_init_IRQ+0xe/0xa2
[    0.000000]  [<ffffffff817f7dd1>] init_IRQ+0x35/0x37
[    0.000000]  [<ffffffff817f4a0b>] start_kernel+0x1ff/0x3a4
[    0.000000]  [<ffffffff817f42a6>] x86_64_start_reservations+0xb6/0xba
[    0.000000]  [<ffffffff817f43a1>] x86_64_start_kernel+0xf7/0xfe
[    0.000000] Code: 18 48 89 f3 be 01 00 00 00 e8 33 fe cd ff 4c 89 e7 e8 77 1f e2 ff f6 c7 02 75 09 53 9d e8 a0 bf cd ff eb 07 e8 74 08 ce ff 53 9d <5b> 41 5c c9 c3 55 48 89 e5 53 48 83 ec 08 e8 91 2c c7 ff 48 8b 
[    0.000000] RIP  [<ffffffff8138fb5c>] _raw_spin_unlock_irqrestore+0x41/0x46
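The "01bb" in the "general protection fault" line is the error code pushed by the CPU. Assuming it is printed in hexadecimal and follows the usual x86 selector error code layout (bit 0 = external event, bit 1 = descriptor is in the IDT, bit 2 = LDT/GDT selector, bits 3 and up = index), it can be decoded with a short sketch like the one below. The function name is made up for illustration.

```python
def decode_selector_error_code(code):
    """Split an x86 selector-style error code into its fields."""
    return {
        "external": bool(code & 0x1),  # EXT: fault caused by an external event
        "idt": bool(code & 0x2),       # IDT: index refers to an IDT gate
        "ti": bool(code & 0x4),        # TI: LDT (1) vs GDT (0), only if IDT == 0
        "index": (code >> 3) & 0x1FFF, # selector or vector index
    }


info = decode_selector_error_code(0x01BB)
print(info)
# {'external': True, 'idt': True, 'ti': False, 'index': 55}
```

Under those assumptions the fault would refer to IDT entry 55 (0x37) and would have been triggered by an external event. That would fit the faulting instruction sitting right after the flags are restored in _raw_spin_unlock_irqrestore(), but this is a reading of the log, not something the report itself states.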

Full bootlog and config attached.

One thing I've noticed is that CONFIG_MAXSMP=y, so it's pushing our various data 
structure limits. The crash signature itself seems to implicate NR_IRQS related data 
structures.

I'm quite certain that this is not related to the x86 or irq changes in .38-rc1: those 
were tested on this box rather well and this crash never triggered.

Thanks,

	Ingo

View attachment "config" of type "text/plain" (79219 bytes)

View attachment "crash.log" of type "text/plain" (14244 bytes)