linux-kernel - Re: percpu related boot crash on x86 (was: Linux 2.6.38-rc1)

Message-ID: <20110119124433.GA14096@mtj.dyndns.org>
Date:	Wed, 19 Jan 2011 13:44:33 +0100
From:	Tejun Heo <tj@...nel.org>
To:	Ingo Molnar <mingo@...e.hu>
Cc:	Linus Torvalds <torvalds@...ux-foundation.org>,
	Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
	Thomas Gleixner <tglx@...utronix.de>,
	"H. Peter Anvin" <hpa@...or.com>,
	Peter Zijlstra <a.p.zijlstra@...llo.nl>,
	Andrew Morton <akpm@...ux-foundation.org>,
	Pekka Enberg <penberg@...helsinki.fi>
Subject: Re: percpu related boot crash on x86 (was: Linux 2.6.38-rc1)

Hello, Ingo.

On Wed, Jan 19, 2011 at 01:02:00PM +0100, Ingo Molnar wrote:
> 
> There's a rather frequent, percpu related boot crash that I can see with .38-rc1:
> [    0.000000] NR_IRQS:4352
> [    0.000000] ------------[ cut here ]------------
> [    0.000000] WARNING: at kernel/smp.c:433 smp_call_function_many+0x90/0x209()
...
> [    0.000000]  [<ffffffff81076299>] ? on_each_cpu+0x1b/0x39
> [    0.000000]  [<ffffffff810274e6>] ? flush_tlb_all+0x1c/0x1e
> [    0.000000]  [<ffffffff810dc7d7>] ? remove_vm_area+0x71/0x96
> [    0.000000]  [<ffffffff810dc868>] ? __vunmap+0x3f/0xcf
> [    0.000000]  [<ffffffff810dc9db>] ? vfree+0x2c/0x2e
> [    0.000000]  [<ffffffff810ccca6>] ? pcpu_mem_free+0x1e/0x20
> [    0.000000]  [<ffffffff810ccd75>] ? pcpu_extend_area_map+0x9a/0xb6
> [    0.000000]  [<ffffffff810cd452>] ? pcpu_alloc+0x17e/0x916
> [    0.000000]  [<ffffffff8106bb00>] ? trace_hardirqs_off+0xd/0xf
> [    0.000000]  [<ffffffff810e5bed>] ? kmem_cache_alloc_trace+0xab/0x120
> [    0.000000]  [<ffffffff810cdbfa>] ? __alloc_percpu+0x10/0x12
> [    0.000000]  [<ffffffff8180afd4>] ? early_irq_init+0xb2/0x13d
...

This is the vfree() path being used during early boot, before local
IRQs are enabled.  vfree() triggered a TLB flush (maybe a debug option
is enabled?), which used on_each_cpu(), which isn't happy being called
with local IRQs disabled.
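To make the failure mode concrete, here is a minimal user-space model of the check that fires at kernel/smp.c:433. It assumes the warning is the "cross-CPU call from an online CPU with local IRQs disabled can deadlock" sanity check; every `model_*` name is hypothetical and none of this is kernel code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-ins for kernel state (not real kernel symbols). */
static bool model_irqs_disabled = true;  /* early boot: local IRQs still off */
static bool model_cpu_online = true;     /* the boot CPU is already online */
static bool model_oops_in_progress = false;
static int model_warnings;

/* Sketch of the sanity check: a cross-CPU call from an online CPU with
 * local IRQs disabled can deadlock waiting for other CPUs, so warn. */
static bool model_cross_cpu_call(void)
{
    if (model_cpu_online && model_irqs_disabled && !model_oops_in_progress) {
        model_warnings++;
        fprintf(stderr, "WARNING: cross-CPU call with local IRQs disabled\n");
        return false;
    }
    return true; /* the real code would send IPIs and wait here */
}

/* Hypothetical call chain mirroring the quoted trace:
 * vfree() -> remove_vm_area() -> flush_tlb_all() -> on_each_cpu(). */
static bool model_flush_tlb_all(void) { return model_cross_cpu_call(); }
static bool model_vfree(void) { return model_flush_tlb_all(); }
```

Calling `model_vfree()` while `model_irqs_disabled` is true warns once; after setting it to false the same call goes through silently, which is the ordering problem described above.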

> [    0.000000] general protection fault: 01bb [#1] SMP DEBUG_PAGEALLOC
...
> [    0.000000] Call Trace:
> [    0.000000]  [<ffffffff810068a4>] init_8259A+0xe3/0xe8
> [    0.000000]  [<ffffffff817f7d71>] init_ISA_irqs+0x2f/0x5a
> [    0.000000]  [<ffffffff817f7de1>] native_init_IRQ+0xe/0xa2
> [    0.000000]  [<ffffffff817f7dd1>] init_IRQ+0x35/0x37
> [    0.000000]  [<ffffffff817f4a0b>] start_kernel+0x1ff/0x3a4
> [    0.000000]  [<ffffffff817f42a6>] x86_64_start_reservations+0xb6/0xba
> [    0.000000]  [<ffffffff817f43a1>] x86_64_start_kernel+0xf7/0xfe
> [    0.000000] Code: 18 48 89 f3 be 01 00 00 00 e8 33 fe cd ff 4c 89 e7 e8 77 1f e2 ff f6 c7 02 75 09 53 9d e8 a0 bf cd ff eb 07 e8 74 08 ce ff 53 9d <5b> 41 5c c9 c3 55 48 89 e5 53 48 83 ec 08 e8 91 2c c7 ff 48 8b 
> [    0.000000] RIP  [<ffffffff8138fb5c>] _raw_spin_unlock_irqrestore+0x41/0x4

and this looks like alloc_percpu() failed earlier, during early IRQ
init.  The IRQ init functions don't check for a NULL return, so it
only blows up later.  I'll see if I can reproduce the problem here.
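A minimal sketch of the missing check, as a user-space model rather than the actual IRQ init code: `model_alloc_percpu()` stands in for alloc_percpu(), and the descriptor and its `kstat_irqs` field are hypothetical names borrowed for illustration. Checking the return value at the allocation site turns a later, hard-to-attribute fault into an immediate error.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical allocator standing in for alloc_percpu(); set
 * model_alloc_should_fail to simulate the early allocation failure. */
static int model_alloc_should_fail;

static void *model_alloc_percpu(size_t size)
{
    if (model_alloc_should_fail)
        return NULL;
    return calloc(1, size); /* zeroed, like a fresh per-CPU counter */
}

/* Hypothetical per-IRQ descriptor holding a per-CPU statistics pointer. */
struct model_irq_desc {
    unsigned int *kstat_irqs;
};

/* Fail loudly at the allocation site instead of leaving a NULL pointer
 * to be dereferenced much later during IRQ setup. */
static int model_init_irq_desc(struct model_irq_desc *desc)
{
    desc->kstat_irqs = model_alloc_percpu(sizeof(unsigned int));
    if (!desc->kstat_irqs) {
        fprintf(stderr, "irq init: per-CPU allocation failed\n");
        return -ENOMEM;
    }
    return 0;
}
```

With `model_alloc_should_fail` set, `model_init_irq_desc()` returns `-ENOMEM` right away, so the caller sees the allocation failure rather than a GPF in unrelated code.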

It doesn't look hardware dependent.  The first warning seems more or
less spurious, and the GPF seems to be caused by the earlier memory
allocation failure.  It's a bit curious that the allocation failed on
an x86_64 machine, though.

Thanks.

-- 
tejun