Message-ID: <20110119124433.GA14096@mtj.dyndns.org>
Date:	Wed, 19 Jan 2011 13:44:33 +0100
From:	Tejun Heo <tj@...nel.org>
To:	Ingo Molnar <mingo@...e.hu>
Cc:	Linus Torvalds <torvalds@...ux-foundation.org>,
	Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
	Thomas Gleixner <tglx@...utronix.de>,
	"H. Peter Anvin" <hpa@...or.com>,
	Peter Zijlstra <a.p.zijlstra@...llo.nl>,
	Andrew Morton <akpm@...ux-foundation.org>,
	Pekka Enberg <penberg@...helsinki.fi>
Subject: Re: percpu related boot crash on x86 (was: Linux 2.6.38-rc1)

Hello, Ingo.

On Wed, Jan 19, 2011 at 01:02:00PM +0100, Ingo Molnar wrote:
> 
> There's a rather frequent, percpu related boot crash that I can see with .38-rc1:
> [    0.000000] NR_IRQS:4352
> [    0.000000] ------------[ cut here ]------------
> [    0.000000] WARNING: at kernel/smp.c:433 smp_call_function_many+0x90/0x209()
...
> [    0.000000]  [<ffffffff81076299>] ? on_each_cpu+0x1b/0x39
> [    0.000000]  [<ffffffff810274e6>] ? flush_tlb_all+0x1c/0x1e
> [    0.000000]  [<ffffffff810dc7d7>] ? remove_vm_area+0x71/0x96
> [    0.000000]  [<ffffffff810dc868>] ? __vunmap+0x3f/0xcf
> [    0.000000]  [<ffffffff810dc9db>] ? vfree+0x2c/0x2e
> [    0.000000]  [<ffffffff810ccca6>] ? pcpu_mem_free+0x1e/0x20
> [    0.000000]  [<ffffffff810ccd75>] ? pcpu_extend_area_map+0x9a/0xb6
> [    0.000000]  [<ffffffff810cd452>] ? pcpu_alloc+0x17e/0x916
> [    0.000000]  [<ffffffff8106bb00>] ? trace_hardirqs_off+0xd/0xf
> [    0.000000]  [<ffffffff810e5bed>] ? kmem_cache_alloc_trace+0xab/0x120
> [    0.000000]  [<ffffffff810cdbfa>] ? __alloc_percpu+0x10/0x12
> [    0.000000]  [<ffffffff8180afd4>] ? early_irq_init+0xb2/0x13d
...

This is the vfree() path being used before local irqs are enabled
during early boot.  vfree() triggered a TLB flush (maybe debugging is
enabled?), which used on_each_cpu(), which isn't happy to be called
with local irqs disabled.
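
A rough userspace model of the call chain above, only to illustrate why the warning fires. This is not kernel code: every model_* name and the local_irqs_disabled flag are hypothetical stand-ins, and the real check in kernel/smp.c has more conditions than this sketch.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for the CPU's interrupt state; early boot starts
 * with local irqs still disabled. */
static bool local_irqs_disabled = true;

static int warnings;   /* how many times the model warned */
static int flushes;    /* how many times the model "flushed the TLB" */

/* Model of a cross-CPU call: it complains when the caller has local irqs
 * disabled, then runs the callback anyway (the warning is not fatal). */
static void model_on_each_cpu(void (*fn)(void))
{
    if (local_irqs_disabled) {
        warnings++;
        fprintf(stderr, "WARNING: cross-CPU call with local irqs disabled\n");
    }
    fn();
}

static void model_flush_tlb_all(void)
{
    flushes++;
}

/* Model of vfree(): unmapping the area ends in a TLB flush on every CPU. */
static void model_vfree(void)
{
    model_on_each_cpu(model_flush_tlb_all);
}
```

Calling model_vfree() while local_irqs_disabled is true bumps the warning counter but still performs the flush, which matches the "more or less spurious" reading of the first warning; once the flag is cleared, the same call is silent.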

> [    0.000000] general protection fault: 01bb [#1] SMP DEBUG_PAGEALLOC
...
> [    0.000000] Call Trace:
> [    0.000000]  [<ffffffff810068a4>] init_8259A+0xe3/0xe8
> [    0.000000]  [<ffffffff817f7d71>] init_ISA_irqs+0x2f/0x5a
> [    0.000000]  [<ffffffff817f7de1>] native_init_IRQ+0xe/0xa2
> [    0.000000]  [<ffffffff817f7dd1>] init_IRQ+0x35/0x37
> [    0.000000]  [<ffffffff817f4a0b>] start_kernel+0x1ff/0x3a4
> [    0.000000]  [<ffffffff817f42a6>] x86_64_start_reservations+0xb6/0xba
> [    0.000000]  [<ffffffff817f43a1>] x86_64_start_kernel+0xf7/0xfe
> [    0.000000] Code: 18 48 89 f3 be 01 00 00 00 e8 33 fe cd ff 4c 89 e7 e8 77 1f e2 ff f6 c7 02 75 09 53 9d e8 a0 bf cd ff eb 07 e8 74 08 ce ff 53 9d <5b> 41 5c c9 c3 55 48 89 e5 53 48 83 ec 08 e8 91 2c c7 ff 48 8b 
> [    0.000000] RIP  [<ffffffff8138fb5c>] _raw_spin_unlock_irqrestore+0x41/0x4

And this looks like alloc_percpu() failed earlier, during early irq
init.  The irq init functions don't check for a NULL return, so the
failure only blows up later.  I'll see if I can reproduce the problem
here.
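
For illustration, a minimal userspace sketch of the kind of NULL check that is missing. It is an assumption about what such a check could look like, not the actual irq init code: model_alloc_percpu(), model_irq_desc_init(), struct irq_stats and the simulate_failure parameter are all hypothetical, with calloc() standing in for the per-cpu allocator.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical per-IRQ statistics, standing in for the per-cpu data. */
struct irq_stats {
    unsigned long count;
};

/* Hypothetical stand-in for alloc_percpu(): returns zeroed memory, or NULL
 * on failure. simulate_failure forces the failure path for demonstration. */
static void *model_alloc_percpu(size_t size, int simulate_failure)
{
    return simulate_failure ? NULL : calloc(1, size);
}

/* Init routine that checks the allocation result and reports the failure
 * at the point where it happens, instead of faulting much later. */
static int model_irq_desc_init(struct irq_stats **out, int simulate_failure)
{
    struct irq_stats *stats;

    stats = model_alloc_percpu(sizeof(*stats), simulate_failure);
    if (!stats)
        return -ENOMEM;   /* caller learns about the failure right away */

    *out = stats;
    return 0;
}
```

With the check in place, a failed allocation is reported as -ENOMEM at init time and the output pointer is left untouched, rather than surfacing as an unrelated-looking fault further down the boot path.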

It doesn't look like anything hardware dependent.  The first warning
seems more or less spurious, and the GPF seems to be caused by the
earlier memory allocation failure.  It's a bit curious that the
allocation failed on an x86_64 machine, though.

Thanks.

-- 
tejun