linux-kernel - Re: percpu related boot crash on x86 (was: Linux 2.6.38-rc1)

Message-ID: <AANLkTi=gxMkzO63RAWiuFt9vp57dvi4d=TaT=NmuvyY2@mail.gmail.com>
Date:	Wed, 19 Jan 2011 14:56:23 +0200
From:	Pekka Enberg <penberg@...nel.org>
To:	Peter Zijlstra <a.p.zijlstra@...llo.nl>
Cc:	Ingo Molnar <mingo@...e.hu>,
	Linus Torvalds <torvalds@...ux-foundation.org>,
	Tejun Heo <tj@...nel.org>,
	Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
	Thomas Gleixner <tglx@...utronix.de>,
	"H. Peter Anvin" <hpa@...or.com>,
	Andrew Morton <akpm@...ux-foundation.org>,
	Pekka Enberg <penberg@...helsinki.fi>
Subject: Re: percpu related boot crash on x86 (was: Linux 2.6.38-rc1)

On Wed, 2011-01-19 at 13:02 +0100, Ingo Molnar wrote:
>> There's a rather frequent, percpu related boot crash that I can see with .38-rc1:
>>
>> [    0.000000] NR_IRQS:4352
>> [    0.000000] ------------[ cut here ]------------
>> [    0.000000] WARNING: at kernel/smp.c:433 smp_call_function_many+0x90/0x209()
>> [    0.000000] Hardware name: System Product Name
>> [    0.000000] Modules linked in:
>> [    0.000000] Pid: 0, comm: swapper Not tainted 2.6.38-rc1 #86551
>> [    0.000000] Call Trace:
>> [    0.000000]  [<ffffffff8103f544>] ? warn_slowpath_common+0x85/0x9d
>> [    0.000000]  [<ffffffff81027218>] ? do_flush_tlb_all+0x0/0x4d
>> [    0.000000]  [<ffffffff81027218>] ? do_flush_tlb_all+0x0/0x4d
>> [    0.000000]  [<ffffffff8103f576>] ? warn_slowpath_null+0x1a/0x1c
>> [    0.000000]  [<ffffffff810760df>] ? smp_call_function_many+0x90/0x209
>> [    0.000000]  [<ffffffff810cc7ca>] ? pcpu_mem_alloc+0x65/0x67
>> [    0.000000]  [<ffffffff81027218>] ? do_flush_tlb_all+0x0/0x4d
>> [    0.000000]  [<ffffffff8107627a>] ? smp_call_function+0x22/0x26
>> [    0.000000]  [<ffffffff81076299>] ? on_each_cpu+0x1b/0x39
>> [    0.000000]  [<ffffffff810274e6>] ? flush_tlb_all+0x1c/0x1e
>> [    0.000000]  [<ffffffff810dc7d7>] ? remove_vm_area+0x71/0x96
>> [    0.000000]  [<ffffffff810dc868>] ? __vunmap+0x3f/0xcf
>> [    0.000000]  [<ffffffff810dc9db>] ? vfree+0x2c/0x2e
>> [    0.000000]  [<ffffffff810ccca6>] ? pcpu_mem_free+0x1e/0x20
>> [    0.000000]  [<ffffffff810ccd75>] ? pcpu_extend_area_map+0x9a/0xb6
>> [    0.000000]  [<ffffffff810cd452>] ? pcpu_alloc+0x17e/0x916
>> [    0.000000]  [<ffffffff8106bb00>] ? trace_hardirqs_off+0xd/0xf
>> [    0.000000]  [<ffffffff810e5bed>] ? kmem_cache_alloc_trace+0xab/0x120
>> [    0.000000]  [<ffffffff810cdbfa>] ? __alloc_percpu+0x10/0x12
>> [    0.000000]  [<ffffffff8180afd4>] ? early_irq_init+0xb2/0x13d
>> [    0.000000]  [<ffffffff817f4a06>] ? start_kernel+0x1fa/0x3a4
>> [    0.000000]  [<ffffffff817f42a6>] ? x86_64_start_reservations+0xb6/0xba
>> [    0.000000]  [<ffffffff817f43a1>] ? x86_64_start_kernel+0xf7/0xfe
>> [    0.000000] ---[ end trace 4eaa2a86a8e2da22 ]---
>> [    0.000000] ------------[ cut here ]------------
>
> Your config had CONFIG_FRAME_POINTER=y, yet it's all '?'; did our
> backtrace code go funny in the head?

On Wed, Jan 19, 2011 at 2:48 PM, Peter Zijlstra <a.p.zijlstra@...llo.nl> wrote:
>  start_kernel()
>   local_irq_disable()
>   ...
>   early_irq_init()
>     alloc_desc()
>       alloc_percpu()
>         __alloc_percpu()
>           pcpu_alloc()
>             pcpu_extend_area_map()
>               pcpu_mem_free()
>                 vfree()
>                   __vunmap()
>                     remove_vm_area()
>                       free_unmap_vmap_area()
>                         vmap_debug_free_range()
> #ifdef CONFIG_DEBUG_PAGEALLOC
>                           flush_tlb_kernel_range()
>                             flush_tlb_all()
>                               on_each_cpu()
>                                 smp_call_function()
>                                   WARN_ON_ONCE(irqs_disabled()....);
>
>
> Not quite sure what to do about that though..

Are vmalloc() and vfree() supposed to work with interrupts disabled? I
always thought they weren't, which would mean something in
pcpu_mem_alloc() needs changing...
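
To make the chain Peter traced above concrete, here is a toy user-space
C model of it. It is only a sketch and not kernel code: every model_*
name, the irqs_off and debug_pagealloc flags and the warnings counter
are made up for illustration, and the real WARN_ON_ONCE() condition in
kernel/smp.c has more terms than the single check modelled here.

```c
#include <stdbool.h>
#include <stdio.h>

/* Toy user-space model, NOT kernel code. All names are hypothetical. */
static bool irqs_off;         /* stands in for irqs_disabled() */
static bool debug_pagealloc;  /* stands in for CONFIG_DEBUG_PAGEALLOC */
static int warnings;          /* how many times the modelled warning fired */

static void model_local_irq_disable(void) { irqs_off = true; }
static void model_local_irq_enable(void)  { irqs_off = false; }

/* Models the check at the bottom of the chain: a cross-CPU call
 * made while interrupts are disabled triggers a warning. */
static void model_smp_call_function(void)
{
	if (irqs_off) {
		warnings++;
		fprintf(stderr, "WARNING: cross-CPU call with IRQs disabled\n");
	}
}

/* Models vfree() -> ... -> flush_tlb_all() -> on_each_cpu(); the TLB
 * flush is only reached on the debug path, as in the quoted trace. */
static void model_vfree(void)
{
	if (debug_pagealloc)
		model_smp_call_function();
}

/* Models pcpu_extend_area_map() freeing the old area map. */
static void model_pcpu_extend_area_map(void)
{
	model_vfree();
}
```

Setting debug_pagealloc, calling model_local_irq_disable() and then
model_pcpu_extend_area_map() bumps warnings once; doing the same after
model_local_irq_enable() leaves the counter alone. That is the whole
point of the question: the free is harmless on its own, it is only the
IRQs-off caller that makes it trip.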