Message-ID: <48B32458.5020104@hp.com>
Date: Mon, 25 Aug 2008 17:30:00 -0400
From: "Alan D. Brunelle" <Alan.Brunelle@...com>
To: Linus Torvalds <torvalds@...ux-foundation.org>
CC: "Rafael J. Wysocki" <rjw@...k.pl>,
Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
Kernel Testers List <kernel-testers@...r.kernel.org>,
Andrew Morton <akpm@...ux-foundation.org>,
Arjan van de Ven <arjan@...ux.intel.com>,
Rusty Russell <rusty@...tcorp.com.au>
Subject: Re: [Bug #11342] Linux 2.6.27-rc3: kernel BUG at mm/vmalloc.c - bisected

Linus Torvalds wrote:
>
> On Mon, 25 Aug 2008, Linus Torvalds wrote:
>> But I'll look at your vmlinux, see what stands out.
>
> Oops. I already see the problem.
>
> Your .config has some _huge_ CPU count, doesn't it?
>
> checkstack.pl shows these things as the top problems:
>
> 0xffffffff80266234 smp_call_function_mask [vmlinux]: 2736
> 0xffffffff80234747 __build_sched_domains [vmlinux]: 2232
> 0xffffffff8023523f __build_sched_domains [vmlinux]: 2232
> 0xffffffff8021e884 setup_IO_APIC_irq [vmlinux]: 1616
> 0xffffffff8021ee24 arch_setup_ht_irq [vmlinux]: 1600
> 0xffffffff8021f144 arch_setup_msi_irq [vmlinux]: 1600
> 0xffffffff8021e3b0 __assign_irq_vector [vmlinux]: 1592
> 0xffffffff8021e626 __assign_irq_vector [vmlinux]: 1592
> 0xffffffff8023257e move_task_off_dead_cpu [vmlinux]: 1592
> 0xffffffff802326e8 move_task_off_dead_cpu [vmlinux]: 1592
> 0xffffffff8025dbc5 tick_handle_oneshot_broadcast [vmlinux]: 1544
> 0xffffffff8025dcb4 tick_handle_oneshot_broadcast [vmlinux]: 1544
> 0xffffffff803f3dc4 store_scaling_governor [vmlinux]: 1376
> 0xffffffff80279ef4 cpuset_write_resmask [vmlinux]: 1360
> 0xffffffff803f465d cpufreq_add_dev [vmlinux]: 1352
> 0xffffffff803f495b cpufreq_add_dev [vmlinux]: 1352
> 0xffffffff803f3fc4 store_scaling_max_freq [vmlinux]: 1328
> 0xffffffff803f4064 store_scaling_min_freq [vmlinux]: 1328
> 0xffffffff803f44c4 cpufreq_update_policy [vmlinux]: 1328
> ..
>
> and sys_init_module is actually way way down the list. I bet the only
> reason it showed up at all was because dynamically it was such a deep
> callchain, and part of that callchain probably called some of those really
> nasty things.
>
> Anyway, the reason smp_call_function_mask and friends have such _huge_
> stack usages for you is that they contain a 'cpumask_t' on the stack.
>
> For example, for me, using a sane NR_CPUS, the size of the stack frame for
> smp_call_function_mask is under 200 bytes. For you, it's 2736 bytes.
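
(To make the arithmetic concrete: cpumask_t is just a bitmap of NR_CPUS
bits, so at NR_CPUS=4096 every on-stack mask is 512 bytes. Here is a
minimal userspace sketch, standing in for the kernel's real definition
and assuming NR_CPUS=4096 as in the .config under discussion:)

#include <stdio.h>

/* Illustrative stand-in for the kernel's cpumask_t: a fixed-size
 * bitmap of NR_CPUS bits. */
#define NR_CPUS 4096
#define BITS_PER_LONG (8 * sizeof(unsigned long))

typedef struct {
	unsigned long bits[(NR_CPUS + BITS_PER_LONG - 1) / BITS_PER_LONG];
} cpumask_t;

int main(void)
{
	cpumask_t mask;	/* 512 bytes of stack at NR_CPUS=4096 */

	printf("sizeof(cpumask_t) = %zu bytes\n", sizeof(mask));
	return 0;
}

A few such masks as locals or call arguments, plus the rest of the
frame, accounts for the ~2.7KB entries in the checkstack output above.
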
>
> How about you make CONFIG_NR_CPUS something _sane_? Like 16? Or do you
> really have four thousand CPUs in that system?
>
> Oh, I guess you have the MAXSMP config enabled? I really think that was a
> bit too aggressive.
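
(For reference: on 2.6.27-era x86, MAXSMP is what drives the count up.
Roughly, going from memory of the Kconfig defaults:

  # MAXSMP sizes the kernel for the largest supported configuration:
  CONFIG_MAXSMP=y
  CONFIG_NR_CPUS=4096

  # versus sizing NR_CPUS to the actual machine:
  # CONFIG_MAXSMP is not set
  CONFIG_NR_CPUS=16
)
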
>
> Linus

This probably all started when I was working on a software tool (aiod)
that was failing because somebody ELSE had 4,096 CPUs configured.
[[It seems that glibc has its maximum CPU count set to 1,024
(__CPU_SETSIZE in bits/sched.h), so system calls like sched_getaffinity
will "fail" for systems configured with 4,096 CPUs. I worked around it
by ignoring the glibc constants and allocating progressively larger
CPU masks until the call succeeded.]]
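
The loop looked roughly like this (a sketch of the approach using
glibc's CPU_ALLOC interface, not the actual aiod code):

#define _GNU_SOURCE
#include <errno.h>
#include <sched.h>
#include <stdio.h>

int main(void)
{
	/* Start at glibc's fixed limit (__CPU_SETSIZE == 1024) and keep
	 * doubling: the sched_getaffinity wrapper fails with EINVAL when
	 * the supplied mask is smaller than the kernel's cpumask. */
	int ncpus = 1024;

	for (;;) {
		size_t size = CPU_ALLOC_SIZE(ncpus);
		cpu_set_t *mask = CPU_ALLOC(ncpus);

		if (!mask)
			return 1;
		if (sched_getaffinity(0, size, mask) == 0) {
			printf("mask of %d CPUs (%zu bytes) was enough\n",
			       ncpus, size);
			CPU_FREE(mask);
			return 0;
		}
		CPU_FREE(mask);
		if (errno != EINVAL)
			return 1;	/* a real error, not a size problem */
		ncpus *= 2;
	}
}
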
I think you're right: the kernel as a whole doesn't appear to be ready
for 4,096 CPUs...
Thanks for taking the time to look into this...
Alan