Message-Id: <200801182136.15213.ak@suse.de>
Date: Fri, 18 Jan 2008 21:36:15 +0100
From: Andi Kleen <ak@...e.de>
To: Mike Travis <travis@....com>
Cc: Ingo Oeser <ioe-lkml@...eria.de>,
Andrew Morton <akpm@...ux-foundation.org>, mingo@...e.hu,
Christoph Lameter <clameter@....com>, linux-mm@...ck.org,
linux-kernel@...r.kernel.org
Subject: Re: [PATCH 4/5] x86: Add config variables for SMP_MAX
First, I think you have to get rid of the THREAD_ORDER stuff -- your
goal for the whole patchkit, after all, is to allow distributions to
support NR_CPUS==4096 in their standard kernels, and I doubt any
distribution will ever choose a THREAD_ORDER > 1 in its
standard kernels, because it would be too unreliable on smaller
systems.
> Here are the top stack consumers with NR_CPUS = 4k.
>
> 16392 isolated_cpu_setup
> 10328 build_sched_domains
> 8248 numa_initmem_init
These should run single threaded early at boot, so you can probably just
make the cpumask_t variables static __initdata.
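
To put a number on why on-stack cpumask_t variables hurt at this size,
here is a minimal sketch of the arithmetic. It assumes cpumask_t is a
plain bitmap with one bit per possible CPU, stored in 64-bit unsigned
longs; the helper name cpumask_bytes is made up for illustration.

```python
def cpumask_bytes(nr_cpus, bits_per_long=64):
    """Bytes for a bitmap with one bit per CPU, rounded up to whole longs.

    Assumption: the mask is an array of unsigned longs (64-bit by default).
    """
    longs = -(-nr_cpus // bits_per_long)  # ceiling division
    return longs * (bits_per_long // 8)


if __name__ == "__main__":
    per_mask = cpumask_bytes(4096)
    print("bytes per cpumask at NR_CPUS=4096:", per_mask)
    # How many such masks would fill an 8k stack on their own.
    print("masks that fill an 8k stack:", 8192 // per_mask)
```

So a handful of local masks in one function is already a few kilobytes,
which is why moving them to static __initdata storage helps for code
that only runs at boot.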
> 4664 cpu_attach_domain
> 4104 show_shared_cpu_map
The ones above are the real pigs. Fortunately they are all clearly
slowpath (except perhaps show_shared_cpu_map), so just using heap
allocations, or bootmem where needed, should be fine for them.
> 3656 centrino_target
> 3608 powernowk8_cpu_init
> 3192 sched_domain_node_span
x86-64 always has 8k stacks and a separate interrupt stack. As long
as the calls are not in some stack-intensive layered context (like the
block IO processing path etc.), <3k shouldn't be too big an issue.
BTW there is a trick to get more stack space on x86-64 temporarily:
run it in a softirq. Softirqs get 16k stacks by default. Just leave
enough room for the hard irqs that might happen if you don't
have interrupts disabled.
> 3144 acpi_cpufreq_target
> 2584 __svc_create_thread
> 2568 cpu_idle_wait
> 2136 netxen_nic_flash_print
> 2104 powernowk8_target
> 2088 _cpu_down
> 2072 cache_add_dev
> 2056 get_cur_freq
> 0 acpi_processor_ffh_cstate_probe
> 2056 microcode_write
> 0 acpi_processor_get_throttling
> 2048 check_supported_cpu
>
> And I've yet to figure out how to accumulate stack sizes using
> call threads.
One way, if you don't care about indirect/asm calls, is to use cflow and do
some post-processing that adds up the data from checkstack.pl.
The other way is to use mcount, but only for situations you can reproduce,
of course. I did have a 2.4 mcount-based stack instrumentation patch
some time ago that I could probably dig out if it would be useful.
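
A rough sketch of what that post-processing could look like, under some
assumptions: the call graph has already been extracted (for example from
cflow output) into a dict of caller -> callees, and the frame sizes come
in the "<bytes> <function>" form used in the listing above. The helper
names parse_sizes and worst_case_depth are made up, and the call edges
in the example are only assumed for illustration. Indirect and asm calls
are invisible to it, and recursive cycles are simply cut, so for
recursive code the result is only a lower bound.

```python
def parse_sizes(listing):
    """Parse lines of the form '<bytes> <function>' (optionally '>'-quoted)."""
    sizes = {}
    for line in listing.splitlines():
        fields = line.lstrip("> ").split()
        if len(fields) == 2 and fields[0].isdigit():
            sizes[fields[1]] = int(fields[0])
    return sizes


def worst_case_depth(calls, frame_size):
    """Map each function to its own frame plus the deepest chain of callees.

    calls:      dict mapping caller name -> iterable of callee names
    frame_size: dict mapping function name -> stack bytes (unknown counts as 0)
    """
    memo = {}
    on_path = set()

    def visit(fn):
        if fn in memo:
            return memo[fn]
        if fn in on_path:
            return 0  # back edge: cut the cycle, recursion cannot be bounded here
        on_path.add(fn)
        deepest = max((visit(c) for c in calls.get(fn, ())), default=0)
        on_path.discard(fn)
        memo[fn] = frame_size.get(fn, 0) + deepest
        return memo[fn]

    for fn in set(calls) | set(frame_size):
        visit(fn)
    return memo


if __name__ == "__main__":
    sizes = parse_sizes(
        "10328 build_sched_domains\n"
        "4664 cpu_attach_domain\n"
        "3192 sched_domain_node_span\n"
    )
    # Call edges assumed for illustration only.
    calls = {"build_sched_domains": ["cpu_attach_domain",
                                     "sched_domain_node_span"]}
    depth = worst_case_depth(calls, sizes)
    for fn in sorted(depth, key=depth.get, reverse=True):
        print(depth[fn], fn)
```

Only the deepest callee is added at each level, not the sum of all
callees, since sibling calls reuse the same stack space.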
-Andi