Message-Id: <1248171979-29166-1-git-send-email-tj@kernel.org>
Date: Tue, 21 Jul 2009 19:25:59 +0900
From: Tejun Heo <tj@...nel.org>
To: linux-arch@...r.kernel.org, linux-kernel@...r.kernel.org,
mingo@...hat.com, benh@...nel.crashing.org, davem@...emloft.net,
dhowells@...hat.com, npiggin@...e.de, JBeulich@...ell.com,
cl@...ux-foundation.org, rusty@...tcorp.com.au, hpa@...or.com,
tglx@...utronix.de, akpm@...ux-foundation.org, x86@...nel.org,
andi@...stfloor.org
Subject: [PATCHSET percpu#for-next] implement and use sparse embedding first chunk allocator
Hello, all.
This patchset teaches the percpu allocator how to manage very sparse
units, teaches vmalloc how to allocate congruent sparse vmap areas, and
combines the two to extend the embedding allocator to allow embedding
of sparse unit addresses. This basically implements Christoph's sparse
congruent allocator.
This allows NUMA configurations to use bootmem allocated memory
directly as non-NUMA machines do with the embedding allocator.
Setting up the first chunk basically consists of allocating memory
for each cpu and then building the percpu configuration so that the
first chunk is composed of those memory areas, which means that there
can be huge holes between units and chunks may overlap each other.
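
To make the layout concrete, here is a minimal userspace sketch (not
code from the patchset). It assumes the per-cpu areas have already been
allocated and only shows how their addresses could be turned into a
chunk base plus per-unit offsets. build_unit_offsets() is a made-up
helper name used for illustration only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper, not kernel API: given the address of the area
 * allocated for each cpu, pick the lowest one as the chunk base and
 * express every unit as an offset from that base.
 */
uintptr_t build_unit_offsets(const uintptr_t *unit_addr, size_t nr_units,
			     size_t *unit_off)
{
	uintptr_t base;
	size_t i;

	assert(nr_units > 0);

	/* the lowest unit address becomes the base of the first chunk */
	base = unit_addr[0];
	for (i = 1; i < nr_units; i++)
		if (unit_addr[i] < base)
			base = unit_addr[i];

	/* units on different nodes can end up with huge gaps between them */
	for (i = 0; i < nr_units; i++)
		unit_off[i] = unit_addr[i] - base;

	return base;
}
```

With two cpus on one node and two on another, the resulting offsets
come out as two small values and two very large ones, which is the
"huge holes between units" case described above.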
When further chunks are necessary, pcpu_get_vm_areas() is called with
parameters that specify how many areas are necessary, how large each
should be and how far apart they are from each other. The function
scans the vmalloc address space top down looking for matching holes
and returns an array of vmap areas. As the newly allocated areas are
offset exactly the same as the first chunk, the rest is pretty
straightforward.
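
The idea of the top-down scan can be sketched in userspace as follows.
This is a brute-force illustration under simplifying assumptions, not
the algorithm in the patch: the address space is modeled as
[vm_start, vm_end), busy regions are given as a plain array, and the
candidate base is simply stepped down by the alignment until every
area fits. struct range, range_is_free() and find_congruent_base() are
hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct range {
	uintptr_t start;	/* inclusive */
	uintptr_t end;		/* exclusive */
};

/* Return 1 if [start, end) overlaps none of the busy ranges. */
int range_is_free(uintptr_t start, uintptr_t end,
		  const struct range *busy, size_t nr_busy)
{
	size_t i;

	for (i = 0; i < nr_busy; i++)
		if (start < busy[i].end && busy[i].start < end)
			return 0;
	return 1;
}

/*
 * Find the highest aligned base such that every area
 * [base + offsets[i], base + offsets[i] + sizes[i]) is free.
 * Returns 0 and stores the base in *basep, or -1 if nothing fits.
 */
int find_congruent_base(uintptr_t vm_start, uintptr_t vm_end,
			const struct range *busy, size_t nr_busy,
			const size_t *offsets, const size_t *sizes,
			size_t nr_areas, size_t align, uintptr_t *basep)
{
	uintptr_t base;
	size_t span = 0, i;

	assert(align && !(align & (align - 1)));	/* power of two */
	assert(vm_start < vm_end);

	/* distance from the base to the end of the farthest area */
	for (i = 0; i < nr_areas; i++)
		if (offsets[i] + sizes[i] > span)
			span = offsets[i] + sizes[i];
	if (span > vm_end - vm_start)
		return -1;

	/* start at the top of the address space and walk down */
	base = (vm_end - span) & ~(uintptr_t)(align - 1);
	while (base >= vm_start) {
		for (i = 0; i < nr_areas; i++)
			if (!range_is_free(base + offsets[i],
					   base + offsets[i] + sizes[i],
					   busy, nr_busy))
				break;
		if (i == nr_areas) {
			*basep = base;
			return 0;
		}
		if (base - vm_start < align)
			break;		/* next step would leave the space */
		base -= align;
	}
	return -1;
}
```

Because every area is placed at the same offset from the returned base
as the corresponding area of the first chunk, address calculation for
the new chunk only needs the new base.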
This has the following benefits.
* No special remapping necessary. Arch code doesn't need to change its
address mapping or anything. It just needs to inform the percpu
allocator how the percpu areas end up laid out. The percpu allocator
will take any layout.
* No additional TLB pressure. Both page and large page remapping add
TLB pressure. With embedding, there's no overhead. Whatever
translation is used for the linear mapping is used as-is.
* Removes dup-mapping. Large page remapping ends up mapping the same
page twice. This causes a subtle problem on x86 when page attributes
need to be changed. The maps need to be looked up and split into
page mappings, which is a bit fragile. As embedding doesn't remap
anything, this problem doesn't exist.
The only restriction is that the vmalloc area needs to be huge - at
least orders of magnitude larger than the distances between NUMA
nodes. For 64bit machines, this isn't a problem but on 32bit NUMA
machines address space is a scarce resource. For x86_32 NUMAs, the
page mapping allocator is used. Page is chosen over large page
because page is far simpler and the advantage of large page isn't
very clear.
0001-percpu-fix-pcpu_reclaim-locking.patch
0002-percpu-improve-boot-messages.patch
0003-percpu-rename-4k-first-chunk-allocator-to-page.patch
0004-percpu-build-first-chunk-allocators-selectively.patch
0005-percpu-generalize-first-chunk-allocator-selection.patch
0006-percpu-drop-static_size-from-first-chunk-allocator.patch
0007-percpu-make-dyn_size-mandatory-for-pcpu_setup_firs.patch
0008-percpu-add-align-to-pcpu_fc_alloc_fn_t.patch
0009-percpu-move-pcpu_lpage_build_unit_map-and-pcpul_l.patch
0010-percpu-introduce-pcpu_alloc_info-and-pcpu_group_inf.patch
0011-percpu-add-pcpu_unit_offsets.patch
0012-percpu-add-chunk-base_addr.patch
0013-vmalloc-separate-out-insert_vmalloc_vm.patch
0014-vmalloc-implement-pcpu_get_vm_areas.patch
0015-percpu-use-group-information-to-allocate-vmap-areas.patch
0016-percpu-update-embedding-first-chunk-allocator-to-ha.patch
0017-x86-percpu-use-embedding-for-64bit-NUMA-and-page-fo.patch
0018-percpu-kill-lpage-first-chunk-allocator.patch
0019-sparc64-use-embedding-percpu-first-chunk-allocator.patch
0020-powerpc64-convert-to-dynamic-percpu-allocator.patch
0001 fixes a locking bug on the reclaim path which was introduced by
2f39e637ea240efb74cf807d31c93a71a0b89174.
0002-0007 are misc changes. The 4k allocator is renamed to page,
messages are made prettier and more informative, unused first chunk
allocators are no longer built and so on. Nothing really drastic,
just small cleanups to ease further changes.
0008-0009 prepare for later changes. @align is added to
pcpu_fc_alloc and functions are relocated.
0010 changes how the first chunk configuration is passed to
pcpu_setup_first_chunk(). All information is collected into the
pcpu_alloc_info struct, including the unit grouping information which
used to be lost in the process. This change gives the percpu
allocator enough information to allocate congruent vmap areas.
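
As a rough illustration of why the grouping information matters, the
sketch below models a much simplified alloc info and derives, for each
group, the offset and size of the vmap area that would have to be
requested for it. The struct layouts and field names here are
assumptions made for this example and are not copied from the patch.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_GROUPS 8	/* arbitrary limit for this sketch */

/* Simplified, hypothetical stand-ins for the real structures. */
struct group_info {
	int nr_units;		/* units (cpus) in this group */
	size_t base_offset;	/* group start relative to the chunk base */
};

struct alloc_info {
	size_t unit_size;	/* bytes per unit */
	int nr_groups;
	struct group_info groups[MAX_GROUPS];
};

/* One area per group: where it starts and how many bytes it covers. */
void group_areas(const struct alloc_info *ai, size_t *offsets, size_t *sizes)
{
	int g;

	assert(ai->nr_groups > 0 && ai->nr_groups <= MAX_GROUPS);

	for (g = 0; g < ai->nr_groups; g++) {
		offsets[g] = ai->groups[g].base_offset;
		sizes[g] = (size_t)ai->groups[g].nr_units * ai->unit_size;
	}
}
```

Without the group boundaries, only a flat list of units would be left
and there would be no way to tell which units have to be contiguous and
where the holes between them are.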
0011-0012 prepare percpu for sparse groups and the units in them.
Offset information is added and used to calculate addresses.
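
A minimal userspace sketch of that address calculation, assuming each
chunk records a base address and all chunks share one table of per-unit
offsets. struct chunk and chunk_addr() are simplified, hypothetical
names for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct chunk {
	uintptr_t base_addr;	/* address of the lowest unit of this chunk */
};

/*
 * Address of byte @off inside @cpu's unit of @chunk. The per-unit
 * offsets are shared by all chunks, which is what makes them congruent.
 */
uintptr_t chunk_addr(const struct chunk *chunk, const size_t *unit_offsets,
		     size_t nr_units, size_t cpu, size_t off)
{
	assert(cpu < nr_units);
	return chunk->base_addr + unit_offsets[cpu] + off;
}
```

Two chunks with different bases then yield addresses whose distance
between any two cpus is identical, so the same offset can be used to
reach a given cpu's copy no matter which chunk holds the object.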
0013-0014 implement pcpu_get_vm_areas() which allocates congruent vmap
areas.
0015-0016 teach percpu how to use multiple vm areas to allow sparse
groups and extend the embedding allocator so that it knows how to
embed sparse areas.
0017 converts x86_64 NUMA to use the embedding allocator and x86_32
NUMA to use the page allocator.
0018 kills the now unused lpage allocator and the related page
attribute code.
0019 converts sparc64 to use the embedding allocator.
0020 converts powerpc64 to the dynamic percpu allocator using the
embedding allocator.
After this series, only ia64 is left with the static allocator. I
have the patch but don't have a machine to verify it on. Will post it
as an RFC patch.
This patchset is on top of
linus#master (aea1f7964ae6cba5eb419a958956deb9016b3341)
+ [1] percpu-fix-sparse-possible-cpu-map-handling patchset
+ pulled into percpu#for-next (457f82bac659745f6d5052e4c493d92d62722c9c)
and available in the following git tree. Please note that the
following tree is temporary and will be rebased.
git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu.git review
Diffstat follows. Only 112 lines added. :-)
 Documentation/kernel-parameters.txt |   11
 arch/powerpc/Kconfig                |    4
 arch/powerpc/kernel/setup_64.c      |   61 +
 arch/sparc/Kconfig                  |    3
 arch/sparc/kernel/smp_64.c          |  124 ---
 arch/x86/Kconfig                    |    6
 arch/x86/kernel/setup_percpu.c      |  201 +-----
 arch/x86/mm/pageattr.c              |   20
 include/linux/percpu.h              |  105 +--
 include/linux/vmalloc.h             |    6
 mm/percpu.c                         | 1139 +++++++++++++++++-------------------
 mm/vmalloc.c                        |  338 ++++++++++
 12 files changed, 1065 insertions(+), 953 deletions(-)
Thanks.
--
tejun
[1] http://thread.gmane.org/gmane.linux.kernel/867587