Message-ID: <20220215143728.3810954-1-Liam.Howlett@oracle.com>
Date: Tue, 15 Feb 2022 14:37:44 +0000
From: Liam Howlett <liam.howlett@...cle.com>
To: "maple-tree@...ts.infradead.org" <maple-tree@...ts.infradead.org>,
"linux-mm@...ck.org" <linux-mm@...ck.org>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
Andrew Morton <akpm@...ux-foundation.org>
Subject: [PATCH v6 00/71] Introducing the Maple Tree
The maple tree is an RCU-safe, range-based B-tree designed to use modern
processor caches efficiently. There are a number of places in the kernel
where a non-overlapping range-based tree would be beneficial, especially
one with a simple interface. The first user covered in this patch set is
the vm_area_struct, where the maple tree replaces three data structures:
the augmented rbtree, the vma cache, and the linked list of VMAs in the
mm_struct. The long-term goal is to reduce or remove the mmap_sem
contention.
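To make the "non-overlapping range-based" idea concrete, here is a minimal user-space sketch, assuming a tiny fixed-capacity sorted array in place of a real B-tree. The names range_store() and range_load() are hypothetical; they only loosely imitate the store-a-range / load-an-index style of interface the maple tree offers (e.g. mtree_load()). This is not the maple tree implementation, it is not a B-tree, and it has no RCU support.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy: one non-overlapping range [first, last] -> entry. */
struct range {
	unsigned long first;
	unsigned long last;
	void *entry;
};

#define MAX_RANGES 16

/* Hypothetical stand-in for the tree: a small array sorted by 'first'. */
struct range_map {
	struct range r[MAX_RANGES];
	size_t nr;
};

/* Store entry over [first, last]; returns -1 if it would overlap or is full. */
static int range_store(struct range_map *m, unsigned long first,
		       unsigned long last, void *entry)
{
	size_t i, pos = 0;

	if (first > last || m->nr == MAX_RANGES)
		return -1;
	/* Find the insertion point that keeps the array sorted. */
	while (pos < m->nr && m->r[pos].first < first)
		pos++;
	/* Reject overlap with the predecessor or the successor. */
	if (pos > 0 && m->r[pos - 1].last >= first)
		return -1;
	if (pos < m->nr && m->r[pos].first <= last)
		return -1;
	/* Shift later ranges up by one slot and insert. */
	for (i = m->nr; i > pos; i--)
		m->r[i] = m->r[i - 1];
	m->r[pos] = (struct range){ first, last, entry };
	m->nr++;
	return 0;
}

/* Binary search: return the entry whose range contains index, or NULL. */
static void *range_load(const struct range_map *m, unsigned long index)
{
	size_t lo = 0, hi = m->nr;

	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;

		if (index < m->r[mid].first)
			hi = mid;
		else if (index > m->r[mid].last)
			lo = mid + 1;
		else
			return m->r[mid].entry;
	}
	return NULL;
}
```

Storing an entry over [0x1000, 0x1fff] and then loading 0x1234 returns that entry, while loading an index that falls in a gap returns NULL. Any index inside a range maps to the same entry, which is the property a VMA lookup needs.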
The tree has a branching factor of 10 for non-leaf nodes and 16 for leaf
nodes. With the increased branching factor, it is significantly shorter than
the rbtree, so it incurs fewer cache misses. Removing the linked list
between subsequent entries also reduces cache misses and the need to pull
in the previous and next VMAs during many tree alterations.
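A rough back-of-the-envelope comparison of tree height, assuming every maple tree node is completely full (16 entries per leaf, 10 children per internal node) and comparing against a perfectly balanced binary tree. Both figures are best-case lower bounds: a red-black tree can be up to about twice as tall as the perfectly balanced case, and maple tree nodes are usually not full. The helper names min_levels() and bst_min_height() are hypothetical.

```c
#include <assert.h>

/*
 * Minimum number of levels needed to index n entries when every node is
 * full: leaves hold leaf_fanout entries, internal nodes hold node_fanout
 * children. Real trees are rarely full, so this is a lower bound.
 */
static unsigned int min_levels(unsigned long n, unsigned long leaf_fanout,
			       unsigned long node_fanout)
{
	unsigned long nodes;
	unsigned int levels = 1;

	assert(leaf_fanout >= 1 && node_fanout >= 2);
	nodes = (n + leaf_fanout - 1) / leaf_fanout;	/* number of leaves */
	while (nodes > 1) {
		/* Each level up groups node_fanout children under one parent. */
		nodes = (nodes + node_fanout - 1) / node_fanout;
		levels++;
	}
	return levels;
}

/* Height of a perfectly balanced binary tree: smallest h with 2^h - 1 >= n. */
static unsigned int bst_min_height(unsigned long n)
{
	unsigned int h = 0;

	while (((1UL << h) - 1) < n)
		h++;
	return h;
}
```

For 1,000 entries this gives 3 levels versus 10, and for 1,000,000 entries 6 levels versus 20. Fewer levels means fewer nodes visited per lookup, and therefore fewer opportunities for a cache miss.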
This patch set is based on v5.17-rc4.
git: https://github.com/oracle/linux-uek/tree/howlett/maple/20220214
v6 changes:
- Added a patch for the xarray test code, which should be dropped in
favour of the upstream fix - Thanks Matthew Wilcox
- Fixed issue with maple state index/last setting when the tree is just
a pointer - Thanks David Howells
- Changed internal RCU handling to not check flags more than once
- Fixed mas_prev() underflow issue - Thanks David Howells
- Fixed return values of mas_prev()/mas_next() when the only value is at
index 0 - Thanks David Howells
- Fixed mas_find_rev() running past minimum value - Thanks David Howells
- Fixed testing code function rename - Thanks Mark Hemment
- Reworked brk() and vm_brk_flags() as suggested - Thanks Vlastimil
Babka
- Documentation fixes - Thanks Mike Rapoport
- Separated test code from tree code - Thanks Vlastimil Babka
- Moved maple_tree_init() call into the maple tree patch for other users
- Thanks David Howells
- Fixed copyright date - Thanks Mike Rapoport
- Fixed whitespace in comments and reduced changes in other locations.
- Added missing kdocs - Thanks Mike Rapoport
- Fixed exit_mmap comment - Thanks Mark Hemment
- Removed RCU tracking, as there is an issue with atomic increments of
mmget() - Thanks Mark Hemment for the initial issue report that allowed
me to discover this. RCU was not being used, so it is disabled for VMA
tracking for now.
- Removed unnecessary assignment in mtree_range_walk() - Thanks JaeJoon
Jung
- Dropped ma_xa_benchmark from testing Makefile - Thanks JaeJoon Jung
- Fixed leaf gap calculation in rare case of underflow
- Fixed mmap_region() bug on merging of prev
- Fixed mmap_region() bug on khugepaged_enter_vma_merge()
- Added test cases for mas_find_rev(), mas_prev(), mas_next(), and
mas_root_expand() to the test suite.
- Updated config options and ifdefs to allow other maple tree users to
debug the maple tree without debugging the VM maple tree.
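The fixes themselves are not reproduced here. As a generic illustration only, two of the bug classes named above (an unsigned index wrapping around when stepping backwards past 0, and a reverse search running past its lower bound) can be guarded against as in this toy sketch. It assumes a sorted array of non-overlapping ranges rather than a maple tree; prev_index() and find_rev() are hypothetical names, not maple tree functions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy range [first, last]; arrays are sorted, non-overlapping. */
struct span {
	unsigned long first;
	unsigned long last;
};

/*
 * Step *index back by one, but refuse to go below min. Without the check,
 * decrementing an unsigned 0 would wrap around to ULONG_MAX.
 */
static int prev_index(unsigned long *index, unsigned long min)
{
	if (*index <= min)
		return -1;
	(*index)--;
	return 0;
}

/*
 * Search downwards from index for the nearest span starting at or below it.
 * Stop (return NULL) as soon as the candidate ends below min, instead of
 * continuing past the lower bound.
 */
static const struct span *find_rev(const struct span *s, size_t nr,
				   unsigned long index, unsigned long min)
{
	size_t i = nr;

	/* "i-- > 0" counts down without wrapping the unsigned counter. */
	while (i-- > 0) {
		if (s[i].first > index)
			continue;	/* starts above the search point */
		if (s[i].last < min)
			return NULL;	/* entirely below the minimum */
		return &s[i];
	}
	return NULL;
}
```

Because the spans are sorted and non-overlapping, once one candidate ends below min every earlier span does too, so returning NULL at that point is safe.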
v5: https://lore.kernel.org/linux-mm/20220203172051.i2jnhnkudzssdsxg@revolver/T/
v4: https://lore.kernel.org/linux-mm/20211201142918.921493-30-Liam.Howlett@oracle.com/t/
v3: https://lore.kernel.org/linux-mm/20211005012959.1110504-1-Liam.Howlett@oracle.com/
v2: https://lore.kernel.org/linux-mm/20210817154651.1570984-1-Liam.Howlett@oracle.com/
v1: https://lore.kernel.org/linux-mm/20210428153542.2814175-1-Liam.Howlett@Oracle.com/
Performance on a 144-core x86:
It is important to note that the code still uses the mmap_sem.
Performance is fairly similar on real-world workloads, while
micro-benchmarks show some variation.
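kernbench (Amean, in seconds; the first column is the baseline, the second
is with this patch set; a negative percentage is a regression). System
time increases by about 2.6% to 4%, while user time stays within about
0.6% of the baseline: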
Amean user-2 885.34 ( 0.00%) 886.07 * -0.08%*
Amean syst-2 161.95 ( 0.00%) 168.19 * -3.85%*
Amean elsp-2 530.06 ( 0.00%) 532.96 * -0.55%*
Amean user-4 908.58 ( 0.00%) 908.88 * -0.03%*
Amean syst-4 167.56 ( 0.00%) 173.72 * -3.68%*
Amean elsp-4 277.21 ( 0.00%) 277.38 * -0.06%*
Amean user-8 961.84 ( 0.00%) 962.45 * -0.06%*
Amean syst-8 176.40 ( 0.00%) 183.43 * -3.99%*
Amean elsp-8 150.59 ( 0.00%) 151.21 * -0.41%*
Amean user-16 1040.15 ( 0.00%) 1039.89 * 0.02%*
Amean syst-16 188.19 ( 0.00%) 193.81 * -2.99%*
Amean elsp-16 86.85 ( 0.00%) 86.32 * 0.61%*
Amean user-32 1240.46 ( 0.00%) 1233.93 * 0.53%*
Amean syst-32 217.15 ( 0.00%) 222.99 * -2.69%*
Amean elsp-32 55.16 ( 0.00%) 55.09 * 0.12%*
Amean user-64 1241.17 ( 0.00%) 1234.26 * 0.56%*
Amean syst-64 215.11 ( 0.00%) 220.76 * -2.63%*
Amean elsp-64 32.88 ( 0.00%) 33.72 * -2.57%*
Amean user-128 1613.09 ( 0.00%) 1609.63 * 0.21%*
Amean syst-128 267.10 ( 0.00%) 276.72 * -3.60%*
Amean elsp-128 25.80 ( 0.00%) 26.09 * -1.10%*
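The sign convention in the table can be checked by hand. Assuming the percentage is computed relative to the baseline and signed so that less time counts as an improvement (which matches the reported figures: syst-2 going from 161.95 to 168.19 is shown as -3.85%), a minimal sketch is below. pct_gain() is a hypothetical name, not part of any benchmark tool.

```c
#include <assert.h>

/*
 * Percentage change of a time metric relative to the baseline, signed so
 * that a positive value is an improvement (less time) and a negative value
 * is a regression (more time).
 */
static double pct_gain(double baseline, double patched)
{
	assert(baseline != 0.0);
	return (baseline - patched) / baseline * 100.0;
}
```

pct_gain(161.95, 168.19) evaluates to about -3.85. Recomputing from the rounded means shown can differ in the last digit from the published figure (elsp-64 gives about -2.55% rather than -2.57%), presumably because the report works from unrounded data.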
Mixed Hmean results:
- freqmine-medium: -3.50% to +12.82%
- malloc1-processes: -14.50% to +6.53%
- signal1-processes: -1.87% to +14.02%
- page_fault3-threads: -6.46% to +26.15%
- pthread_mutex1-threads: -16.81% to +28.63%
Performance decreases in the following micro-benchmarks (Hmean):
- brk1-processes: -37.42% to -44.16%
- malloc1-threads: -18.27% to -23.08%
Modifications to the tree are more expensive, so micro-benchmarks that
write but do not use the data are negatively affected.
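For readers unfamiliar with brk1, the following is a minimal user-space sketch of the kind of loop it exercises, assuming a will-it-scale-style test that only moves the program break up and down. It is an illustration, not the benchmark's source, and brk_cycles() is a hypothetical name. Each brk() call resizes the heap VMA, and therefore modifies the VMA tree, but the memory is never touched: the write-without-use pattern described above.

```c
#define _DEFAULT_SOURCE		/* expose brk()/sbrk() in glibc's <unistd.h> */
#include <assert.h>
#include <unistd.h>

/*
 * Hypothetical brk1-style loop: grow the heap by one page and shrink it
 * back, never reading or writing the new memory. Returns the number of
 * successful brk() calls, or -1 on failure.
 */
static long brk_cycles(int iterations)
{
	long page = sysconf(_SC_PAGESIZE);
	char *base = sbrk(0);	/* current program break */
	long calls = 0;

	if (base == (void *)-1 || page <= 0)
		return -1;
	for (int i = 0; i < iterations; i++) {
		if (brk(base + page) != 0)	/* grow: expands the heap VMA */
			return -1;
		if (brk(base) != 0)		/* shrink: trims it back */
			return -1;
		calls += 2;
	}
	return calls;
}
```

Every iteration costs two tree modifications and no lookups of the data, so any extra cost per modification shows up almost directly in the result.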
Patch organization:
Patch 1 adds a missing lock to avoid triggering an assert when using a VMA
iterator.
Patch 2 is an xarray fix for the bitmap header changes; it will be
dropped in favour of a pending upstream fix.
Patches 3-7 are radix tree test suite additions for maple tree
support.
Patch 8 adds the maple tree.
Patch 9 adds the maple tree test code.
Patches 10-19 remove the rbtree from the mm_struct. This now includes
the introduction of the VMA iterator.
Patch 20 optimizes __vma_adjust() for the maple tree.
Patches 21-27 remove the vmacache from the kernel.
Patches 28-31 are internal mm changes for efficiency.
Patches 32-69 remove the VMA linked list.
Patches 70 and 71 are small cleanups following the removal of the VMA
linked list.
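The kernel-side VMA iterator cannot run outside the kernel, so here is a user-space model of the idea, assuming VMAs are kept sorted by start address in a plain array standing in for the maple tree. The names toy_vma, toy_vma_iter, toy_vma_next() and toy_for_each_vma are hypothetical and only imitate the shape of a for_each_vma()-style loop: instead of following a vm_next pointer, each step asks the index for the first VMA ending above the address reached so far (the same "first VMA with end > addr" rule find_vma() uses).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a VMA covering [start, end). */
struct toy_vma {
	unsigned long start;
	unsigned long end;
};

/* Iterator state: the "tree" (sorted, non-overlapping) and a search address. */
struct toy_vma_iter {
	const struct toy_vma *vmas;
	size_t nr;
	unsigned long addr;
};

/* Return the first VMA ending above vmi->addr, then advance past it. */
static const struct toy_vma *toy_vma_next(struct toy_vma_iter *vmi)
{
	size_t lo = 0, hi = vmi->nr;

	/* Binary search for the first VMA whose end is above addr. */
	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;

		if (vmi->vmas[mid].end <= vmi->addr)
			lo = mid + 1;
		else
			hi = mid;
	}
	if (lo == vmi->nr)
		return NULL;
	vmi->addr = vmi->vmas[lo].end;	/* resume the next search after it */
	return &vmi->vmas[lo];
}

/* Loop over every remaining VMA; no per-VMA "next" pointer is needed. */
#define toy_for_each_vma(vmi, vma) \
	while (((vma) = toy_vma_next(&(vmi))) != NULL)
```

Starting the iterator at an address inside a VMA yields that VMA first, then every later one in address order, which is the traversal the linked list previously provided.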
Liam R. Howlett (61):
xarray: Fix bitmap breakage
radix tree test suite: Add pr_err define
radix tree test suite: Add kmem_cache_set_non_kernel()
radix tree test suite: Add allocation counts and size to kmem_cache
radix tree test suite: Add support for slab bulk APIs
radix tree test suite: Add lockdep_is_held to header
Maple Tree: Add new data structure
lib/test_maple_tree: Add testing for maple tree
mm: Start tracking VMAs with maple tree
mm/mmap: Use the maple tree in find_vma() instead of the rbtree.
mm/mmap: Use the maple tree for find_vma_prev() instead of the rbtree
mm/mmap: Use maple tree for unmapped_area{_topdown}
kernel/fork: Use maple tree for dup_mmap() during forking
mm: Remove rb tree.
mmap: Change zeroing of maple tree in __vma_adjust()
xen: Use vma_lookup() in privcmd_ioctl_mmap()
mm: Optimize find_exact_vma() to use vma_lookup()
mm/khugepaged: Optimize collapse_pte_mapped_thp() by using
vma_lookup()
mm/mmap: Change do_brk_flags() to expand existing VMA and add
do_brk_munmap()
mm: Use maple tree operations for find_vma_intersection()
mm/mmap: Use advanced maple tree API for mmap_region()
mm: Remove vmacache
mm/mmap: Move mmap_region() below do_munmap()
mm/mmap: Reorganize munmap to use maple states
mm/mmap: Change do_brk_munmap() to use do_mas_align_munmap()
arm64: Remove mmap linked list from vdso
parisc: Remove mmap linked list from cache handling
powerpc: Remove mmap linked list walks
s390: Remove vma linked list walks
x86: Remove vma linked list walks
xtensa: Remove vma linked list walks
cxl: Remove vma linked list walk
optee: Remove vma linked list walk
um: Remove vma linked list walk
binfmt_elf: Remove vma linked list walk
exec: Use VMA iterator instead of linked list
fs/proc/base: Use maple tree iterators in place of linked list
userfaultfd: Use maple tree iterator to iterate VMAs
ipc/shm: Use VMA iterator instead of linked list
acct: Use VMA iterator instead of linked list
perf: Use VMA iterator
sched: Use maple tree iterator to walk VMAs
fork: Use VMA iterator
bpf: Remove VMA linked list
mm/gup: Use maple tree navigation instead of linked list
mm/khugepaged: Stop using vma linked list
mm/ksm: Use vma iterators instead of vma linked list
mm/madvise: Use vma_find() instead of vma linked list
mm/memcontrol: Stop using mm->highest_vm_end
mm/mempolicy: Use vma iterator & maple state instead of vma linked
list
mm/mlock: Use vma iterator and instead of vma linked list
mm/mprotect: Use maple tree navigation instead of vma linked list
mm/mremap: Use vma_find_intersection() instead of vma linked list
mm/msync: Use vma_find() instead of vma linked list
mm/oom_kill: Use maple tree iterators instead of vma linked list
mm/pagewalk: Use vma_find() instead of vma linked list
mm/swapfile: Use vma iterator instead of vma linked list
riscv: Use vma iterator for vdso
mm: Remove the vma linked list
mm/mmap: Drop range_has_overlap() function
mm/mmap.c: Pass in mapping to __vma_link_file()
Matthew Wilcox (Oracle) (10):
binfmt_elf: Take the mmap lock when walking the VMA list
mm: Add VMA iterator
mmap: Use the VMA iterator in count_vma_pages_range()
damon: Convert __damon_va_three_regions to use the VMA iterator
proc: Remove VMA rbtree use from nommu
mm: Convert vma_lookup() to use mtree_load()
coredump: Remove vma linked list walk
fs/proc/task_mmu: Stop using linked list and highest_vm_end
i915: Use the VMA iterator
nommu: Remove uses of VMA linked list
Documentation/core-api/index.rst | 1 +
Documentation/core-api/maple_tree.rst | 218 +
MAINTAINERS | 12 +
arch/arm64/kernel/vdso.c | 3 +-
arch/parisc/kernel/cache.c | 9 +-
arch/powerpc/kernel/vdso.c | 6 +-
arch/powerpc/mm/book3s32/tlb.c | 11 +-
arch/powerpc/mm/book3s64/subpage_prot.c | 13 +-
arch/riscv/kernel/vdso.c | 3 +-
arch/s390/kernel/vdso.c | 3 +-
arch/s390/mm/gmap.c | 6 +-
arch/um/kernel/tlb.c | 14 +-
arch/x86/entry/vdso/vma.c | 9 +-
arch/x86/kernel/tboot.c | 2 +-
arch/xtensa/kernel/syscall.c | 18 +-
drivers/firmware/efi/efi.c | 2 +-
drivers/gpu/drm/i915/gem/i915_gem_userptr.c | 14 +-
drivers/misc/cxl/fault.c | 45 +-
drivers/tee/optee/call.c | 18 +-
drivers/xen/privcmd.c | 2 +-
fs/binfmt_elf.c | 6 +-
fs/coredump.c | 33 +-
fs/exec.c | 12 +-
fs/proc/base.c | 5 +-
fs/proc/internal.h | 2 +-
fs/proc/task_mmu.c | 74 +-
fs/proc/task_nommu.c | 45 +-
fs/userfaultfd.c | 49 +-
include/linux/maple_tree.h | 683 +
include/linux/mm.h | 77 +-
include/linux/mm_types.h | 43 +-
include/linux/mm_types_task.h | 12 -
include/linux/sched.h | 1 -
include/linux/userfaultfd_k.h | 7 +-
include/linux/vm_event_item.h | 4 -
include/linux/vmacache.h | 28 -
include/linux/vmstat.h | 6 -
include/linux/xarray.h | 1 +
include/trace/events/maple_tree.h | 123 +
include/trace/events/mmap.h | 71 +
init/main.c | 2 +
ipc/shm.c | 21 +-
kernel/acct.c | 11 +-
kernel/bpf/task_iter.c | 10 +-
kernel/debug/debug_core.c | 12 -
kernel/events/core.c | 3 +-
kernel/events/uprobes.c | 9 +-
kernel/fork.c | 58 +-
kernel/sched/fair.c | 10 +-
lib/Kconfig.debug | 18 +-
lib/Makefile | 3 +-
lib/maple_tree.c | 6967 +++
lib/test_maple_tree.c | 37398 ++++++++++++++++
mm/Makefile | 2 +-
mm/damon/vaddr-test.h | 37 +-
mm/damon/vaddr.c | 53 +-
mm/debug.c | 14 +-
mm/gup.c | 9 +-
mm/huge_memory.c | 4 +-
mm/init-mm.c | 4 +-
mm/internal.h | 78 +-
mm/khugepaged.c | 13 +-
mm/ksm.c | 18 +-
mm/madvise.c | 2 +-
mm/memcontrol.c | 6 +-
mm/memory.c | 33 +-
mm/mempolicy.c | 58 +-
mm/mlock.c | 34 +-
mm/mmap.c | 2086 +-
mm/mprotect.c | 7 +-
mm/mremap.c | 22 +-
mm/msync.c | 2 +-
mm/nommu.c | 127 +-
mm/oom_kill.c | 3 +-
mm/pagewalk.c | 2 +-
mm/swapfile.c | 4 +-
mm/util.c | 32 -
mm/vmacache.c | 117 -
mm/vmstat.c | 4 -
tools/testing/radix-tree/.gitignore | 2 +
tools/testing/radix-tree/Makefile | 9 +-
tools/testing/radix-tree/generated/autoconf.h | 1 +
tools/testing/radix-tree/linux.c | 160 +-
tools/testing/radix-tree/linux/kernel.h | 1 +
tools/testing/radix-tree/linux/lockdep.h | 2 +
tools/testing/radix-tree/linux/maple_tree.h | 7 +
tools/testing/radix-tree/linux/slab.h | 4 +
tools/testing/radix-tree/maple.c | 59 +
.../radix-tree/trace/events/maple_tree.h | 3 +
89 files changed, 47390 insertions(+), 1842 deletions(-)
create mode 100644 Documentation/core-api/maple_tree.rst
create mode 100644 include/linux/maple_tree.h
delete mode 100644 include/linux/vmacache.h
create mode 100644 include/trace/events/maple_tree.h
create mode 100644 lib/maple_tree.c
create mode 100644 lib/test_maple_tree.c
delete mode 100644 mm/vmacache.c
create mode 100644 tools/testing/radix-tree/linux/maple_tree.h
create mode 100644 tools/testing/radix-tree/maple.c
create mode 100644 tools/testing/radix-tree/trace/events/maple_tree.h
--
2.34.1