Message-ID: <20241112194635.444146-1-surenb@google.com>
Date: Tue, 12 Nov 2024 11:46:30 -0800
From: Suren Baghdasaryan <surenb@...gle.com>
To: akpm@...ux-foundation.org
Cc: willy@...radead.org, liam.howlett@...cle.com, lorenzo.stoakes@...cle.com,
mhocko@...e.com, vbabka@...e.cz, hannes@...xchg.org, mjguzik@...il.com,
oliver.sang@...el.com, mgorman@...hsingularity.net, david@...hat.com,
peterx@...hat.com, oleg@...hat.com, dave@...olabs.net, paulmck@...nel.org,
brauner@...nel.org, dhowells@...hat.com, hdanton@...a.com, hughd@...gle.com,
minchan@...gle.com, jannh@...gle.com, shakeel.butt@...ux.dev,
souravpanda@...gle.com, pasha.tatashin@...een.com, linux-mm@...ck.org,
linux-kernel@...r.kernel.org, kernel-team@...roid.com, surenb@...gle.com
Subject: [PATCH v2 0/5] move per-vma lock into vm_area_struct
Back when per-vma locks were introduced, vm_lock was moved out of
vm_area_struct in [1] because of the performance regression caused by
false cacheline sharing. Recent investigation [2] revealed that the
regression is limited to a rather old Broadwell microarchitecture and
even there it can be mitigated by disabling adjacent cacheline
prefetching, see [3].
This patchset moves vm_lock back into vm_area_struct, aligning it at the
cacheline boundary and making the vm_area_struct slab cache
cacheline-aligned as well. This increases per-VMA memory consumption from
160 (vm_area_struct) + 40 (vm_lock) = 200 bytes to 256 bytes:
slabinfo before:
<name> ... <objsize> <objperslab> <pagesperslab> : ...
vma_lock ... 40 102 1 : ...
vm_area_struct ... 160 51 2 : ...
slabinfo after moving vm_lock:
<name> ... <objsize> <objperslab> <pagesperslab> : ...
vm_area_struct ... 256 32 2 : ...
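The jump from 200 to 256 bytes comes from alignment padding. The sketch
below uses a hypothetical toy struct (NOT the real vm_area_struct layout)
that only reproduces the byte counts quoted above, assuming a 64-byte
cacheline: 160 bytes of fields, then a 40-byte lock forced onto a
cacheline boundary.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed cacheline size; real hardware and kernel configs may differ. */
#define TOY_CACHELINE 64

/* Hypothetical stand-in for the 40-byte vma_lock. */
struct toy_lock {
    char bytes[40];
};

/* Hypothetical toy layout: 160 bytes of existing fields followed by the
 * lock placed on its own cacheline boundary. Not the kernel's struct. */
struct toy_vma {
    char fields[160];
    _Alignas(TOY_CACHELINE) struct toy_lock vm_lock;
};

/* 160 is rounded up to the next multiple of 64, so the lock starts at 192. */
static_assert(offsetof(struct toy_vma, vm_lock) == 192,
              "lock begins on a cacheline boundary");

/* 192 + 40 = 232, padded to a multiple of the 64-byte struct alignment. */
static_assert(sizeof(struct toy_vma) == 256,
              "struct size rounds up to 256 bytes");
```

In-struct alignment only describes offsets; keeping the cache itself
cacheline-aligned is what keeps that offset on a real cacheline boundary
once objects are packed into slab pages. With 4KB pages, 256-byte objects
fit 32 to a 2-page slab, which matches the slabinfo line above.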
Aggregate VMA memory consumption per 1000 VMAs grows from 50 to 64 pages,
an increase of about 5.5MB per 100000 VMAs. This overhead will be
addressed in a separate patchset by replacing the rw_semaphore in
vma_lock's implementation with a different type of lock.
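The page counts can be rederived from the <objperslab> and
<pagesperslab> columns of the slabinfo output above. A minimal sketch,
assuming every slab is filled before a new one is allocated; the helper
names are hypothetical and not kernel functions.

```c
#include <assert.h>

/* Pages needed for nr_objs objects, given objects-per-slab and
 * pages-per-slab from /proc/slabinfo. Partial slabs count as whole. */
static unsigned long slab_pages_for(unsigned long nr_objs,
                                    unsigned long objs_per_slab,
                                    unsigned long pages_per_slab)
{
    assert(objs_per_slab > 0);
    /* Round up: a partially used slab still occupies all its pages. */
    unsigned long slabs = (nr_objs + objs_per_slab - 1) / objs_per_slab;
    return slabs * pages_per_slab;
}

/* Before: vma_lock (102 objs per 1-page slab) plus
 * vm_area_struct (51 objs per 2-page slab). */
static unsigned long pages_before(unsigned long nr_vmas)
{
    return slab_pages_for(nr_vmas, 102, 1) + slab_pages_for(nr_vmas, 51, 2);
}

/* After: a single 256-byte vm_area_struct (32 objs per 2-page slab). */
static unsigned long pages_after(unsigned long nr_vmas)
{
    return slab_pages_for(nr_vmas, 32, 2);
}
```

For 1000 VMAs this gives 10 + 40 = 50 pages before and 64 pages after.
The 5.5MB figure corresponds to that 14-page difference scaled to
100000 VMAs with 4KB pages (1400 x 4KB); computing 100000 directly with
per-slab rounding gives a 1347-page difference, which is still about
5.5MB.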
Moving vm_lock into vm_area_struct lets us simplify the vm_area_free()
path, which in turn allows us to use SLAB_TYPESAFE_BY_RCU for the
vm_area_struct cache. This should facilitate vm_area_struct reuse and
minimize the number of call_rcu() calls.
Suren Baghdasaryan (5):
mm: introduce vma_start_read_locked{_nested} helpers
mm: move per-vma lock into vm_area_struct
mm: mark vma as detached until it's added into vma tree
mm: make vma cache SLAB_TYPESAFE_BY_RCU
docs/mm: document latest changes to vm_lock
Documentation/mm/process_addrs.rst | 10 +++--
include/linux/mm.h | 54 +++++++++++++++++-----
include/linux/mm_types.h | 16 ++++---
include/linux/slab.h | 6 ---
kernel/fork.c | 72 +++++++-----------------------
mm/memory.c | 2 +-
mm/mmap.c | 2 +
mm/nommu.c | 2 +
mm/userfaultfd.c | 14 +++---
mm/vma.c | 3 ++
tools/testing/vma/vma_internal.h | 3 +-
11 files changed, 92 insertions(+), 92 deletions(-)
base-commit: 931086f2a88086319afb57cd3925607e8cda0a9f
--
2.47.0.277.g8800431eea-goog