Message-ID: <cover.1769198904.git.lorenzo.stoakes@oracle.com>
Date: Fri, 23 Jan 2026 20:12:10 +0000
From: Lorenzo Stoakes <lorenzo.stoakes@...cle.com>
To: Andrew Morton <akpm@...ux-foundation.org>
Cc: David Hildenbrand <david@...nel.org>,
"Liam R . Howlett" <Liam.Howlett@...cle.com>,
Vlastimil Babka <vbabka@...e.cz>, Mike Rapoport <rppt@...nel.org>,
Suren Baghdasaryan <surenb@...gle.com>, Michal Hocko <mhocko@...e.com>,
Shakeel Butt <shakeel.butt@...ux.dev>, Jann Horn <jannh@...gle.com>,
linux-mm@...ck.org, linux-kernel@...r.kernel.org,
linux-rt-devel@...ts.linux.dev, Peter Zijlstra <peterz@...radead.org>,
Ingo Molnar <mingo@...hat.com>, Will Deacon <will@...nel.org>,
Boqun Feng <boqun.feng@...il.com>, Waiman Long <longman@...hat.com>,
Sebastian Andrzej Siewior <bigeasy@...utronix.de>,
Clark Williams <clrkwllms@...nel.org>,
Steven Rostedt <rostedt@...dmis.org>
Subject: [PATCH v4 00/10] mm: add and use vma_assert_stabilised() helper
This series first introduces a number of refactorings, intended to
significantly improve the readability and abstraction of the code.
Sometimes we wish to assert that a VMA is stable, that is - the VMA cannot
be changed underneath us. This will be the case if EITHER the VMA lock or
the mmap lock is held.
We already open-code this in two places - anon_vma_name() in mm/madvise.c
and vma_flag_set_atomic() in include/linux/mm.h.
This series adds vma_assert_stabilised(), which abstracts this and can be
used at these callsites instead.
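
As a rough illustration of the "EITHER lock" rule, here is a minimal
userspace sketch. Everything in it is hypothetical: struct model_mm, struct
model_vma and the model_*() functions are stand-ins invented for this
example and are not the kernel's types or helpers. In particular the boolean
predicate exists only in the model; the series itself provides an assert.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace model: these are NOT the kernel's structures. */
struct model_mm {
	bool mmap_lock_held;	/* mmap lock held, read or write */
};

struct model_vma {
	struct model_mm *mm;
	bool vma_lock_held;	/* VMA lock held, read or write */
};

/* A VMA is stable if EITHER the VMA lock or the mmap lock is held. */
static bool model_vma_is_stabilised(const struct model_vma *vma)
{
	return vma->vma_lock_held || vma->mm->mmap_lock_held;
}

/* Assert-style wrapper, standing in for a single shared helper. */
static void model_vma_assert_stabilised(const struct model_vma *vma)
{
	assert(model_vma_is_stabilised(vma));
}
```

The point of the sketch is only that callers which previously open-coded the
two-lock check can call one helper instead.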
This implementation uses lockdep where possible - that is VMA read locks -
which correctly track read lock acquisition/release via:

vma_start_read() ->
  rwsem_acquire_read()

vma_start_read_locked() ->
  vma_start_read_locked_nested() ->
    rwsem_acquire_read()

And:

vma_end_read() ->
  vma_refcount_put() ->
    rwsem_release()

We don't track the VMA locks using lockdep for VMA write locks, however
these are predicated upon mmap write locks whose lockdep state we do track,
and additionally vma_assert_stabilised() performs this check if the VMA read
lock is not held, so we get lockdep coverage in this case also.
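
To make the coverage argument concrete, here is a simplified userspace
model. It is a sketch under stated assumptions, not the implementation in
this series: struct model_task and the model_*() functions are hypothetical,
the "tracked" flags merely stand in for state lockdep would record, and the
races described in the patch introducing the helper are ignored entirely.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-task lock state; a stand-in for what lockdep records. */
struct model_task {
	bool tracked_vma_read;		/* VMA read lock: tracked */
	bool tracked_mmap_read;		/* mmap read lock: tracked */
	bool tracked_mmap_write;	/* mmap write lock: tracked */
	bool untracked_vma_write;	/* VMA write lock: NOT tracked */
};

/* In this model a VMA write lock may only be taken under the mmap write lock. */
static bool model_vma_start_write(struct model_task *t)
{
	if (!t->tracked_mmap_write)
		return false;
	t->untracked_vma_write = true;
	return true;
}

/* Consults tracked state only: the VMA read lock first, else the mmap lock. */
static bool model_check_stabilised(const struct model_task *t)
{
	if (t->tracked_vma_read)
		return true;
	return t->tracked_mmap_read || t->tracked_mmap_write;
}

/* Assert-style wrapper around the check above. */
static void model_assert_stabilised(const struct model_task *t)
{
	assert(model_check_stabilised(t));
}
```

Note that model_check_stabilised() never reads untracked_vma_write: a VMA
write-locker passes the check because it must already hold the tracked mmap
write lock.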
We also add extensive comments to describe what we're doing.
There's some tricky stuff around mmap locking and stabilisation races that
we have to be careful of that I describe in the patch introducing
vma_assert_stabilised().
This change also lays the foundation for future series to add this assert
in further places where we wish to make it clear that we rely upon a
stabilised VMA.
The motivation for this change was precisely this.
v4:
* Propagated tags (thanks Vlastimil, Suren!)
* Updated reference count documentation in 2/10 as per Vlastimil, Suren.
* Updated 7/10 to update the references in the reference count comment from
__vma_exit_locked() to __vma_end_exclude_readers().
* Renamed are_readers_excluded() to __vma_are_readers_excluded() as per
Vlastimil.
* Several more comment updates as per Vlastimil, Suren in 3/10.
* Updated 3/10 commit message as per Suren.
* Updated __vma_refcount_put() to just return the newcnt as per Suren.
* Renamed __vma_refcount_put() to __vma_refcount_put_return() as per Vlastimil.
* Made __vma_refcount_put_return() __must_check too.
* Comment fixups on 4/10 as per Vlastimil.
* Renamed __vma_enter_exclusive_locked() and __vma_exit_exclusive_locked()
to __vma_start_exclude_readers() and __vma_end_exclude_readers() as per
Vlastimil in 6/10.
* Reworked comment as per Suren in 6/10.
* Avoided WARN_ON_ONCE() function invocation as per Suren in 6/10.
* s/ves->locked/ves->exclusive/ as per Suren in 7/10.
* Removed confusing asserts in 7/10 as per Suren.
* Changed from !ves.detached to ves.exclusive in 7/10 as per Suren.
* Updated comments in 7/10 as per Suren.
* Removed broken assert in __vma_end_exclude_readers() in 7/10 as per
Vlastimil.
* Separated out vma_mark_detached() into static inline portion and unlikely
exclude readers in 7/10 as per Vlastimil.
* Removed mm seq num output parameter from __is_vma_write_locked() as per
Vlastimil in 8/10.
* Converted VM_BUG_ON_VMA() to VM_WARN_ON_ONCE() in 8/10 as per Vlastimil
(though he said it in reply to a future commit :).
* Added helper function __vma_raw_mm_seqnum() to aid the conversion of
__is_vma_write_locked() and updated the commit message accordingly.
* Moved mmap_assert_write_locked() to __vma_raw_mm_seqnum() as it is
required for this access to be valid.
* Replaced VM_BUG_ON_VMA() with VM_WARN_ON_ONCE_VMA() on 9/10 as per
Vlastimil.
* Renamed refs to refcnt in vma_assert_locked() to be consistent.
* Moved comment about reference count possible values above refcnt
assignment so it's not just weirdly at the top of the function.
v3:
* Added 8 patches of refactoring the VMA lock implementation :)
* Dropped the vma_is_*locked() predicates as too difficult to get entirely
right.
* Updated vma_assert_locked() to assert what we sensibly can, use lockdep
if possible and invoke vma_assert_write_locked() to share code as before.
* Took into account extensive feedback received from Vlastimil (thanks! :)
https://lore.kernel.org/all/cover.1769086312.git.lorenzo.stoakes@oracle.com/
v2:
* Added lockdep as much as possible to the mix as per Peter and Sebastian.
* Added comments to make clear what we're doing in each case.
* I realise I made a mistake in saying the previous duplicative VMA stable
asserts were wrong - vma_assert_locked() is not a no-op if
!CONFIG_PER_VMA_LOCK, instead it degrades to asserting that the mmap lock
is held, so this is correct, though means we'd have checked this twice,
only triggering an assert the second time.
* Accounted for is_vma_writer_only() case in vma_is_read_locked().
* Accounted for two hideous issues - we cannot check VMA lock first,
because we may be holding a VMA write lock and be raced by VMA readers of
_other_ VMAs. If we check the mmap lock first and assert, we may hold a
VMA read lock and race other threads which hold the mmap read lock and
fail an assert. We resolve this by a precise mmap ownership check if
lockdep is used, and allowing the check to be approximate if no lockdep.
* Added more comments and updated commit logs.
* Dropped Suren's Suggested-by as significant changes in this set (this was for
the vma_is_read_locked() as a concept).
https://lore.kernel.org/all/cover.1768855783.git.lorenzo.stoakes@oracle.com/
v1:
https://lore.kernel.org/all/cover.1768569863.git.lorenzo.stoakes@oracle.com/
Lorenzo Stoakes (10):
mm/vma: rename VMA_LOCK_OFFSET to VM_REFCNT_EXCLUDE_READERS_FLAG
mm/vma: document possible vma->vm_refcnt values and reference comment
mm/vma: rename is_vma_write_only(), separate out shared refcount put
mm/vma: add+use vma lockdep acquire/release defines
mm/vma: de-duplicate __vma_enter_locked() error path
mm/vma: clean up __vma_enter/exit_locked()
mm/vma: introduce helper struct + thread through exclusive lock fns
mm/vma: improve and document __is_vma_write_locked()
mm/vma: update vma_assert_locked() to use lockdep
mm/vma: add and use vma_assert_stabilised()
include/linux/mm.h | 5 +-
include/linux/mm_types.h | 57 +++++++-
include/linux/mmap_lock.h | 264 +++++++++++++++++++++++++++++++++-----
mm/madvise.c | 4 +-
mm/mmap_lock.c | 173 ++++++++++++++++---------
5 files changed, 396 insertions(+), 107 deletions(-)
--
2.52.0