Message-ID: <94935cf140e3279c234b39e0d976c4718c547c73.1762422915.git.lorenzo.stoakes@oracle.com>
Date: Thu, 6 Nov 2025 10:46:13 +0000
From: Lorenzo Stoakes <lorenzo.stoakes@...cle.com>
To: Andrew Morton <akpm@...ux-foundation.org>
Cc: Jonathan Corbet <corbet@....net>, David Hildenbrand <david@...hat.com>,
"Liam R . Howlett" <Liam.Howlett@...cle.com>,
Vlastimil Babka <vbabka@...e.cz>, Mike Rapoport <rppt@...nel.org>,
Suren Baghdasaryan <surenb@...gle.com>, Michal Hocko <mhocko@...e.com>,
Steven Rostedt <rostedt@...dmis.org>,
Masami Hiramatsu <mhiramat@...nel.org>,
Mathieu Desnoyers <mathieu.desnoyers@...icios.com>,
Jann Horn <jannh@...gle.com>, Pedro Falcato <pfalcato@...e.de>,
linux-kernel@...r.kernel.org, linux-fsdevel@...r.kernel.org,
linux-doc@...r.kernel.org, linux-mm@...ck.org,
linux-trace-kernel@...r.kernel.org, linux-kselftest@...r.kernel.org,
Andrei Vagin <avagin@...il.com>
Subject: [PATCH v2 2/5] mm: add atomic VMA flags, use VM_MAYBE_GUARD as such

Add the ability to atomically set VMA flags with only the mmap read/VMA
read lock held.

As this could be hugely problematic for VMA flags in general, given that all
other accesses are non-atomic and serialised by the mmap/VMA locks, we
implement this with a strict allow-list: only designated flags are permitted
to be set this way.

We make VM_MAYBE_GUARD one of these flags, and then set it under the mmap
read lock upon guard region installation.
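
For illustration, a minimal sketch of the calling pattern this enables
(example_mark_guard() is hypothetical and not part of this patch; the real
call site added to madvise_guard_install() below already runs with the mmap
read lock held):

  #include <linux/mm.h>
  #include <linux/mmap_lock.h>

  /* Hypothetical helper, for illustration only. */
  static void example_mark_guard(struct mm_struct *mm, unsigned long addr)
  {
          struct vm_area_struct *vma;

          mmap_read_lock(mm);
          vma = find_vma(mm, addr);
          /*
           * Permitted with only the read lock held: VM_MAYBE_GUARD is in
           * VM_ATOMIC_SET_ALLOWED, so the update is a single set_bit().
           */
          if (vma && vma->vm_start <= addr)
                  vma_flag_set_atomic(vma, VM_MAYBE_GUARD_BIT);
          mmap_read_unlock(mm);
  }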

The places where this flag is currently used and where it matters are:

* VMA merge - performed under the mmap/VMA write lock, therefore excluding
  racing writes.

* /proc/$pid/smaps - can race the write, however this isn't meaningful, as
  the flag write is performed at the point of the guard region being
  established, and thus an smaps reader can't reasonably expect to avoid
  races. Due to atomicity, a reader will observe either the flag being set
  or not, so consistency is maintained (see the reader sketch below).

In all other cases the flag being set is irrelevant and atomicity
guarantees other flags will be read correctly.
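
To make the reader side concrete, here is a sketch (example_vma_maybe_guarded()
is hypothetical and not part of this patch): a reader such as the smaps code
performs a plain load of vma->vm_flags under the mmap/VMA read lock; because
the writer uses set_bit(), that load observes the flag either set or clear,
with every other flag bit intact:

  #include <linux/mm.h>

  /*
   * Hypothetical reader, for illustration only: races with
   * vma_flag_set_atomic(), but can only observe VM_MAYBE_GUARD as set or
   * clear; the other flag bits in the word are never corrupted.
   */
  static bool example_vma_maybe_guarded(struct vm_area_struct *vma)
  {
          return !!(vma->vm_flags & VM_MAYBE_GUARD);
  }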

We additionally update madvise_guard_install() to ensure that
anon_vma_prepare() is called for anonymous VMAs, maintaining consistency
with the assumption that any anonymous VMA with page tables established will
have an anon_vma set, and any with an anon_vma unset will not have page
tables established.

Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@...cle.com>
---
include/linux/mm.h | 23 +++++++++++++++++++++++
mm/madvise.c | 22 ++++++++++++++--------
2 files changed, 37 insertions(+), 8 deletions(-)
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 2a5516bff75a..2ea65c646212 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -518,6 +518,9 @@ extern unsigned int kobjsize(const void *objp);
/* This mask represents all the VMA flag bits used by mlock */
#define VM_LOCKED_MASK (VM_LOCKED | VM_LOCKONFAULT)

+/* These flags can be updated atomically via VMA/mmap read lock. */
+#define VM_ATOMIC_SET_ALLOWED VM_MAYBE_GUARD
+
/* Arch-specific flags to clear when updating VM flags on protection change */
#ifndef VM_ARCH_CLEAR
# define VM_ARCH_CLEAR VM_NONE
@@ -860,6 +863,26 @@ static inline void vm_flags_mod(struct vm_area_struct *vma,
__vm_flags_mod(vma, set, clear);
}

+/*
+ * Set VMA flag atomically. Requires only VMA/mmap read lock. Only specific
+ * valid flags are allowed to do this.
+ */
+static inline void vma_flag_set_atomic(struct vm_area_struct *vma,
+ int bit)
+{
+ const vm_flags_t mask = BIT(bit);
+
+ /* mmap read lock/VMA read lock must be held. */
+ if (!rwsem_is_locked(&vma->vm_mm->mmap_lock))
+ vma_assert_locked(vma);
+
+ /* Only specific flags are permitted */
+ if (WARN_ON_ONCE(!(mask & VM_ATOMIC_SET_ALLOWED)))
+ return;
+
+ set_bit(bit, &vma->__vm_flags);
+}
+
static inline void vma_set_anonymous(struct vm_area_struct *vma)
{
vma->vm_ops = NULL;
diff --git a/mm/madvise.c b/mm/madvise.c
index 67bdfcb315b3..de918b107cfc 100644
--- a/mm/madvise.c
+++ b/mm/madvise.c
@@ -1139,15 +1139,21 @@ static long madvise_guard_install(struct madvise_behavior *madv_behavior)
return -EINVAL;

/*
- * If we install guard markers, then the range is no longer
- * empty from a page table perspective and therefore it's
- * appropriate to have an anon_vma.
- *
- * This ensures that on fork, we copy page tables correctly.
+ * Set atomically under read lock. All pertinent readers will need to
+ * acquire an mmap/VMA write lock to read it. All remaining readers may
+ * or may not see the flag set, but we don't care.
+ */
+ vma_flag_set_atomic(vma, VM_MAYBE_GUARD_BIT);
+
+ /*
+ * If anonymous and we are establishing page tables the VMA ought to
+ * have an anon_vma associated with it.
*/
- err = anon_vma_prepare(vma);
- if (err)
- return err;
+ if (vma_is_anonymous(vma)) {
+ err = anon_vma_prepare(vma);
+ if (err)
+ return err;
+ }

/*
* Optimistically try to install the guard marker pages first. If any
--
2.51.0