Message-ID: <9c9e9fb6b767556594b2cef023db01d45d8f8463.1762422915.git.lorenzo.stoakes@oracle.com>
Date: Thu, 6 Nov 2025 10:46:14 +0000
From: Lorenzo Stoakes <lorenzo.stoakes@...cle.com>
To: Andrew Morton <akpm@...ux-foundation.org>
Cc: Jonathan Corbet <corbet@....net>, David Hildenbrand <david@...hat.com>,
"Liam R . Howlett" <Liam.Howlett@...cle.com>,
Vlastimil Babka <vbabka@...e.cz>, Mike Rapoport <rppt@...nel.org>,
Suren Baghdasaryan <surenb@...gle.com>, Michal Hocko <mhocko@...e.com>,
Steven Rostedt <rostedt@...dmis.org>,
Masami Hiramatsu <mhiramat@...nel.org>,
Mathieu Desnoyers <mathieu.desnoyers@...icios.com>,
Jann Horn <jannh@...gle.com>, Pedro Falcato <pfalcato@...e.de>,
linux-kernel@...r.kernel.org, linux-fsdevel@...r.kernel.org,
linux-doc@...r.kernel.org, linux-mm@...ck.org,
linux-trace-kernel@...r.kernel.org, linux-kselftest@...r.kernel.org,
Andrei Vagin <avagin@...il.com>
Subject: [PATCH v2 3/5] mm: implement sticky, copy on fork VMA flags
It's useful to be able to force a VMA to be copied on fork outside of the
parameters specified by vma_needs_copy(), which otherwise only copies page
tables if:
* The destination VMA has VM_UFFD_WP set
* The mapping is a PFN or mixed map
* The mapping is anonymous and has been faulted in (i.e. vma->anon_vma is non-NULL)
We introduce VM_COPY_ON_FORK to clearly identify which flags require this
behaviour; currently the only such flag is VM_MAYBE_GUARD.
Setting such a flag implies that the page tables mapping the VMA are such
that simply re-faulting the VMA would not re-establish them in identical
form.
Any VMA flags which require this behaviour are inherently 'sticky', that
is, should we merge two VMAs together, this implies that the newly merged
VMA maps a range that requires page table copying on fork.
In order to implement this we must introduce the concept of a 'sticky' VMA
flag, adjust the VMA merge logic accordingly, and ensure that a merge can
still succeed should one VMA have the flag set and another not.
Note that we update the VMA expand logic to handle new VMA merging, as this
function is ultimately called on all paths which merge new VMAs.
This patch implements this, establishing VM_STICKY to contain all such
flags and VM_IGNORE_MERGE for those flags which should be ignored when
comparing adjacent VMAs' flags for the purposes of merging.
As part of this change we place VM_SOFTDIRTY in VM_IGNORE_MERGE, as it
already had this behaviour, alongside VM_STICKY, since sticky flags must by
implication not preclude a merge.
As a result of this change, installing guard regions no longer impacts a
VMA's ability to merge: VMAs with VM_MAYBE_GUARD set can be freely merged
with VMAs which do not have it set.
We also update the VMA userland tests to account for the changes.
Note that VM_MAYBE_GUARD being set atomically remains correct as
vma_needs_copy() is invoked with the mmap and VMA write locks held,
excluding any race with madvise_guard_install().
Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@...cle.com>
---
include/linux/mm.h | 32 ++++++++++++++++++++++++++++++++
mm/memory.c | 3 +--
mm/vma.c | 22 ++++++++++++----------
tools/testing/vma/vma_internal.h | 32 ++++++++++++++++++++++++++++++++
4 files changed, 77 insertions(+), 12 deletions(-)
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 2ea65c646212..4d80eaf4ef3b 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -527,6 +527,38 @@ extern unsigned int kobjsize(const void *objp);
#endif
#define VM_FLAGS_CLEAR (ARCH_VM_PKEY_FLAGS | VM_ARCH_CLEAR)
+/* Flags which should result in page tables being copied on fork. */
+#define VM_COPY_ON_FORK VM_MAYBE_GUARD
+
+/*
+ * Flags which should be 'sticky' on merge - that is, flags which, when one VMA
+ * possesses them and the other does not, should nonetheless be applied to the
+ * merged VMA:
+ *
+ * VM_COPY_ON_FORK - These flags indicate that a VMA maps a range that contains
+ * metadata which should be unconditionally propagated upon
+ * fork. When merging two VMAs, we encapsulate this range in
+ * the merged VMA, so the flag should be 'sticky' as a result.
+ */
+#define VM_STICKY VM_COPY_ON_FORK
+
+/*
+ * VMA flags we ignore for the purposes of merge, i.e. one VMA possessing one
+ * of these flags and the other not does not preclude a merge.
+ *
+ * VM_SOFTDIRTY - Should not prevent VMA merging if the flags match in all but
+ * the dirty bit -- the caller should mark the merged VMA as dirty.
+ * If the dirty bit were not excluded from the comparison, we
+ * would increase pressure on the memory system, forcing the
+ * kernel to generate new VMAs when old ones could be extended.
+ *
+ * VM_STICKY - If one VMA has 'sticky' flags set -- that is, flags which
+ * should propagate to the merged VMA -- and the other does
+ * not, the merge should still proceed, with the merge logic
+ * applying the sticky flags to the final VMA.
+ */
+#define VM_IGNORE_MERGE (VM_SOFTDIRTY | VM_STICKY)
+
/*
* mapping from the currently active vm_flags protection bits (the
* low four bits) to a page protection mask..
diff --git a/mm/memory.c b/mm/memory.c
index 334732ab6733..7582a88f5332 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -1480,8 +1480,7 @@ vma_needs_copy(struct vm_area_struct *dst_vma, struct vm_area_struct *src_vma)
if (src_vma->anon_vma)
return true;
- /* Guard regions have momdified page tables that require copying. */
- if (src_vma->vm_flags & VM_MAYBE_GUARD)
+ if (src_vma->vm_flags & VM_COPY_ON_FORK)
return true;
/*
diff --git a/mm/vma.c b/mm/vma.c
index 0c5e391fe2e2..6cb082bc5e29 100644
--- a/mm/vma.c
+++ b/mm/vma.c
@@ -89,15 +89,7 @@ static inline bool is_mergeable_vma(struct vma_merge_struct *vmg, bool merge_nex
if (!mpol_equal(vmg->policy, vma_policy(vma)))
return false;
- /*
- * VM_SOFTDIRTY should not prevent from VMA merging, if we
- * match the flags but dirty bit -- the caller should mark
- * merged VMA as dirty. If dirty bit won't be excluded from
- * comparison, we increase pressure on the memory system forcing
- * the kernel to generate new VMAs when old one could be
- * extended instead.
- */
- if ((vma->vm_flags ^ vmg->vm_flags) & ~VM_SOFTDIRTY)
+ if ((vma->vm_flags ^ vmg->vm_flags) & ~VM_IGNORE_MERGE)
return false;
if (vma->vm_file != vmg->file)
return false;
@@ -808,6 +800,7 @@ static bool can_merge_remove_vma(struct vm_area_struct *vma)
static __must_check struct vm_area_struct *vma_merge_existing_range(
struct vma_merge_struct *vmg)
{
+ vm_flags_t sticky_flags = vmg->vm_flags & VM_STICKY;
struct vm_area_struct *middle = vmg->middle;
struct vm_area_struct *prev = vmg->prev;
struct vm_area_struct *next;
@@ -900,11 +893,13 @@ static __must_check struct vm_area_struct *vma_merge_existing_range(
if (merge_right) {
vma_start_write(next);
vmg->target = next;
+ sticky_flags |= (next->vm_flags & VM_STICKY);
}
if (merge_left) {
vma_start_write(prev);
vmg->target = prev;
+ sticky_flags |= (prev->vm_flags & VM_STICKY);
}
if (merge_both) {
@@ -974,6 +969,7 @@ static __must_check struct vm_area_struct *vma_merge_existing_range(
if (err || commit_merge(vmg))
goto abort;
+ vm_flags_set(vmg->target, sticky_flags);
khugepaged_enter_vma(vmg->target, vmg->vm_flags);
vmg->state = VMA_MERGE_SUCCESS;
return vmg->target;
@@ -1124,6 +1120,10 @@ int vma_expand(struct vma_merge_struct *vmg)
bool remove_next = false;
struct vm_area_struct *target = vmg->target;
struct vm_area_struct *next = vmg->next;
+ vm_flags_t sticky_flags;
+
+ sticky_flags = vmg->vm_flags & VM_STICKY;
+ sticky_flags |= target->vm_flags & VM_STICKY;
VM_WARN_ON_VMG(!target, vmg);
@@ -1133,6 +1133,7 @@ int vma_expand(struct vma_merge_struct *vmg)
if (next && (target != next) && (vmg->end == next->vm_end)) {
int ret;
+ sticky_flags |= next->vm_flags & VM_STICKY;
remove_next = true;
/* This should already have been checked by this point. */
VM_WARN_ON_VMG(!can_merge_remove_vma(next), vmg);
@@ -1159,6 +1160,7 @@ int vma_expand(struct vma_merge_struct *vmg)
if (commit_merge(vmg))
goto nomem;
+ vm_flags_set(target, sticky_flags);
return 0;
nomem:
@@ -1902,7 +1904,7 @@ static int anon_vma_compatible(struct vm_area_struct *a, struct vm_area_struct *
return a->vm_end == b->vm_start &&
mpol_equal(vma_policy(a), vma_policy(b)) &&
a->vm_file == b->vm_file &&
- !((a->vm_flags ^ b->vm_flags) & ~(VM_ACCESS_FLAGS | VM_SOFTDIRTY)) &&
+ !((a->vm_flags ^ b->vm_flags) & ~(VM_ACCESS_FLAGS | VM_IGNORE_MERGE)) &&
b->vm_pgoff == a->vm_pgoff + ((b->vm_start - a->vm_start) >> PAGE_SHIFT);
}
diff --git a/tools/testing/vma/vma_internal.h b/tools/testing/vma/vma_internal.h
index ddf58a5e1add..984307a64ee9 100644
--- a/tools/testing/vma/vma_internal.h
+++ b/tools/testing/vma/vma_internal.h
@@ -119,6 +119,38 @@ extern unsigned long dac_mmap_min_addr;
#define VM_SEALED VM_NONE
#endif
+/* Flags which should result in page tables being copied on fork. */
+#define VM_COPY_ON_FORK VM_MAYBE_GUARD
+
+/*
+ * Flags which should be 'sticky' on merge - that is, flags which, when one VMA
+ * possesses them and the other does not, should nonetheless be applied to the
+ * merged VMA:
+ *
+ * VM_COPY_ON_FORK - These flags indicate that a VMA maps a range that contains
+ * metadata which should be unconditionally propagated upon
+ * fork. When merging two VMAs, we encapsulate this range in
+ * the merged VMA, so the flag should be 'sticky' as a result.
+ */
+#define VM_STICKY VM_COPY_ON_FORK
+
+/*
+ * VMA flags we ignore for the purposes of merge, i.e. one VMA possessing one
+ * of these flags and the other not does not preclude a merge.
+ *
+ * VM_SOFTDIRTY - Should not prevent VMA merging if the flags match in all but
+ * the dirty bit -- the caller should mark the merged VMA as dirty.
+ * If the dirty bit were not excluded from the comparison, we
+ * would increase pressure on the memory system, forcing the
+ * kernel to generate new VMAs when old ones could be extended.
+ *
+ * VM_STICKY - If one VMA has 'sticky' flags set -- that is, flags which
+ * should propagate to the merged VMA -- and the other does
+ * not, the merge should still proceed, with the merge logic
+ * applying the sticky flags to the final VMA.
+ */
+#define VM_IGNORE_MERGE (VM_SOFTDIRTY | VM_STICKY)
+
#define FIRST_USER_ADDRESS 0UL
#define USER_PGTABLES_CEILING 0UL
--
2.51.0