Message-ID: <3d423848-2b55-4797-bdab-a9b42a373a45@suse.cz>
Date: Thu, 6 Nov 2025 14:46:38 +0100
From: Vlastimil Babka <vbabka@...e.cz>
To: Lorenzo Stoakes <lorenzo.stoakes@...cle.com>,
Andrew Morton <akpm@...ux-foundation.org>
Cc: Jonathan Corbet <corbet@....net>, David Hildenbrand <david@...hat.com>,
"Liam R . Howlett" <Liam.Howlett@...cle.com>, Mike Rapoport
<rppt@...nel.org>, Suren Baghdasaryan <surenb@...gle.com>,
Michal Hocko <mhocko@...e.com>, Steven Rostedt <rostedt@...dmis.org>,
Masami Hiramatsu <mhiramat@...nel.org>,
Mathieu Desnoyers <mathieu.desnoyers@...icios.com>,
Jann Horn <jannh@...gle.com>, Pedro Falcato <pfalcato@...e.de>,
linux-kernel@...r.kernel.org, linux-fsdevel@...r.kernel.org,
linux-doc@...r.kernel.org, linux-mm@...ck.org,
linux-trace-kernel@...r.kernel.org, linux-kselftest@...r.kernel.org,
Andrei Vagin <avagin@...il.com>
Subject: Re: [PATCH v2 3/5] mm: implement sticky, copy on fork VMA flags
On 11/6/25 11:46, Lorenzo Stoakes wrote:
> It's useful to be able to force a VMA to be copied on fork outside of the
> parameters specified by vma_needs_copy(), which otherwise only copies page
> tables if:
>
> * The destination VMA has VM_UFFD_WP set
> * The mapping is a PFN or mixed map
> * The mapping is anonymous and forked in (i.e. vma->anon_vma is non-NULL)
>
> Setting this flag implies that the page tables mapping the VMA are such
> that simply re-faulting the VMA will not re-establish them in identical
> form.
>
> We introduce VM_COPY_ON_FORK to clearly identify which flags require this
> behaviour, which currently is only VM_MAYBE_GUARD.
>
> Any VMA flags which require this behaviour are inherently 'sticky', that
> is, should we merge two VMAs together, this implies that the newly merged
> VMA maps a range that requires page table copying on fork.
>
> In order to implement this we must both introduce the concept of a 'sticky'
> VMA flag and adjust the VMA merge logic accordingly, and also have VMA
> merge still succeed should one VMA have the flag set and another not.
>
> Note that we update the VMA expand logic to handle new VMA merging, as this
> function is the one ultimately called by all instances of merging of new
> VMAs.
>
> This patch implements this, establishing VM_STICKY to contain all such
> flags and VM_IGNORE_MERGE for those flags which should be ignored when
> comparing adjacent VMAs' flags for the purposes of merging.
>
> As part of this change we place VM_SOFTDIRTY in VM_IGNORE_MERGE, as it
> already had this behaviour, alongside VM_STICKY, as sticky flags by
> implication must not disallow merging.
>
> As a result of this change, VMAs with guard ranges will no longer have
> their merge behaviour impacted by them, and can be freely merged with
> other VMAs which do not have VM_MAYBE_GUARD set.
>
> We also update the VMA userland tests to account for the changes.
>
> Note that VM_MAYBE_GUARD being set atomically remains correct as
> vma_needs_copy() is invoked with the mmap and VMA write locks held,
> excluding any race with madvise_guard_install().
>
> Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@...cle.com>
> ---
> include/linux/mm.h | 32 ++++++++++++++++++++++++++++++++
> mm/memory.c | 3 +--
> mm/vma.c | 22 ++++++++++++----------
> tools/testing/vma/vma_internal.h | 32 ++++++++++++++++++++++++++++++++
> 4 files changed, 77 insertions(+), 12 deletions(-)
>
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index 2ea65c646212..4d80eaf4ef3b 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -527,6 +527,38 @@ extern unsigned int kobjsize(const void *objp);
> #endif
> #define VM_FLAGS_CLEAR (ARCH_VM_PKEY_FLAGS | VM_ARCH_CLEAR)
>
> +/* Flags which should result in page tables being copied on fork. */
> +#define VM_COPY_ON_FORK VM_MAYBE_GUARD
> +
> +/*
> + * Flags which should be 'sticky' on merge - that is, flags which, when one VMA
> + * possesses them but the other does not, should nonetheless be applied to the
> + * merged VMA:
> + *
> + * VM_COPY_ON_FORK - These flags indicate that a VMA maps a range that contains
> + * metadata which should be unconditionally propagated upon
> + * fork. When merging two VMAs, we encapsulate this range in
> + * the merged VMA, so the flag should be 'sticky' as a result.
> + */
> +#define VM_STICKY VM_COPY_ON_FORK
TBH I don't see why there should always be an implication that copying on
fork implies stickiness in merging. Yeah, VM_MAYBE_GUARD is both, but in
general, is there any underlying property that makes this a rule?
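If there isn't one, it might be cleaner to keep the two masks independent
and only have them coincide for VM_MAYBE_GUARD today, e.g. (untested
sketch, just to illustrate the point):

  /* Flags which should result in page tables being copied on fork. */
  #define VM_COPY_ON_FORK	VM_MAYBE_GUARD

  /* Flags which should be 'sticky' on merge. */
  #define VM_STICKY		VM_MAYBE_GUARD

That way a future copy-on-fork flag doesn't silently become sticky (or the
other way around) unless somebody decides it should be.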
> +/*
> + * VMA flags we ignore for the purposes of merge, i.e. one VMA possessing one
> + * of these flags and the other not does not preclude a merge.
> + *
> + * VM_SOFTDIRTY - Should not prevent from VMA merging, if we match the flags but
> + * dirty bit -- the caller should mark merged VMA as dirty. If
> + * dirty bit won't be excluded from comparison, we increase
> + * pressure on the memory system forcing the kernel to generate
> + * new VMAs when old one could be extended instead.
So I wonder if VM_SOFTDIRTY should actually also be sticky and not just in
VM_IGNORE_MERGE. The way I understand the flag suggests it should.
Right now AFAICS it's rather undefined whether the result of a vma merge has
the flag - it depends on which of the two VMAs stays and which is removed by
the merge. "the caller should mark merged VMA as dirty" in the comment you're
moving here doesn't seem to really happen, or I'm missing it.
__mmap_complete() and do_brk_flags() do it, so any new areas are marked, but
on a pure merge of two vmas due to e.g. mprotect() this is really
nondeterministic? AFAICT the sticky flag behavior would work perfectly for
VM_SOFTDIRTY.
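I.e. on top of this patch, something like (untested sketch):

  /* VM_SOFTDIRTY should survive merging with a non-softdirty VMA. */
  #define VM_STICKY	(VM_COPY_ON_FORK | VM_SOFTDIRTY)

which - modulo how callers populate vmg->vm_flags - would make the merge
paths below set VM_SOFTDIRTY deterministically on the resulting VMA via
vm_flags_set(), instead of it depending on which VMA happens to be the
merge target. VM_IGNORE_MERGE would then already contain it via VM_STICKY.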
> + *
> + * VM_STICKY - If one VMA has flags which must be 'sticky', that is ones
> + * which should propagate to all VMAs, but the other does not,
> + * the merge should still proceed with the merge logic applying
> + * sticky flags to the final VMA.
> + */
> +#define VM_IGNORE_MERGE (VM_SOFTDIRTY | VM_STICKY)
> +
> /*
> * mapping from the currently active vm_flags protection bits (the
> * low four bits) to a page protection mask..
> diff --git a/mm/memory.c b/mm/memory.c
> index 334732ab6733..7582a88f5332 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -1480,8 +1480,7 @@ vma_needs_copy(struct vm_area_struct *dst_vma, struct vm_area_struct *src_vma)
> if (src_vma->anon_vma)
> return true;
>
> - /* Guard regions have modified page tables that require copying. */
> - if (src_vma->vm_flags & VM_MAYBE_GUARD)
> + if (src_vma->vm_flags & VM_COPY_ON_FORK)
> return true;
>
> /*
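FWIW, putting this together with the existing checks (as listed in the
commit message), the function after this hunk roughly becomes the following
- paraphrased from the conditions above, not the literal tree contents:

  static bool
  vma_needs_copy(struct vm_area_struct *dst_vma, struct vm_area_struct *src_vma)
  {
  	/* uffd-wp state lives in the page tables, so it must be copied. */
  	if (dst_vma->vm_flags & VM_UFFD_WP)
  		return true;

  	/* PFN and mixed maps cannot be rebuilt by re-faulting. */
  	if (src_vma->vm_flags & (VM_PFNMAP | VM_MIXEDMAP))
  		return true;

  	/* Anonymous VMAs with an anon_vma need their page tables copied. */
  	if (src_vma->anon_vma)
  		return true;

  	/* New: any VM_COPY_ON_FORK flag forces copying as well. */
  	if (src_vma->vm_flags & VM_COPY_ON_FORK)
  		return true;

  	/* Otherwise faulting will re-establish the page tables lazily. */
  	return false;
  }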
> diff --git a/mm/vma.c b/mm/vma.c
> index 0c5e391fe2e2..6cb082bc5e29 100644
> --- a/mm/vma.c
> +++ b/mm/vma.c
> @@ -89,15 +89,7 @@ static inline bool is_mergeable_vma(struct vma_merge_struct *vmg, bool merge_nex
>
> if (!mpol_equal(vmg->policy, vma_policy(vma)))
> return false;
> - /*
> - * VM_SOFTDIRTY should not prevent from VMA merging, if we
> - * match the flags but dirty bit -- the caller should mark
> - * merged VMA as dirty. If dirty bit won't be excluded from
> - * comparison, we increase pressure on the memory system forcing
> - * the kernel to generate new VMAs when old one could be
> - * extended instead.
> - */
> - if ((vma->vm_flags ^ vmg->vm_flags) & ~VM_SOFTDIRTY)
> + if ((vma->vm_flags ^ vmg->vm_flags) & ~VM_IGNORE_MERGE)
> return false;
> if (vma->vm_file != vmg->file)
> return false;
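To make the new semantics concrete with the flags from this series: say
prev has VM_READ|VM_WRITE|VM_MAYBE_GUARD and the range being merged in has
just VM_READ|VM_WRITE (hypothetical values for illustration). Then
conceptually:

  vm_flags_t diff = vma->vm_flags ^ vmg->vm_flags;	/* == VM_MAYBE_GUARD */

  /*
   * VM_MAYBE_GUARD is part of VM_STICKY and therefore of VM_IGNORE_MERGE,
   * so nothing outside the ignore mask differs and the merge may proceed.
   */
  if (diff & ~VM_IGNORE_MERGE)				/* == 0 */
  	return false;

The sticky handling further down then ORs VM_MAYBE_GUARD back into the
merge target, so the guard metadata isn't lost by the merge.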
> @@ -808,6 +800,7 @@ static bool can_merge_remove_vma(struct vm_area_struct *vma)
> static __must_check struct vm_area_struct *vma_merge_existing_range(
> struct vma_merge_struct *vmg)
> {
> + vm_flags_t sticky_flags = vmg->vm_flags & VM_STICKY;
> struct vm_area_struct *middle = vmg->middle;
> struct vm_area_struct *prev = vmg->prev;
> struct vm_area_struct *next;
> @@ -900,11 +893,13 @@ static __must_check struct vm_area_struct *vma_merge_existing_range(
> if (merge_right) {
> vma_start_write(next);
> vmg->target = next;
> + sticky_flags |= (next->vm_flags & VM_STICKY);
> }
>
> if (merge_left) {
> vma_start_write(prev);
> vmg->target = prev;
> + sticky_flags |= (prev->vm_flags & VM_STICKY);
> }
>
> if (merge_both) {
> @@ -974,6 +969,7 @@ static __must_check struct vm_area_struct *vma_merge_existing_range(
> if (err || commit_merge(vmg))
> goto abort;
>
> + vm_flags_set(vmg->target, sticky_flags);
> khugepaged_enter_vma(vmg->target, vmg->vm_flags);
> vmg->state = VMA_MERGE_SUCCESS;
> return vmg->target;
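So the pattern in this function is an OR-accumulation of the sticky bits of
every VMA contributing to the merged range, applied to the target once
commit_merge() has succeeded - roughly:

  vm_flags_t sticky_flags = vmg->vm_flags & VM_STICKY;

  if (merge_right)
  	sticky_flags |= next->vm_flags & VM_STICKY;
  if (merge_left)
  	sticky_flags |= prev->vm_flags & VM_STICKY;

  /* ... after a successful commit_merge(): */
  vm_flags_set(vmg->target, sticky_flags);

Which, per my comment above, is exactly the behavior I'd expect
VM_SOFTDIRTY to want as well.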
> @@ -1124,6 +1120,10 @@ int vma_expand(struct vma_merge_struct *vmg)
> bool remove_next = false;
> struct vm_area_struct *target = vmg->target;
> struct vm_area_struct *next = vmg->next;
> + vm_flags_t sticky_flags;
> +
> + sticky_flags = vmg->vm_flags & VM_STICKY;
> + sticky_flags |= target->vm_flags & VM_STICKY;
>
> VM_WARN_ON_VMG(!target, vmg);
>
> @@ -1133,6 +1133,7 @@ int vma_expand(struct vma_merge_struct *vmg)
> if (next && (target != next) && (vmg->end == next->vm_end)) {
> int ret;
>
> + sticky_flags |= next->vm_flags & VM_STICKY;
> remove_next = true;
> /* This should already have been checked by this point. */
> VM_WARN_ON_VMG(!can_merge_remove_vma(next), vmg);
> @@ -1159,6 +1160,7 @@ int vma_expand(struct vma_merge_struct *vmg)
> if (commit_merge(vmg))
> goto nomem;
>
> + vm_flags_set(target, sticky_flags);
> return 0;
>
> nomem:
> @@ -1902,7 +1904,7 @@ static int anon_vma_compatible(struct vm_area_struct *a, struct vm_area_struct *
> return a->vm_end == b->vm_start &&
> mpol_equal(vma_policy(a), vma_policy(b)) &&
> a->vm_file == b->vm_file &&
> - !((a->vm_flags ^ b->vm_flags) & ~(VM_ACCESS_FLAGS | VM_SOFTDIRTY)) &&
> + !((a->vm_flags ^ b->vm_flags) & ~(VM_ACCESS_FLAGS | VM_IGNORE_MERGE)) &&
> b->vm_pgoff == a->vm_pgoff + ((b->vm_start - a->vm_start) >> PAGE_SHIFT);
> }
>
> diff --git a/tools/testing/vma/vma_internal.h b/tools/testing/vma/vma_internal.h
> index ddf58a5e1add..984307a64ee9 100644
> --- a/tools/testing/vma/vma_internal.h
> +++ b/tools/testing/vma/vma_internal.h
> @@ -119,6 +119,38 @@ extern unsigned long dac_mmap_min_addr;
> #define VM_SEALED VM_NONE
> #endif
>
> +/* Flags which should result in page tables being copied on fork. */
> +#define VM_COPY_ON_FORK VM_MAYBE_GUARD
> +
> +/*
> + * Flags which should be 'sticky' on merge - that is, flags which, when one VMA
> + * possesses them but the other does not, should nonetheless be applied to the
> + * merged VMA:
> + *
> + * VM_COPY_ON_FORK - These flags indicate that a VMA maps a range that contains
> + * metadata which should be unconditionally propagated upon
> + * fork. When merging two VMAs, we encapsulate this range in
> + * the merged VMA, so the flag should be 'sticky' as a result.
> + */
> +#define VM_STICKY VM_COPY_ON_FORK
> +
> +/*
> + * VMA flags we ignore for the purposes of merge, i.e. one VMA possessing one
> + * of these flags and the other not does not preclude a merge.
> + *
> + * VM_SOFTDIRTY - Should not prevent from VMA merging, if we match the flags but
> + * dirty bit -- the caller should mark merged VMA as dirty. If
> + * dirty bit won't be excluded from comparison, we increase
> + * pressure on the memory system forcing the kernel to generate
> + * new VMAs when old one could be extended instead.
> + *
> + * VM_STICKY - If one VMA has flags which must be 'sticky', that is ones
> + * which should propagate to all VMAs, but the other does not,
> + * the merge should still proceed with the merge logic applying
> + * sticky flags to the final VMA.
> + */
> +#define VM_IGNORE_MERGE (VM_SOFTDIRTY | VM_STICKY)
> +
> #define FIRST_USER_ADDRESS 0UL
> #define USER_PGTABLES_CEILING 0UL
>