Message-ID: <CAJuCfpFYQasYVb3ds_-55aqhnjBaX2oH7fJ_wJSPZQybiSrAPg@mail.gmail.com>
Date: Mon, 29 Dec 2025 13:18:04 -0800
From: Suren Baghdasaryan <surenb@...gle.com>
To: "Liam R. Howlett" <Liam.Howlett@...cle.com>, Lorenzo Stoakes <lorenzo.stoakes@...cle.com>, 
	Andrew Morton <akpm@...ux-foundation.org>, Suren Baghdasaryan <surenb@...gle.com>, 
	Vlastimil Babka <vbabka@...e.cz>, Shakeel Butt <shakeel.butt@...ux.dev>, 
	David Hildenbrand <david@...nel.org>, Rik van Riel <riel@...riel.com>, Harry Yoo <harry.yoo@...cle.com>, 
	Jann Horn <jannh@...gle.com>, Mike Rapoport <rppt@...nel.org>, Michal Hocko <mhocko@...e.com>, 
	Pedro Falcato <pfalcato@...e.de>, Chris Li <chriscli@...gle.com>, 
	Barry Song <v-songbaohua@...o.com>, linux-mm@...ck.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH 1/8] mm/rmap: improve anon_vma_clone(), unlink_anon_vmas()
 comments, add asserts

On Fri, Dec 19, 2025 at 10:22 AM Liam R. Howlett
<Liam.Howlett@...cle.com> wrote:
>
> * Lorenzo Stoakes <lorenzo.stoakes@...cle.com> [251217 07:27]:
> > Add kdoc comments, describe exactly what these functions are used for in
> > detail, pointing out importantly that the anon_vma_clone() !dst->anon_vma
> > && src->anon_vma dance is ONLY for fork.
> >
> > Both are confusing functions that will be refactored in a subsequent patch
> > but the first stage is establishing documentation and some invariants.
> >
> > Add some basic CONFIG_DEBUG_VM asserts that help document expected state,
> > specifically:
> >
> > anon_vma_clone()
> > - mmap write lock held.
> > - We do nothing if src VMA is not faulted.
> > - The destination VMA has no anon_vma_chain yet.
> > - We are always operating on the same active VMA (i.e. vma->anon-vma).

nit: s/vma->anon-vma/vma->anon_vma

> > - If not forking, must operate on the same mm_struct.
> >
> > unlink_anon_vmas()
> > - mmap lock held (read on unmap downgraded).

Out of curiosity I looked for the place where unlink_anon_vmas() is
called with mmap_lock downgraded to read but could not find it. Could
you please point me to it?

> > - That unfaulted VMAs are no-ops.
> >
> > Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@...cle.com>
>
> Reviewed-by: Liam R. Howlett <Liam.Howlett@...cle.com>
>
> > ---
> >  mm/rmap.c | 82 +++++++++++++++++++++++++++++++++++++++++++------------
> >  1 file changed, 64 insertions(+), 18 deletions(-)
> >
> > diff --git a/mm/rmap.c b/mm/rmap.c
> > index d6799afe1114..0e34c0a69fbc 100644
> > --- a/mm/rmap.c
> > +++ b/mm/rmap.c
> > @@ -257,30 +257,60 @@ static inline void unlock_anon_vma_root(struct anon_vma *root)
> >               up_write(&root->rwsem);
> >  }
> >
> > -/*
> > - * Attach the anon_vmas from src to dst.
> > - * Returns 0 on success, -ENOMEM on failure.
> > - *
> > - * anon_vma_clone() is called by vma_expand(), vma_merge(), __split_vma(),
> > - * copy_vma() and anon_vma_fork(). The first four want an exact copy of src,
> > - * while the last one, anon_vma_fork(), may try to reuse an existing anon_vma to
> > - * prevent endless growth of anon_vma. Since dst->anon_vma is set to NULL before
> > - * call, we can identify this case by checking (!dst->anon_vma &&
> > - * src->anon_vma).
> > - *
> > - * If (!dst->anon_vma && src->anon_vma) is true, this function tries to find
> > - * and reuse existing anon_vma which has no vmas and only one child anon_vma.
> > - * This prevents degradation of anon_vma hierarchy to endless linear chain in
> > - * case of constantly forking task. On the other hand, an anon_vma with more
> > - * than one child isn't reused even if there was no alive vma, thus rmap
> > - * walker has a good chance of avoiding scanning the whole hierarchy when it
> > - * searches where page is mapped.
> > +static void check_anon_vma_clone(struct vm_area_struct *dst,
> > +                              struct vm_area_struct *src)
> > +{
> > +     /* The write lock must be held. */
> > +     mmap_assert_write_locked(src->vm_mm);
> > +     /* If not a fork (implied by dst->anon_vma) then must be on same mm. */
> > +     VM_WARN_ON_ONCE(dst->anon_vma && dst->vm_mm != src->vm_mm);
> > +
> > +     /* No source anon_vma is a no-op. */

I'm confused about the above comment. Do you mean that if
!src->anon_vma then it's a no-op and therefore this function shouldn't
be called? If so, we could simply have VM_WARN_ON_ONCE(!src->anon_vma),
but the checks below have more conditions. Could this comment perhaps
be expanded so that the reader clearly understands what is allowed
and what is not? For example, the combination (!src->anon_vma &&
!dst->anon_vma) is allowed and we correctly do not trigger a warning
here; however, that's still a no-op IIUC.
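
To spell out which src/dst combinations the quoted asserts accept, here
is a small userspace model (not kernel code). The struct and function
names are made up for illustration, and the booleans only stand in for
pointer NULL-ness and pointer equality; the mmap lock and
anon_vma_chain checks are left out.

```c
#include <stdbool.h>

/* Userspace model only: booleans stand in for the pointer state. */
struct clone_state {
	bool src_has_anon_vma;	/* src->anon_vma != NULL */
	bool dst_has_anon_vma;	/* dst->anon_vma != NULL */
	bool same_anon_vma;	/* dst->anon_vma == src->anon_vma */
};

/* Returns true if either of the quoted anon_vma asserts would warn. */
static bool clone_would_warn(struct clone_state s)
{
	/* VM_WARN_ON_ONCE(!src->anon_vma && dst->anon_vma) */
	if (!s.src_has_anon_vma && s.dst_has_anon_vma)
		return true;
	/* VM_WARN_ON_ONCE(dst->anon_vma && dst->anon_vma != src->anon_vma) */
	if (s.dst_has_anon_vma && !s.same_anon_vma)
		return true;
	return false;
}
```

In this model the accepted states are: both unfaulted, src faulted with
dst->anon_vma NULL (the fork case), and both pointing at the same
anon_vma. Everything else warns.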

> > +     VM_WARN_ON_ONCE(!src->anon_vma && !list_empty(&src->anon_vma_chain));
> > +     VM_WARN_ON_ONCE(!src->anon_vma && dst->anon_vma);
> > +     /* We are establishing a new anon_vma_chain. */
> > +     VM_WARN_ON_ONCE(!list_empty(&dst->anon_vma_chain));
> > +     /*
> > +      * On fork, dst->anon_vma is set NULL (temporarily). Otherwise, anon_vma
> > +      * must be the same across dst and src.

This is the second time in this small function where we have to remind
the reader that dst->anon_vma==NULL means that we are forking. Maybe
it's better to introduce a `bool forking = dst->anon_vma==NULL;`
variable at the beginning and use it in all these checks?

I know, I'm nitpicking but as you said, anon_vma code is very
complicated, so the more clarity we can bring to it the better.
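
A rough sketch of what I have in mind, written as a userspace model so
it stands on its own. The struct layouts, the chain_empty flag (standing
in for list_empty(&vma->anon_vma_chain)), MODEL_WARN_ON() and the
function name are all placeholders, and the mmap_assert_write_locked()
call is omitted:

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical minimal stand-ins for the kernel types. */
struct anon_vma { int unused; };
struct mm_struct { int unused; };
struct vm_area_struct {
	struct mm_struct *vm_mm;
	struct anon_vma *anon_vma;
	bool chain_empty;	/* models list_empty(&vma->anon_vma_chain) */
};

static int nr_warnings;
/* Stand-in for VM_WARN_ON_ONCE(): counts the conditions that fire. */
#define MODEL_WARN_ON(cond) do { if (cond) nr_warnings++; } while (0)

static void check_anon_vma_clone_model(const struct vm_area_struct *dst,
				       const struct vm_area_struct *src)
{
	/* Named once: a NULL dst->anon_vma is read as the fork case. */
	const bool forking = dst->anon_vma == NULL;

	/* If not forking, must be on the same mm. */
	MODEL_WARN_ON(!forking && dst->vm_mm != src->vm_mm);
	/* An unfaulted source must have an empty chain and an unfaulted dst. */
	MODEL_WARN_ON(!src->anon_vma && !src->chain_empty);
	MODEL_WARN_ON(!src->anon_vma && !forking);
	/* We are establishing a new anon_vma_chain. */
	MODEL_WARN_ON(!dst->chain_empty);
	/* If not forking, anon_vma must match across dst and src. */
	MODEL_WARN_ON(!forking && dst->anon_vma != src->anon_vma);
}
```

One caveat: with this definition `forking` is also true when both VMAs
are unfaulted, so a stricter `!dst->anon_vma && src->anon_vma` (the
condition the old comment used) may be the better name for it.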

> > +      */
> > +     VM_WARN_ON_ONCE(dst->anon_vma && dst->anon_vma != src->anon_vma);
> > +}
> > +
> > +/**
> > + * anon_vma_clone - Establishes new anon_vma_chain objects in @dst linking to
> > + * all of the anon_vma objects contained within @src anon_vma_chain's.
> > + * @dst: The destination VMA with an empty anon_vma_chain.
> > + * @src: The source VMA we wish to duplicate.
> > + *
> > + * This is the heart of the VMA side of the anon_vma implementation - we invoke
> > + * this function whenever we need to set up a new VMA's anon_vma state.
> > + *
> > + * This is invoked for:
> > + *
> > + * - VMA Merge, but only when @dst is unfaulted and @src is faulted - meaning we
> > + *   clone @src into @dst.
> > + * - VMA split.
> > + * - VMA (m)remap.
> > + * - Fork of faulted VMA.
> > + *
> > + * In all cases other than fork this is simply a duplication. Fork additionally
> > + * adds a new active anon_vma.
> > + *
> > + * ONLY in the case of fork do we try to 'reuse' existing anon_vma's in an
> > + * anon_vma hierarchy, reusing anon_vma's which have no VMA associated with them
> > + * but do have a single child. This is to avoid waste of memory when repeatedly
> > + * forking.
> > + *
> > + * Returns: 0 on success, -ENOMEM on failure.
> >   */
> >  int anon_vma_clone(struct vm_area_struct *dst, struct vm_area_struct *src)
> >  {
> >       struct anon_vma_chain *avc, *pavc;
> >       struct anon_vma *root = NULL;
> >
> > +     check_anon_vma_clone(dst, src);
> > +
> >       list_for_each_entry_reverse(pavc, &src->anon_vma_chain, same_vma) {
> >               struct anon_vma *anon_vma;
> >
> > @@ -392,11 +422,27 @@ int anon_vma_fork(struct vm_area_struct *vma, struct vm_area_struct *pvma)
> >       return -ENOMEM;
> >  }
> >
> > +/**
> > + * unlink_anon_vmas() - remove all links between a VMA and anon_vma's, freeing
> > + * anon_vma_chain objects.
> > + * @vma: The VMA whose links to anon_vma objects is to be severed.
> > + *
> > + * As part of the process anon_vma_chain's are freed,
> > + * anon_vma->num_children,num_active_vmas is updated as required and, if the
> > + * relevant anon_vma references no further VMAs, its reference count is
> > + * decremented.
> > + */
> >  void unlink_anon_vmas(struct vm_area_struct *vma)
> >  {
> >       struct anon_vma_chain *avc, *next;
> >       struct anon_vma *root = NULL;
> >
> > +     /* Always hold mmap lock, read-lock on unmap possibly. */
> > +     mmap_assert_locked(vma->vm_mm);
> > +
> > +     /* Unfaulted is a no-op. */
> > +     VM_WARN_ON_ONCE(!vma->anon_vma && !list_empty(&vma->anon_vma_chain));

Hmm. anon_vma_clone() calls unlink_anon_vmas() after setting
dst->anon_vma=NULL in the enomem_failure path. This warning would
imply that in such case dst->anon_vma_chain is always non-empty. But I
don't think we can always expect that... What if the very first call
to anon_vma_chain_alloc() in anon_vma_clone()'s loop fails? I think
this would result in dst->anon_vma_chain being empty, no?
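
To pin down how the failure path interacts with this assert, here is a
tiny userspace model of the condition. It assumes each successful loop
iteration links exactly one entry into dst->anon_vma_chain before the
failing allocation; the function name and its parameters are
hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Model of the anon_vma_clone() failure path: src has `src_links` chain
 * entries and the allocation for entry `fail_at` (0-based) fails.
 * Returns whether the quoted unlink_anon_vmas() assert would fire.
 */
static bool unlink_warns_after_enomem(int src_links, int fail_at)
{
	assert(fail_at >= 0 && fail_at < src_links);

	/* Entries already linked into dst->anon_vma_chain before failing. */
	int dst_links = fail_at;
	/* enomem_failure sets dst->anon_vma = NULL before unlinking. */
	bool dst_has_anon_vma = false;
	bool chain_empty = (dst_links == 0);

	/* VM_WARN_ON_ONCE(!vma->anon_vma && !list_empty(&vma->anon_vma_chain)) */
	return !dst_has_anon_vma && !chain_empty;
}
```

Under these assumptions the condition is false when the very first
allocation fails (the chain is empty) and true once at least one entry
has been linked, so the partially populated chain looks like the case
worth checking against the new warning.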

> > +
> >       /*
> >        * Unlink each anon_vma chained to the VMA.  This list is ordered
> >        * from newest to oldest, ensuring the root anon_vma gets freed last.
> > --
> > 2.52.0
> >
