Message-ID: <CAJuCfpGQQid_VPx9Y1TE4ozXEQM8tWixxLDnS3cvrM3sdT84QQ@mail.gmail.com>
Date: Tue, 6 Jan 2026 12:58:46 -0800
From: Suren Baghdasaryan <surenb@...gle.com>
To: Lorenzo Stoakes <lorenzo.stoakes@...cle.com>
Cc: Andrew Morton <akpm@...ux-foundation.org>, "Liam R . Howlett" <Liam.Howlett@...cle.com>, 
	Vlastimil Babka <vbabka@...e.cz>, Shakeel Butt <shakeel.butt@...ux.dev>, 
	David Hildenbrand <david@...nel.org>, Rik van Riel <riel@...riel.com>, Harry Yoo <harry.yoo@...cle.com>, 
	Jann Horn <jannh@...gle.com>, Mike Rapoport <rppt@...nel.org>, Michal Hocko <mhocko@...e.com>, 
	Pedro Falcato <pfalcato@...e.de>, Chris Li <chriscli@...gle.com>, 
	Barry Song <v-songbaohua@...o.com>, linux-mm@...ck.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH 3/8] mm/rmap: remove unnecessary root lock dance in
 anon_vma clone, unmap

On Tue, Jan 6, 2026 at 5:58 AM Lorenzo Stoakes
<lorenzo.stoakes@...cle.com> wrote:
>
> On Mon, Dec 29, 2025 at 02:17:53PM -0800, Suren Baghdasaryan wrote:
> > On Wed, Dec 17, 2025 at 4:27 AM Lorenzo Stoakes
> > <lorenzo.stoakes@...cle.com> wrote:
> > >
> > > The root anon_vma of all anon_vma's linked to a VMA must by definition be
> > > the same - a VMA and all of its descendants/ancestors must exist in the
> > > same CoW chain.
> > >
> > > Commit bb4aa39676f7 ("mm: avoid repeated anon_vma lock/unlock sequences in
> > > anon_vma_clone()") introduced paranoid checking of the root anon_vma
> > > remaining the same throughout all AVC's in 2011.
> > >
> > > I think 15 years later we can safely assume that this is always the case.
> > >
> > > Additionally, since unfaulted VMAs being cloned from or unlinked are
> > > no-op's, we can simply lock the anon_vma's associated with this rather than
> > > doing any specific dance around this.
> > >
> > > This removes unnecessary checks and makes it clear that the root anon_vma
> > > is shared between all anon_vma's in a given VMA's anon_vma_chain.
> > >
> > > Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@...cle.com>
> > > ---
> > >  mm/rmap.c | 48 ++++++++++++------------------------------------
> > >  1 file changed, 12 insertions(+), 36 deletions(-)
> > >
> > > diff --git a/mm/rmap.c b/mm/rmap.c
> > > index 9332d1cbc643..60134a566073 100644
> > > --- a/mm/rmap.c
> > > +++ b/mm/rmap.c
> > > @@ -231,32 +231,6 @@ int __anon_vma_prepare(struct vm_area_struct *vma)
> > >         return -ENOMEM;
> > >  }
> > >
> > > -/*
> > > - * This is a useful helper function for locking the anon_vma root as
> > > - * we traverse the vma->anon_vma_chain, looping over anon_vma's that
> > > - * have the same vma.
> > > - *
> > > - * Such anon_vma's should have the same root, so you'd expect to see
> > > - * just a single mutex_lock for the whole traversal.
> > > - */
> > > -static inline struct anon_vma *lock_anon_vma_root(struct anon_vma *root, struct anon_vma *anon_vma)
> > > -{
> > > -       struct anon_vma *new_root = anon_vma->root;
> > > -       if (new_root != root) {
> > > -               if (WARN_ON_ONCE(root))
> > > -                       up_write(&root->rwsem);
> > > -               root = new_root;
> > > -               down_write(&root->rwsem);
> > > -       }
> > > -       return root;
> > > -}
> > > -
> > > -static inline void unlock_anon_vma_root(struct anon_vma *root)
> > > -{
> > > -       if (root)
> > > -               up_write(&root->rwsem);
> > > -}
> > > -
> > >  static void check_anon_vma_clone(struct vm_area_struct *dst,
> > >                                  struct vm_area_struct *src)
> > >  {
> > > @@ -307,26 +281,25 @@ static void check_anon_vma_clone(struct vm_area_struct *dst,
> > >  int anon_vma_clone(struct vm_area_struct *dst, struct vm_area_struct *src)
> > >  {
> > >         struct anon_vma_chain *avc, *pavc;
> > > -       struct anon_vma *root = NULL;
> > >
> > >         if (!src->anon_vma)
> > >                 return 0;
> > >
> > >         check_anon_vma_clone(dst, src);
> > >
> > > +       anon_vma_lock_write(src->anon_vma);
> > >         list_for_each_entry_reverse(pavc, &src->anon_vma_chain, same_vma) {
> > >                 struct anon_vma *anon_vma;
> > >
> > >                 avc = anon_vma_chain_alloc(GFP_NOWAIT);
> > >                 if (unlikely(!avc)) {
> > > -                       unlock_anon_vma_root(root);
> > > -                       root = NULL;
> > > +                       anon_vma_unlock_write(src->anon_vma);
> > >                         avc = anon_vma_chain_alloc(GFP_KERNEL);
> > >                         if (!avc)
> > >                                 goto enomem_failure;
> > > +                       anon_vma_lock_write(src->anon_vma);
> >
> > So, we drop and then reacquire src->anon_vma->root->rwsem, expecting
> > src->anon_vma and src->anon_vma->root to be the same. And IIUC
>
> I mean did you read the commit message? :)
>
> We're not expecting that, they _have_ to be the same. It simply makes no sense
> for them _not_ to be the same.

Sorry, maybe I chose my words badly when explaining my concern. I meant
that we expect those fields to still be valid between the time we
drop and the time we re-acquire the lock. The comment next to the
anon_vma.rwsem definition says "W: modification, R: walking the list".
Here we are walking the list with the lock held but drop the lock partway
through. I think there needs to be an explanation of why this is safe.
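
To make the concern concrete, here is a minimal userspace sketch of the
pattern in question: walk a list under a write lock, drop the lock when the
non-blocking allocation fails, allocate with a call that may sleep, then
retake the lock and carry on from the same position. Every name in it
(toy_lock, toy_clone, alloc_nowait and so on) is a hypothetical stand-in,
not a kernel API, and the sketch assumes that something other than the
dropped lock keeps the source list stable.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical userspace stand-ins; none of these are kernel APIs. */
struct toy_lock { bool held; int acquisitions; };
struct toy_node { struct toy_node *next; };

static void toy_lock_write(struct toy_lock *l)
{
	l->held = true;
	l->acquisitions++;
}

static void toy_unlock_write(struct toy_lock *l)
{
	l->held = false;
}

/* Non-blocking attempt; fails on demand to simulate memory pressure. */
static struct toy_node *alloc_nowait(bool fail)
{
	return fail ? NULL : calloc(1, sizeof(struct toy_node));
}

/* May "sleep", so the caller must not hold the lock. */
static struct toy_node *alloc_may_sleep(const struct toy_lock *l)
{
	assert(!l->held);
	return calloc(1, sizeof(struct toy_node));
}

static void toy_free(struct toy_node *n)
{
	while (n) {
		struct toy_node *next = n->next;

		free(n);
		n = next;
	}
}

/*
 * Build one new node per entry of src. The non-blocking allocation fails
 * for the entry at index fail_at (-1: never). Returns the number of nodes
 * built, or -1 on allocation failure (the caller frees *dst either way).
 *
 * ASSUMPTION: src is not modified while the lock is dropped. The lock
 * does not provide that guarantee across the gap; in this toy it holds
 * only because nothing else touches src.
 */
static int toy_clone(struct toy_lock *lock, const struct toy_node *src,
		     int fail_at, struct toy_node **dst)
{
	const struct toy_node *pos;
	int i = 0;

	assert(!lock->held);
	*dst = NULL;
	toy_lock_write(lock);
	for (pos = src; pos; pos = pos->next, i++) {
		struct toy_node *n = alloc_nowait(i == fail_at);

		if (!n) {
			toy_unlock_write(lock);		/* gap starts */
			n = alloc_may_sleep(lock);
			if (!n)
				return -1;
			toy_lock_write(lock);		/* gap ends, pos reused */
		}
		n->next = *dst;
		*dst = n;
	}
	toy_unlock_write(lock);
	return i;
}
```

With a three-entry source list, a call with fail_at = 1 takes the lock twice
and still builds all three nodes. The resumed walk is only valid because
nothing else modifies src in this single-threaded toy, which is exactly the
property I would like to see explained for the real code.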


>
> This is kind of the entire point of the patch.
>
> > src->vm_mm's mmap lock is what guarantees all this. If so, could you
> > please add a clarifying comment here?
>
> No that's not what guarantees it? I don't understand what you mean?
>
> I mean in a sense, if you had a totally broken situation where you didn't take
> exclusive locks and could do some horribly broken racing here, then sure you
> might end up with something broken, but I think it's super confusing to say 'oh
> this lock guarantees it', well no it guarantees that you aren't completely
> broken, what guarantees the shared root is how anon_vma_fork() works, which is
> to:
>
> - Clone.
> - If not reused an anon_vma (which by recursion would also have same root)
>   allocate new anon_vma.
> - If allocated new, set root to source VMA's anon_vma, which by definition also
>   has to be in its anon_vma_chain and have the same root (itself, if we're
>   cloning from the ultimate parent).
>
> But I don't think it'd be helpful to document all this, or we get into _adding_
> confusion by putting _too much_ in a comment.
>
> So I guess I'll just say, as I do in the newly introduced
> cleanup_partial_anon_vmas():
>
>         /* All anon_vma's share the same root. */

Yeah, my concern was not the root being different but that the list
itself is stable after we drop the lock.
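
For reference, the root-inheritance rule quoted above (a new anon_vma takes
its root from the source's anon_vma) can be modelled in a few lines of
userspace C. This is only a sketch: toy_anon_vma, toy_anon_vma_alloc,
toy_fork and toy_same_root are hypothetical names, and the model ignores
reuse, locking and refcounting entirely.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical toy type, not the kernel's struct anon_vma. */
struct toy_anon_vma {
	struct toy_anon_vma *root;	/* ultimate ancestor, self for the root */
	struct toy_anon_vma *parent;	/* direct ancestor, self for the root */
};

/* A freshly allocated anon_vma starts out as its own root and parent. */
static struct toy_anon_vma *toy_anon_vma_alloc(void)
{
	struct toy_anon_vma *av = calloc(1, sizeof(*av));

	if (av) {
		av->root = av;
		av->parent = av;
	}
	return av;
}

/* Model of the "allocated new" branch: inherit the source's root. */
static struct toy_anon_vma *toy_fork(struct toy_anon_vma *src)
{
	struct toy_anon_vma *child;

	assert(src && src->root);
	child = toy_anon_vma_alloc();
	if (!child)
		return NULL;
	child->root = src->root;
	child->parent = src;
	return child;
}

/* Walk up the parent links and check every ancestor has the same root. */
static int toy_same_root(const struct toy_anon_vma *av)
{
	const struct toy_anon_vma *root = av->root;

	for (;;) {
		if (av->root != root)
			return 0;
		if (av->parent == av)	/* reached the top of the chain */
			return av == root;
		av = av->parent;
	}
}
```

Forking root into child and child into grandchild leaves all three with
root as their root, so toy_same_root(grandchild) returns 1. That part I have
no doubts about; the sketch says nothing about list stability.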

>
> >
> > >                 }
> > >                 anon_vma = pavc->anon_vma;
> > > -               root = lock_anon_vma_root(root, anon_vma);
> > >                 anon_vma_chain_link(dst, avc, anon_vma);
> > >
> > >                 /*
> > > @@ -343,7 +316,8 @@ int anon_vma_clone(struct vm_area_struct *dst, struct vm_area_struct *src)
> > >         }
> > >         if (dst->anon_vma)
> > >                 dst->anon_vma->num_active_vmas++;
> > > -       unlock_anon_vma_root(root);
> > > +
> > > +       anon_vma_unlock_write(src->anon_vma);
> > >         return 0;
> > >
> > >   enomem_failure:
> > > @@ -438,15 +412,17 @@ int anon_vma_fork(struct vm_area_struct *vma, struct vm_area_struct *pvma)
> > >  void unlink_anon_vmas(struct vm_area_struct *vma)
> > >  {
> > >         struct anon_vma_chain *avc, *next;
> > > -       struct anon_vma *root = NULL;
> > > +       struct anon_vma *active_anon_vma = vma->anon_vma;
> > >
> > >         /* Always hold mmap lock, read-lock on unmap possibly. */
> > >         mmap_assert_locked(vma->vm_mm);
> > >
> > >         /* Unfaulted is a no-op. */
> > > -       if (!vma->anon_vma)
> > > +       if (!active_anon_vma)
> > >                 return;
> > >
> > > +       anon_vma_lock_write(active_anon_vma);
> > > +
> > >         /*
> > >          * Unlink each anon_vma chained to the VMA.  This list is ordered
> > >          * from newest to oldest, ensuring the root anon_vma gets freed last.
> > > @@ -454,7 +430,6 @@ void unlink_anon_vmas(struct vm_area_struct *vma)
> > >         list_for_each_entry_safe(avc, next, &vma->anon_vma_chain, same_vma) {
> > >                 struct anon_vma *anon_vma = avc->anon_vma;
> > >
> > > -               root = lock_anon_vma_root(root, anon_vma);
> > >                 anon_vma_interval_tree_remove(avc, &anon_vma->rb_root);
> > >
> > >                 /*
> > > @@ -470,13 +445,14 @@ void unlink_anon_vmas(struct vm_area_struct *vma)
> > >                 anon_vma_chain_free(avc);
> > >         }
> > >
> > > -       vma->anon_vma->num_active_vmas--;
> > > +       active_anon_vma->num_active_vmas--;
> > >         /*
> > >          * vma would still be needed after unlink, and anon_vma will be prepared
> > >          * when handle fault.
> > >          */
> > >         vma->anon_vma = NULL;
> > > -       unlock_anon_vma_root(root);
> > > +       anon_vma_unlock_write(active_anon_vma);
> > > +
> > >
> > >         /*
> > >          * Iterate the list once more, it now only contains empty and unlinked
> > > --
> > > 2.52.0
> > >
