lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <uvsjfodaoyikufskxriaycxcydhhgzndhs2hp4ydbwbgivhna6@h7svwhivantw>
Date: Wed, 12 Nov 2025 11:10:30 -0500
From: "Liam R. Howlett" <Liam.Howlett@...cle.com>
To: Lorenzo Stoakes <lorenzo.stoakes@...cle.com>
Cc: Andrew Morton <akpm@...ux-foundation.org>, linux-mm@...ck.org,
        linux-kernel@...r.kernel.org, Suren Baghdasaryan <surenb@...gle.com>,
        Vlastimil Babka <vbabka@...e.cz>,
        Shakeel Butt <shakeel.butt@...ux.dev>, Jann Horn <jannh@...gle.com>,
        stable@...r.kernel.org,
        syzbot+131f9eb2b5807573275c@...kaller.appspotmail.com,
        "Paul E . McKenney" <paulmck@...nel.org>
Subject: Re: [PATCH] mm/mmap_lock: Reset maple state on lock_vma_under_rcu()
 retry

* Lorenzo Stoakes <lorenzo.stoakes@...cle.com> [251112 10:06]:
> +cc Paul for hare-brained idea
> 
> On Tue, Nov 11, 2025 at 04:56:05PM -0500, Liam R. Howlett wrote:
> > The retry in lock_vma_under_rcu() drops the rcu read lock before
> > reacquiring the lock and trying again.  This may cause a use-after-free
> > if the maple node the maple state was using was freed.
> >
> > The maple state is protected by the rcu read lock.  When the lock is
> > dropped, the state cannot be reused as it tracks pointers to objects
> > that may be freed during the time where the lock was not held.
> >
> > Any time the rcu read lock is dropped, the maple state must be
> > invalidated.  Resetting the address and state to MA_START is the safest
> > course of action, which will result in the next operation starting from
> > the top of the tree.
> 
> Since we all missed it I do wonder if we need some super clear comment
> saying 'hey if you drop + re-acquire RCU lock you MUST revalidate mas state
> by doing 'blah'.
> 
> I think one source of confusion for me with maple tree operations is - what
> to do if we are in a position where some kind of reset is needed?
> 
> So even if I'd realised 'aha we need to reset this' it wouldn't be obvious
> to me that we ought to set to the address.
> 
> I guess a mas_reset() would keep mas->index, last where they where which
> also wouldn't be right would it?

mas->index and mas->last are updated to the values of the entry you
found.  So if you ran a mas_find(), the operation will continue until
the limit is hit, or if you did a next/prev the address would be lost.

This is why I say mas_set() is safer, because it will ensure we return
to the same situation we started from, regardless of the operation.


> 
> In which case a mas_reset() is _also_ not a valid operation if invoked
> after dropping/reacquiring the RCU lock right?

In this case we did a mas_walk(), so a mas_reset() would be fine here.

> 
> >
> > Prior to commit 0b16f8bed19c ("mm: change vma_start_read() to drop RCU
> > lock on failure"), the rcu read lock was dropped and NULL was returned,
> > so the retry would not have happened.  However, now that the read lock
> > is dropped regardless of the return, we may use a freed maple tree node
> > cached in the maple state on retry.
> >
> > Cc: Suren Baghdasaryan <surenb@...gle.com>
> > Cc: stable@...r.kernel.org
> > Fixes: 0b16f8bed19c ("mm: change vma_start_read() to drop RCU lock on failure")
> > Reported-by: syzbot+131f9eb2b5807573275c@...kaller.appspotmail.com
> > Closes: https://syzkaller.appspot.com/bug?extid=131f9eb2b5807573275c
> > Signed-off-by: Liam R. Howlett <Liam.Howlett@...cle.com>
> 
> The reasoning seems sensible & LGTM, so:
> 
> Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@...cle.com>
> 
> 
> > ---
> >  mm/mmap_lock.c | 1 +
> >  1 file changed, 1 insertion(+)
> >
> > diff --git a/mm/mmap_lock.c b/mm/mmap_lock.c
> > index 39f341caf32c0..f2532af6208c0 100644
> > --- a/mm/mmap_lock.c
> > +++ b/mm/mmap_lock.c
> > @@ -257,6 +257,7 @@ struct vm_area_struct *lock_vma_under_rcu(struct mm_struct *mm,
> >  		if (PTR_ERR(vma) == -EAGAIN) {
> >  			count_vm_vma_lock_event(VMA_LOCK_MISS);
> >  			/* The area was replaced with another one */
> > +			mas_set(&mas, address);
> 
> I wonder if we could detect that the RCU lock was released (+ reacquired) in
> mas_walk() in a debug mode, like CONFIG_VM_DEBUG_MAPLE_TREE?
> 
> Not sure if that's feasible, maybe Paul can comment? :)
> 
> I think Vlastimil made a similar kind of comment possibly off-list.
> 
> Would there be much overhead if we just did this:
> 
> retry:
> 	rcu_read_lock();
> 	mas_set(&mas, address);
> 	vma = mas_walk(&mas);
> 
> The retry path will be super rare, and I think the compiler should be smart
> enough to not assign index, last twice and this would protect us.

This is what existed before the 0b16f8bed19c change, which was
introduced to try and avoid exactly these issues.

I think there's no real way to avoid the complications of an rcu data
structure.  We've tried to make the interface as simple as possible, and
in doing so, have hidden the implementation details of what happens in
the 'state' - which is where all these troubles arise.

I can add more documentation around the locking and maple state,
hopefully people will find it useful and not just exist to go out of
date.

> 
> Then we could have some function like:
> 
> mas_walk_from(&mas, address);
> 
> That did this.
> 
> Or, since we _literally_ only use mas for this one walk, have a simple
> function like:
> 
> 	/**
> 	 * ...
> 	 * Performs a single walk of a maple tree to the specified address,
> 	 * returning a pointer to an entry if found, or NULL if not.
> 	 * ...
> 	 */
> 	static void *mt_walk(struct maple_tree *mt, unsigned long address)
> 	{
> 		MA_STATE(mas, mt, address, adderss);
> 
> 		lockdep_assert_in_rcu_read_lock();
> 		return mas_walk(&mas);
> 	}
> 
> That literally just does the walk as a one-off?


You have described mtree_load().  The mas_ interfaces are designed for
people to handle the locks themselves.  The mtree_ interface handles the
locking for people.

I don't think we are using the simple interface because we are using the
rcu read lock for the vma as well.

If you want to use the simple mtree_load() interface here, I think we'll
need two rcu_read_lock()/unlock() calls (nesting is supported fwiu).  I
don't think we want to nest the locks in this call path though.

...

Thanks,
Liam

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ