[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <87547aa8-9629-4793-8483-68408235ee45@lucifer.local>
Date: Thu, 13 Nov 2025 15:15:46 +0000
From: Lorenzo Stoakes <lorenzo.stoakes@...cle.com>
To: "Liam R. Howlett" <Liam.Howlett@...cle.com>,
Andrew Morton <akpm@...ux-foundation.org>, linux-mm@...ck.org,
linux-kernel@...r.kernel.org, Suren Baghdasaryan <surenb@...gle.com>,
Vlastimil Babka <vbabka@...e.cz>,
Shakeel Butt <shakeel.butt@...ux.dev>, Jann Horn <jannh@...gle.com>,
stable@...r.kernel.org,
syzbot+131f9eb2b5807573275c@...kaller.appspotmail.com,
"Paul E . McKenney" <paulmck@...nel.org>
Subject: Re: [PATCH] mm/mmap_lock: Reset maple state on lock_vma_under_rcu()
retry
On Wed, Nov 12, 2025 at 11:10:30AM -0500, Liam R. Howlett wrote:
> * Lorenzo Stoakes <lorenzo.stoakes@...cle.com> [251112 10:06]:
> > +cc Paul for hare-brained idea
> >
> > On Tue, Nov 11, 2025 at 04:56:05PM -0500, Liam R. Howlett wrote:
> > > The retry in lock_vma_under_rcu() drops the rcu read lock before
> > > reacquiring the lock and trying again. This may cause a use-after-free
> > > if the maple node the maple state was using was freed.
> > >
> > > The maple state is protected by the rcu read lock. When the lock is
> > > dropped, the state cannot be reused as it tracks pointers to objects
> > > that may be freed during the time where the lock was not held.
> > >
> > > Any time the rcu read lock is dropped, the maple state must be
> > > invalidated. Resetting the address and state to MA_START is the safest
> > > course of action, which will result in the next operation starting from
> > > the top of the tree.
> >
> > Since we all missed it I do wonder if we need some super clear comment
> > saying 'hey if you drop + re-acquire RCU lock you MUST revalidate mas state
> > by doing 'blah'.
> >
> > I think one source of confusion for me with maple tree operations is - what
> > to do if we are in a position where some kind of reset is needed?
> >
> > So even if I'd realised 'aha we need to reset this' it wouldn't be obvious
> > to me that we ought to set to the address.
> >
> > I guess a mas_reset() would keep mas->index, last where they where which
> > also wouldn't be right would it?
>
> mas->index and mas->last are updated to the values of the entry you
> found. So if you ran a mas_find(), the operation will continue until
> the limit is hit, or if you did a next/prev the address would be lost.
I guess in _this_ specific case since we're specifically looking for a VMA
at the address encoded in the index we'd be ok?
>
> This is why I say mas_set() is safer, because it will ensure we return
> to the same situation we started from, regardless of the operation.
Right yes.
>
>
> >
> > In which case a mas_reset() is _also_ not a valid operation if invoked
> > after dropping/reacquiring the RCU lock right?
>
> In this case we did a mas_walk(), so a mas_reset() would be fine here.
Right yeah.
>
> >
> > >
> > > Prior to commit 0b16f8bed19c ("mm: change vma_start_read() to drop RCU
> > > lock on failure"), the rcu read lock was dropped and NULL was returned,
> > > so the retry would not have happened. However, now that the read lock
> > > is dropped regardless of the return, we may use a freed maple tree node
> > > cached in the maple state on retry.
> > >
> > > Cc: Suren Baghdasaryan <surenb@...gle.com>
> > > Cc: stable@...r.kernel.org
> > > Fixes: 0b16f8bed19c ("mm: change vma_start_read() to drop RCU lock on failure")
> > > Reported-by: syzbot+131f9eb2b5807573275c@...kaller.appspotmail.com
> > > Closes: https://syzkaller.appspot.com/bug?extid=131f9eb2b5807573275c
> > > Signed-off-by: Liam R. Howlett <Liam.Howlett@...cle.com>
> >
> > The reasoning seems sensible & LGTM, so:
> >
> > Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@...cle.com>
> >
> >
> > > ---
> > > mm/mmap_lock.c | 1 +
> > > 1 file changed, 1 insertion(+)
> > >
> > > diff --git a/mm/mmap_lock.c b/mm/mmap_lock.c
> > > index 39f341caf32c0..f2532af6208c0 100644
> > > --- a/mm/mmap_lock.c
> > > +++ b/mm/mmap_lock.c
> > > @@ -257,6 +257,7 @@ struct vm_area_struct *lock_vma_under_rcu(struct mm_struct *mm,
> > > if (PTR_ERR(vma) == -EAGAIN) {
> > > count_vm_vma_lock_event(VMA_LOCK_MISS);
> > > /* The area was replaced with another one */
> > > + mas_set(&mas, address);
> >
> > I wonder if we could detect that the RCU lock was released (+ reacquired) in
> > mas_walk() in a debug mode, like CONFIG_VM_DEBUG_MAPLE_TREE?
> >
> > Not sure if that's feasible, maybe Paul can comment? :)
> >
> > I think Vlastimil made a similar kind of comment possibly off-list.
> >
> > Would there be much overhead if we just did this:
> >
> > retry:
> > rcu_read_lock();
> > mas_set(&mas, address);
> > vma = mas_walk(&mas);
> >
> > The retry path will be super rare, and I think the compiler should be smart
> > enough to not assign index, last twice and this would protect us.
>
> This is what existed before the 0b16f8bed19c change, which was
> introduced to try and avoid exactly these issues.
>
> I think there's no real way to avoid the complications of an rcu data
> structure. We've tried to make the interface as simple as possible, and
> in doing so, have hidden the implementation details of what happens in
> the 'state' - which is where all these troubles arise.
>
> I can add more documentation around the locking and maple state,
> hopefully people will find it useful and not just exist to go out of
> date.
Discussed more in reply to Matthew.
Yeah I agree that we're doing a trade-off here between abstraction and
performance overall.
>
> >
> > Then we could have some function like:
> >
> > mas_walk_from(&mas, address);
> >
> > That did this.
> >
> > Or, since we _literally_ only use mas for this one walk, have a simple
> > function like:
> >
> > /**
> > * ...
> > * Performs a single walk of a maple tree to the specified address,
> > * returning a pointer to an entry if found, or NULL if not.
> > * ...
> > */
> > static void *mt_walk(struct maple_tree *mt, unsigned long address)
> > {
> > MA_STATE(mas, mt, address, adderss);
> >
> > lockdep_assert_in_rcu_read_lock();
> > return mas_walk(&mas);
> > }
> >
> > That literally just does the walk as a one-off?
>
>
> You have described mtree_load(). The mas_ interfaces are designed for
> people to handle the locks themselves. The mtree_ interface handles the
> locking for people.
>
> I don't think we are using the simple interface because we are using the
> rcu read lock for the vma as well.
Yup I don't think we can use this as-is.
What I'm proposing Ig uess would be __mtree_load() or something with the
locking taken out.
I have suggested an alternative approach in reply to Matthew anyway.
>
> If you want to use the simple mtree_load() interface here, I think we'll
> need two rcu_read_lock()/unlock() calls (nesting is supported fwiu). I
> don't think we want to nest the locks in this call path though.
And we don't want to just unconditionally release RCU read lock after we're
done either so I think that's out.
>
> ...
>
> Thanks,
> Liam
Cheers, Lorenzo
Powered by blists - more mailing lists