Message-ID: <CAJuCfpFzxCayf083d35dS7Py0AK-CSr3H=_mymP9yXcyWzOqPQ@mail.gmail.com>
Date: Mon, 28 Jul 2025 22:53:41 +0000
From: Suren Baghdasaryan <surenb@...gle.com>
To: Vlastimil Babka <vbabka@...e.cz>
Cc: akpm@...ux-foundation.org, jannh@...gle.com, Liam.Howlett@...cle.com,
lorenzo.stoakes@...cle.com, pfalcato@...e.de, linux-mm@...ck.org,
linux-kernel@...r.kernel.org, stable@...r.kernel.org
Subject: Re: [PATCH v2 1/1] mm: fix a UAF when vma->mm is freed after
vma->vm_refcnt got dropped
On Mon, Jul 28, 2025 at 10:04 PM Vlastimil Babka <vbabka@...e.cz> wrote:
>
> On 7/28/25 19:53, Suren Baghdasaryan wrote:
> > By inducing delays in the right places, Jann Horn created a reproducer
> > for a hard-to-hit UAF issue that became possible after VMAs were allowed
> > to be recycled by adding SLAB_TYPESAFE_BY_RCU to their cache.
> >
> > Race description is borrowed from Jann's discovery report:
> > lock_vma_under_rcu() looks up a VMA locklessly with mas_walk() under
> > rcu_read_lock(). At that point, the VMA may be concurrently freed, and
> > it can be recycled by another process. vma_start_read() then
> > increments the vma->vm_refcnt (if it is in an acceptable range), and
> > if this succeeds, vma_start_read() can return a recycled VMA.
> >
> > In this scenario where the VMA has been recycled, lock_vma_under_rcu()
> > will then detect the mismatching ->vm_mm pointer and drop the VMA
> > through vma_end_read(), which calls vma_refcount_put().
> > vma_refcount_put() drops the refcount and then calls rcuwait_wake_up()
> > using a copy of vma->vm_mm. This is wrong: It implicitly assumes that
> > the caller is keeping the VMA's mm alive, but in this scenario the caller
> > has no relation to the VMA's mm, so the rcuwait_wake_up() can cause UAF.
> >
> > The diagram depicting the race:
> > T1                        T2                          T3
> > ==                        ==                          ==
> > lock_vma_under_rcu
> >   mas_walk
> >                           <VMA gets removed from mm>
> >                                                       mmap
> >                                                       <the same VMA is reallocated>
> >   vma_start_read
> >     __refcount_inc_not_zero_limited_acquire
> >                                                       munmap
> >                                                         __vma_enter_locked
> >                                                           refcount_add_not_zero
> >   vma_end_read
> >     vma_refcount_put
> >       __refcount_dec_and_test
> >                                                           rcuwait_wait_event
> >                                                       <finish operation>
> >       rcuwait_wake_up [UAF]
> >
> > Note that rcuwait_wait_event() in T3 does not block because the refcount
> > was already dropped by T1. At this point T3 can exit and free the mm,
> > causing the UAF in T1.
> >
> > To avoid this, we move the vma->vm_mm verification into vma_start_read()
> > and grab vma->vm_mm to stabilize it before the vma_refcount_put() operation.
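
(Side note for readers following along: the tail of vma_refcount_put() looks
roughly like the sketch below. This is paraphrased from memory of
include/linux/mmap_lock.h, not verbatim; the point is that the wake-up
dereferences the saved mm pointer after the refcount is already gone, which is
only safe if the caller is keeping that mm alive somehow.)

    static inline void vma_refcount_put(struct vm_area_struct *vma)
    {
            /* vma may be freed/recycled once vm_refcnt drops, so save vm_mm first */
            struct mm_struct *mm = vma->vm_mm;
            int oldcnt;

            rwsem_release(&vma->vmlock_dep_map, _RET_IP_);
            if (!__refcount_dec_and_test(&vma->vm_refcnt, &oldcnt)) {
                    if (is_vma_writer_only(oldcnt))
                            /* UAF if nothing else pins mm at this point */
                            rcuwait_wake_up(&mm->vma_writer_wait);
            }
    }
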
> >
> > Fixes: 3104138517fc ("mm: make vma cache SLAB_TYPESAFE_BY_RCU")
> > Reported-by: Jann Horn <jannh@...gle.com>
> > Closes: https://lore.kernel.org/all/CAG48ez0-deFbVH=E3jbkWx=X3uVbd8nWeo6kbJPQ0KoUD+m2tA@mail.gmail.com/
> > Signed-off-by: Suren Baghdasaryan <surenb@...gle.com>
> > Cc: <stable@...r.kernel.org>
>
> As for further steps, I considered replying to [1] but maybe it's better here.
>
> As a KISS fix, including for stable, great. It seems a nice improvement to
> actually handle "vma->vm_mm != mm" in vma_start_read() like this - good idea!
>
> Less great is that there's now a subtle assumption that some (but not all!)
> cases where vma_start_read() returns NULL imply that we dropped the rcu
> lock. And it doesn't matter, as the callers abort or fall back to the mmap
> sem anyway in that case. Hopefully we can improve that a bit.
>
> The idea of moving the rcu lock and mas walk inside vma_start_read() is
> indeed busted by lock_next_vma(). The iterator difference could perhaps be
> solved by having lock_vma_under_rcu() set up its own iterator (instead of
> MA_STATE) in a way that vma_next() would do the right thing with it. However,
> there would still be the difference that lock_next_vma() expects we are
> already under the rcu lock, and lock_vma_under_rcu() doesn't.
>
> So what we can perhaps do instead is move vma_start_read() to mm/mmap_lock.c
> (it has no other users, so why expose it in a header for potential misuse).
> And then indeed just make it drop the rcu lock completely (and not reacquire
> it) any time it returns NULL, document that, and adjust the callers
> accordingly. I think that's less subtle than dropping and reacquiring, and
> should simplify the error handling in the callers a bit.
Thanks for the suggestion. That was actually exactly one of the
options I was considering, but I thought this drop-the-RCU-lock scheme would
still be uglier than dropping and reacquiring the RCU lock. If you
think otherwise, I can make the change as you suggested for mm-unstable
and keep this original change for stable only. Should I do that?
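
For clarity, here is a rough sketch of the caller contract I understand you
are proposing (hypothetical and untested, just to make sure we mean the same
thing; error/retry cases omitted for brevity):

    MA_STATE(mas, &mm->mm_mt, address, address);
    struct vm_area_struct *vma;

    rcu_read_lock();
    vma = mas_walk(&mas);
    if (!vma) {
            rcu_read_unlock();
            return NULL;
    }

    vma = vma_start_read(mm, vma);
    if (!vma)
            return NULL;    /* vma_start_read() already dropped the RCU lock */

    /* ... validate the vma, then ... */
    rcu_read_unlock();
    return vma;
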
>
> [1]
> https://lore.kernel.org/all/CAJuCfpEMhGe_eZuFm__4CDstM9%3DOG2kTUTziNL-f%3DM3BYQor2A@mail.gmail.com/
>
> > ---
> > Changes since v1 [1]
> > - Made a copy of vma->mm before using it in vma_start_read(),
> > per Vlastimil Babka
> >
> > Notes:
> > - Applies cleanly over mm-unstable.
> > - Should be applied to 6.15 and 6.16, but these branches do not
> > have the lock_next_vma() function, so the change in lock_next_vma() should
> > be skipped when applying to those branches.
> >
> > [1] https://lore.kernel.org/all/20250728170950.2216966-1-surenb@google.com/
> >
> > include/linux/mmap_lock.h | 23 +++++++++++++++++++++++
> > mm/mmap_lock.c | 10 +++-------
> > 2 files changed, 26 insertions(+), 7 deletions(-)
> >
> > diff --git a/include/linux/mmap_lock.h b/include/linux/mmap_lock.h
> > index 1f4f44951abe..da34afa2f8ef 100644
> > --- a/include/linux/mmap_lock.h
> > +++ b/include/linux/mmap_lock.h
> > @@ -12,6 +12,7 @@ extern int rcuwait_wake_up(struct rcuwait *w);
> > #include <linux/tracepoint-defs.h>
> > #include <linux/types.h>
> > #include <linux/cleanup.h>
> > +#include <linux/sched/mm.h>
> >
> > #define MMAP_LOCK_INITIALIZER(name) \
> > .mmap_lock = __RWSEM_INITIALIZER((name).mmap_lock),
> > @@ -183,6 +184,28 @@ static inline struct vm_area_struct *vma_start_read(struct mm_struct *mm,
> > }
> >
> > rwsem_acquire_read(&vma->vmlock_dep_map, 0, 1, _RET_IP_);
> > +
> > + /*
> > + * If vma got attached to another mm from under us, that mm is not
> > + * stable and can be freed in the narrow window after vma->vm_refcnt
> > + * is dropped and before rcuwait_wake_up(mm) is called. Grab it before
> > + * releasing vma->vm_refcnt.
> > + */
> > + if (unlikely(vma->vm_mm != mm)) {
> > + /* Use a copy of vm_mm in case vma is freed after we drop vm_refcnt */
> > + struct mm_struct *other_mm = vma->vm_mm;
> > + /*
> > + * __mmdrop() is a heavy operation and we don't need RCU
> > + * protection here. Release RCU lock during these operations.
> > + */
> > + rcu_read_unlock();
> > + mmgrab(other_mm);
> > + vma_refcount_put(vma);
> > + mmdrop(other_mm);
> > + rcu_read_lock();
> > + return NULL;
> > + }
> > +
> > /*
> > * Overflow of vm_lock_seq/mm_lock_seq might produce false locked result.
> > * False unlocked result is impossible because we modify and check
> > diff --git a/mm/mmap_lock.c b/mm/mmap_lock.c
> > index 729fb7d0dd59..aa3bc42ecde0 100644
> > --- a/mm/mmap_lock.c
> > +++ b/mm/mmap_lock.c
> > @@ -164,8 +164,7 @@ struct vm_area_struct *lock_vma_under_rcu(struct mm_struct *mm,
> > */
> >
> > /* Check if the vma we locked is the right one. */
> > - if (unlikely(vma->vm_mm != mm ||
> > - address < vma->vm_start || address >= vma->vm_end))
> > + if (unlikely(address < vma->vm_start || address >= vma->vm_end))
> > goto inval_end_read;
> >
> > rcu_read_unlock();
> > @@ -236,11 +235,8 @@ struct vm_area_struct *lock_next_vma(struct mm_struct *mm,
> > goto fallback;
> > }
> >
> > - /*
> > - * Verify the vma we locked belongs to the same address space and it's
> > - * not behind of the last search position.
> > - */
> > - if (unlikely(vma->vm_mm != mm || from_addr >= vma->vm_end))
> > + /* Verify the vma is not behind of the last search position. */
> > + if (unlikely(from_addr >= vma->vm_end))
> > goto fallback_unlock;
> >
> > /*
> >
> > base-commit: c617a4dd7102e691fa0fb2bc4f6b369e37d7f509
>