Message-ID: <CAG48ez36AcGBy8_wo09WEHz98Dp44yKZ_FiVtFt8tcV9WiHafA@mail.gmail.com>
Date: Wed, 23 Jul 2025 21:43:35 +0200
From: Jann Horn <jannh@...gle.com>
To: Lorenzo Stoakes <lorenzo.stoakes@...cle.com>
Cc: Andrew Morton <akpm@...ux-foundation.org>, "Liam R. Howlett" <Liam.Howlett@...cle.com>,
Suren Baghdasaryan <surenb@...gle.com>, Vlastimil Babka <vbabka@...e.cz>, Pedro Falcato <pfalcato@...e.de>,
Linux-MM <linux-mm@...ck.org>, kernel list <linux-kernel@...r.kernel.org>
Subject: Re: [BUG] hard-to-hit mm_struct UAF due to insufficiently careful vma_refcount_put() wrt SLAB_TYPESAFE_BY_RCU
Sorry, while typing up this mail I realized I didn't have this stuff
particularly straight in my head myself when writing my previous mails
about this...
On Wed, Jul 23, 2025 at 8:45 PM Lorenzo Stoakes
<lorenzo.stoakes@...cle.com> wrote:
> On Wed, Jul 23, 2025 at 08:30:30PM +0200, Jann Horn wrote:
> > On Wed, Jul 23, 2025 at 8:14 PM Lorenzo Stoakes
> > <lorenzo.stoakes@...cle.com> wrote:
> > > On Wed, Jul 23, 2025 at 06:26:53PM +0200, Jann Horn wrote:
> > > > There's a racy UAF in `vma_refcount_put()` when called on the
> > > > `lock_vma_under_rcu()` path because `SLAB_TYPESAFE_BY_RCU` is used
> > > > without sufficient protection against concurrent object reuse:
> > > >
> > > > lock_vma_under_rcu() looks up a VMA locklessly with mas_walk() under
> > > > rcu_read_lock(). At that point, the VMA may be concurrently freed, and
> > > > it can be recycled by another process. vma_start_read() then
> > > > increments the vma->vm_refcnt (if it is in an acceptable range), and
> > > > if this succeeds, vma_start_read() can return a recycled VMA. (As a
> > > > sidenote, this goes against what the surrounding comments above
> > > > vma_start_read() and in lock_vma_under_rcu() say - it would probably
> > > > be cleaner to perform the vma->vm_mm check inside vma_start_read().)
> > > >
> > > > In this scenario where the VMA has been recycled, lock_vma_under_rcu()
> > > > will then detect the mismatching ->vm_mm pointer and drop the VMA
> > > > through vma_end_read(), which calls vma_refcount_put().
> > >
> > > So in _correctly_ identifying the recycling, we then hit a problem. Fun!
> > >
> > > > vma_refcount_put() does this:
> > > >
> > > > ```
> > > > static inline void vma_refcount_put(struct vm_area_struct *vma)
> > > > {
> > > > /* Use a copy of vm_mm in case vma is freed after we drop vm_refcnt */
> > > > struct mm_struct *mm = vma->vm_mm;
> > >
> > > Are we at a point where we _should_ be looking at a VMA with vma->vm_mm ==
> > > current->mm here?
> >
> > Well, you _hope_ to be looking at a VMA with vma->vm_mm==current->mm,
> > but if you lose a race it is intentional that you can end up with
> > another MM's VMA here.
(I forgot: The mm passed to lock_vma_under_rcu() is potentially
different from current->mm if we're coming from uffd_mfill_lock(),
which would be intentional and desired, but that's not relevant here.
Sorry for making things more confusing.)
> > > Or can we not safely assume this?
> >
> > No.
>
> What code paths lead to vma_refcount_put() with a foreign mm?
Calls to vma_refcount_put() from vma_start_read() or from
lock_vma_under_rcu() can see a vma->vm_mm that is different from the
mm that was passed to lock_vma_under_rcu().
Basically, lock_vma_under_rcu() locklessly looks up a VMA in the maple
tree of the provided MM; and so immediately after the maple tree
lookup, before we grab any sort of reference on the VMA, the VMA can
be freed, and reallocated by another process. If we then essentially
read-lock this VMA which is used by another MM (by incrementing its
refcount), waiters in that other MM might start waiting for us; and in
that case, when we notice we got the wrong VMA and bail out, we have
to wake those waiters up again after unlocking the VMA by dropping its
refcount.
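To make the ordering concrete, here is a minimal userspace sketch (not kernel code; names like vma_try_pin() and lookup_and_validate() are illustrative, and the real vm_refcnt is an atomic with writer bias) of the lookup/pin/revalidate dance described above: the owner check is only meaningful *after* the pin, because the object can be freed and recycled between lookup and pin.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct mm { int id; };             /* stands in for struct mm_struct */

struct vma {                       /* stands in for struct vm_area_struct */
	struct mm *vm_mm;          /* owner; changes if the slot is recycled */
	int vm_refcnt;             /* read pin count (heavily simplified) */
};

/* Pin the VMA only if its refcount is in an acceptable range; mirrors the
 * conditional increment in vma_start_read(). */
static bool vma_try_pin(struct vma *vma)
{
	if (vma->vm_refcnt < 0)    /* write-locked (simplified encoding) */
		return false;
	vma->vm_refcnt++;
	return true;
}

static void vma_unpin(struct vma *vma)
{
	vma->vm_refcnt--;          /* the real code may need to wake writers */
}

/* Look up a VMA and validate ownership *after* pinning: succeeding at the
 * pin only proves we hold some live VMA, not the one we looked up. */
static struct vma *lookup_and_validate(struct vma *candidate, struct mm *mm)
{
	if (!vma_try_pin(candidate))
		return NULL;
	if (candidate->vm_mm != mm) {      /* recycled by another process */
		vma_unpin(candidate);      /* bail out; may have to wake
					    * waiters in the foreign MM */
		return NULL;
	}
	return candidate;
}
```

Note that the bail-out path still goes through the unpin, which is exactly where the wakeup obligation (and the bug under discussion) lives.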
> I mean maybe it's an unsafe assumption.
>
> I realise we are doing stuff for _reasons_, but I sort of HATE that we have
> put ourselves in a position where we might always see a recycled VMA and
> rely on a very very complicated seqnum-based locking mechanism to make sure
> this doesn't happen.
Yes, that is pretty much the definition of SLAB_TYPESAFE_BY_RCU. ^^
You get higher data cache hit rates in exchange for complicated "grab
some kinda stable object reference and then recheck if we got the
right one" stuff, though it is less annoying when dealing with a
normal refcount or spinlock or such rather than this kind of
open-coded sleepable read-write semaphore.
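For the "normal refcount" flavor of this, the pattern is essentially refcount_inc_not_zero(): because SLAB_TYPESAFE_BY_RCU guarantees the memory stays a valid object of the same type for the RCU read section, you may CAS on a possibly-freed object, but you must refuse to resurrect a zero refcount, and a successful pin still only proves you hold *some* live object, so an identity recheck (like the ->vm_mm comparison) must follow. A C11 userspace sketch of the conditional increment:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Take a reference only if the count is currently nonzero; modeled on the
 * kernel's refcount_inc_not_zero() (minus saturation handling). */
static bool ref_get_not_zero(atomic_int *refs)
{
	int old = atomic_load(refs);

	do {
		if (old == 0)   /* object is (being) freed; don't resurrect */
			return false;
		/* on failure, the CAS reloads `old` and we retry */
	} while (!atomic_compare_exchange_weak(refs, &old, old + 1));
	return true;
}
```

The vm_refcnt scheme is this plus the writer-bias/waiter machinery, which is what makes the put side so much more delicate than a plain refcount_put().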
> It feels like we've made ourselves a challenging and uncomfy bed...
>
> >
> > > Because if we can, can we not check for that here?
> > >
> > > Do we need to keep the old mm around to wake up waiters if we're happily
> > > freeing it anyway?
> >
> > Well, we don't know if the MM has already been freed, or if it is
> > still alive and well and has writers who are stuck waiting for our
> > wakeup.
>
> But the mm we're talking about here is some recycled one from another
> thread?
The MM is not recycled, the VMA is recycled.
> Right so, we have:
>
> 'mm we meant to get' (which apparently can't be assumed to be current->mm)
> 'mm we actually got' (which may or may not be freed at any time)
>
> The _meant to get_ one might have eternal waiters. Or might not even need
> to be woken up.
>
> I don't see why keeping the 'actually got' one around really helps us? Am I
> missing something?
We basically have taken a read lock on a VMA that is part of the
"actually got" MM, and so we may have caused writers from that MM to
block and sleep, and since we did that we have to wake them back up
and say "sorry, locked the wrong object, please continue".
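That wakeup obligation can be sketched in a few lines of illustrative (non-kernel) C; the struct and field names here are made up, with wakeups++ standing in for the rcuwait_wake_up() call whose mm_struct access is the UAF window in question:

```c
#include <assert.h>
#include <stdbool.h>

struct vma_lock {
	int readers;         /* read pins (vm_refcnt, simplified)         */
	bool writer_waiting; /* a writer in the VMA's own MM is sleeping  */
	int wakeups;         /* wakeups delivered to that writer          */
};

/* Drop a read pin; if a writer is now the only party left, wake it.
 * This duty applies even when the reader pinned the VMA by mistake,
 * which is why the bail-out path cannot just silently decrement. */
static void read_unpin(struct vma_lock *l)
{
	l->readers--;
	if (l->readers == 0 && l->writer_waiting)
		l->wakeups++;   /* rcuwait_wake_up() in the real code; it
				 * dereferences the foreign mm_struct */
}
```

The hazard is that between the decrement and the wakeup, nothing pins the foreign MM alive, which is the race the start of this thread describes.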