linux-kernel - Re: [BUG] hard-to-hit mm_struct UAF due to insufficiently careful vma_refcount_put() wrt SLAB_TYPESAFE_BY

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <6df9812c-96a5-41be-8b0e-5fff95ec757c@lucifer.local>
Date: Wed, 23 Jul 2025 19:14:18 +0100
From: Lorenzo Stoakes <lorenzo.stoakes@...cle.com>
To: Jann Horn <jannh@...gle.com>
Cc: Andrew Morton <akpm@...ux-foundation.org>,
        "Liam R. Howlett" <Liam.Howlett@...cle.com>,
        Suren Baghdasaryan <surenb@...gle.com>,
        Vlastimil Babka <vbabka@...e.cz>, Pedro Falcato <pfalcato@...e.de>,
        Linux-MM <linux-mm@...ck.org>,
        kernel list <linux-kernel@...r.kernel.org>
Subject: Re: [BUG] hard-to-hit mm_struct UAF due to insufficiently careful
 vma_refcount_put() wrt SLAB_TYPESAFE_BY_RCU

On Wed, Jul 23, 2025 at 06:26:53PM +0200, Jann Horn wrote:
> There's a racy UAF in `vma_refcount_put()` when called on the
> `lock_vma_under_rcu()` path because `SLAB_TYPESAFE_BY_RCU` is used
> without sufficient protection against concurrent object reuse:
>
> lock_vma_under_rcu() looks up a VMA locklessly with mas_walk() under
> rcu_read_lock(). At that point, the VMA may be concurrently freed, and
> it can be recycled by another process. vma_start_read() then
> increments the vma->vm_refcnt (if it is in an acceptable range), and
> if this succeeds, vma_start_read() can return a reycled VMA. (As a
> sidenote, this goes against what the surrounding comments above
> vma_start_read() and in lock_vma_under_rcu() say - it would probably
> be cleaner to perform the vma->vm_mm check inside vma_start_read().)
>
> In this scenario where the VMA has been recycled, lock_vma_under_rcu()
> will then detect the mismatching ->vm_mm pointer and drop the VMA
> through vma_end_read(), which calls vma_refcount_put().

So in _correctly_ identifying the recycling, we then hit a problem. Fun!

> vma_refcount_put() does this:
>
> ```
> static inline void vma_refcount_put(struct vm_area_struct *vma)
> {
>         /* Use a copy of vm_mm in case vma is freed after we drop vm_refcnt */
>         struct mm_struct *mm = vma->vm_mm;

Are we at a point where we _should_ be looking at a VMA with vma->vm_mm ==
current->mm here?

Or can we not safely assume this?

Because if we can, can we not check for that here?

Do we need to keep the old mm around to wake up waiters if we're happily
freeing it anyway?

If not, then surely we can just do this check, and not do the wake up if
false?

I may be mising something or being stupid here so :P

>         int oldcnt;
>
>         rwsem_release(&vma->vmlock_dep_map, _RET_IP_);
>         if (!__refcount_dec_and_test(&vma->vm_refcnt, &oldcnt)) {
>
>                 if (is_vma_writer_only(oldcnt - 1))
>                         rcuwait_wake_up(&mm->vma_writer_wait);
>         }
> }
> ```
>
> This is wrong: It implicitly assumes that the caller is keeping the
> VMA's mm alive, but in this scenario the caller has no relation to the
> VMA's mm, so the rcuwait_wake_up() can cause UAF.

Oh yikes. Yikes yikes yikes.

>
> In theory, this could happen to any multithreaded process where thread
> A is in the middle of pagefault handling while thread B is
> manipulating adjacent VMAs such that VMA merging frees the VMA looked
> up by thread A - but in practice, for UAF to actually happen, I think
> you need to at least hit three race windows in a row that are each on
> the order of a single memory access wide.

Hmm ok this is confusing, below you reference threads by number with
letter=process, and here you say thread 'A' and "B'.


>
> The interleaving leading to UAF is the following, where threads A1 and
> A2 are part of one process and thread B1 is part of another process:

Err but you refer to A1, A2, A3 below, should A3=B1?

> ```
> A1               A2               A3
> ==               ==               ==
> lock_vma_under_rcu
>   mas_walk
>                  <VMA modification removes the VMA>
>                                   mmap
>                                     <reallocate the VMA>
>   vma_start_read
>     READ_ONCE(vma->vm_lock_seq)
>     __refcount_inc_not_zero_limited_acquire
>                                   munmap
                                   ^
				   |
                                   OK so here, we have unmapped and
				   returned VMA to the slab so
				   SLAB_TYPESAFE_BY_RCU may now result in
				   you having a VMA with vma->vm_mm !=
				   current->mm right?

>                                     __vma_enter_locked
>                                       refcount_add_not_zero
>   vma_end_read
>     vma_refcount_put
      ^
      |
      Then here, we're grabbing process
      B's mm... whoops...

>       __refcount_dec_and_test
>                                       rcuwait_wait_event
>                                     <finish operation>
                                      ^
				      |
				      Then here, the VMA is finally
				      freed right?

>       rcuwait_wake_up [UAF]
        ^
	|
	And then here we try to deref freed mm->vma_writer_wait
	and boom.
> ```
>

This accurate?

> I'm not sure what the right fix is; I guess one approach would be to
> have a special version of vma_refcount_put() for cases where the VMA
> has been recycled by another MM that grabs an extra reference to the
> MM? But then dropping a reference to the MM afterwards might be a bit
> annoying and might require something like mmdrop_async()...

This seems a bit convoluted... And I worry about us getting the refcount
stuff wrong somehow, we'd have to be very careful about this to avoid
leaking VMAs somehow...