linux-kernel - Re: [BUG] hard-to-hit mm_struct UAF due to insufficiently careful vma_refcount_put() wrt SLAB_TYPESAFE_BY

Message-ID: <CAJuCfpHk_k5eVhAZTK=jJvES9311Hyo_YXxY-S56EAYSBuRVRQ@mail.gmail.com>
Date: Wed, 23 Jul 2025 19:30:12 -0700
From: Suren Baghdasaryan <surenb@...gle.com>
To: Jann Horn <jannh@...gle.com>
Cc: Vlastimil Babka <vbabka@...e.cz>, Andrew Morton <akpm@...ux-foundation.org>, 
	"Liam R. Howlett" <Liam.Howlett@...cle.com>, Lorenzo Stoakes <lorenzo.stoakes@...cle.com>, 
	Pedro Falcato <pfalcato@...e.de>, Linux-MM <linux-mm@...ck.org>, 
	kernel list <linux-kernel@...r.kernel.org>
Subject: Re: [BUG] hard-to-hit mm_struct UAF due to insufficiently careful
 vma_refcount_put() wrt SLAB_TYPESAFE_BY_RCU

On Wed, Jul 23, 2025 at 1:27 PM Suren Baghdasaryan <surenb@...gle.com> wrote:
>
> On Wed, Jul 23, 2025 at 11:19 AM Jann Horn <jannh@...gle.com> wrote:
> >
> > On Wed, Jul 23, 2025 at 8:10 PM Vlastimil Babka <vbabka@...e.cz> wrote:
> > > On 7/23/25 19:49, Jann Horn wrote:
> > > > On Wed, Jul 23, 2025 at 7:32 PM Vlastimil Babka <vbabka@...e.cz> wrote:
> > > >> On 7/23/25 18:26, Jann Horn wrote:
> > > >> > There's a racy UAF in `vma_refcount_put()` when called on the
> > > >> > `lock_vma_under_rcu()` path because `SLAB_TYPESAFE_BY_RCU` is used
> > > >> > without sufficient protection against concurrent object reuse:
> > > >>
> > > >> Oof.
> > > >>
> > > >> > I'm not sure what the right fix is; I guess one approach would be to
> > > >> > have a special version of vma_refcount_put() for cases where the VMA
> > > >> > has been recycled by another MM that grabs an extra reference to the
> > > >> > MM? But then dropping a reference to the MM afterwards might be a bit
> > > >> > annoying and might require something like mmdrop_async()...
> > > >>
> > > >> Would we need mmdrop_async()? Isn't this the case for mmget_not_zero() and
> > > >> mmput_async()?
> > > >
> > > > Now I'm not sure anymore if either of those approaches would work,
> > > > because they rely on the task that's removing the VMA to wait until we
> > > > do __refcount_dec_and_test() before deleting the MM... but I don't
> > > > think we have any such guarantee...
> > >
> > > I think it would be waiting in exit_mmap->vma_mark_detached(), but then
> > > AFAIU you're right and we'd really need to work with mmgrab/mmdrop because
> > > at that point the mmget_not_zero() would already be failing...
> >
> > Ah, I see! vma_mark_detached() drops its reference, then does
> > __vma_enter_locked() to bump the refcount by VMA_LOCK_OFFSET again
> > (after which the reader path can't acquire it anymore), then waits
> > until the refcount drops to VMA_LOCK_OFFSET, and then decrements it
> > down to 0 from there. Makes sense.
>
> Yes, that's what I was checking to understand the race. In your explanation:
>
> A1 found the vma
> A2 detached it
> A3 attached it to another mm
> A1 refcounts the vma
> A1 realizes it's from another mm and calls vma_end_read() which tries
> to wake up another mm's waiter.

Ok, I finally got the entire picture. Now I understand why it would be
so hard to reproduce and that it depends on a very specific order of
execution. These steps should happen in precisely this order:

1. A3 calls __vma_enter_locked() and refcount_add_not_zero() fails due to
   A1 holding a refcount (usual situation).
2. By the time A3 calls rcuwait_wait_event(), A1 has dropped its refcount,
   so rcuwait_wait_event() does not enter the wait.
3. By the time A1 calls rcuwait_wake_up(), A3 has freed the mm, leading
   to A1's UAF.
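
To make that ordering concrete, here is a rough single-threaded userspace
model of it. This is not kernel code: `struct race_state`, `replay_race()`
and `MODEL_LOCK_OFFSET` are names I made up for illustration, the offset
value is arbitrary, and the "benign" branch simply assumes the writer sleeps
until the reader's wake-up arrives.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for VMA_LOCK_OFFSET; the value is arbitrary here. */
#define MODEL_LOCK_OFFSET 0x40000000

struct race_state {
	int vm_refcnt;          /* models vma->vm_refcnt */
	bool writer_slept;      /* did A3 block in the modelled wait? */
	bool mm_freed;          /* has A3 freed the mm yet? */
	bool wake_hit_freed_mm; /* did A1's wake-up touch a freed mm? */
};

/*
 * Replay the interleaving deterministically. If the reader's decrement
 * lands before the writer checks its wait condition, the writer never
 * sleeps and nothing orders its mm teardown after the reader's wake-up.
 */
struct race_state replay_race(bool reader_drops_before_wait_check)
{
	/* One reference for "attached" plus one held by reader A1. */
	struct race_state s = { .vm_refcnt = 2 };

	/* A3: drop the attach reference, then add the lock offset. */
	s.vm_refcnt -= 1;
	s.vm_refcnt += MODEL_LOCK_OFFSET;

	if (reader_drops_before_wait_check)
		s.vm_refcnt -= 1;	/* A1's decrement lands first */

	/* A3: wait condition is "refcount == offset" when detaching. */
	s.writer_slept = (s.vm_refcnt != MODEL_LOCK_OFFSET);

	if (!s.writer_slept) {
		/* A3 runs straight through and frees the mm ... */
		s.vm_refcnt -= MODEL_LOCK_OFFSET;
		s.mm_freed = true;
		/* ... and only then does A1 issue its wake-up. */
		s.wake_hit_freed_mm = s.mm_freed;
	} else {
		/* Benign order: A1 drops and wakes while the mm is alive. */
		s.vm_refcnt -= 1;
		s.wake_hit_freed_mm = s.mm_freed;
		s.vm_refcnt -= MODEL_LOCK_OFFSET;
		s.mm_freed = true;
	}
	return s;
}
```

Calling `replay_race(true)` walks the problematic order, and
`replay_race(false)` walks the ordinary one where the writer really waits.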

Very clever.
I was wrong to think that we could call rcuwait_wake_up() on the mm
that the vma was originally attached to. We do have to
rcuwait_wake_up() the mm that the vma is attached to at the time of
vma_refcount_put(), so using vma->vm_mm in vma_refcount_put() is the
right thing to do: our refcount might be blocking operations on
the current vma->vm_mm, not the one the vma was originally attached to.
We just have to stabilize vma->vm_mm.

So, I think vma_refcount_put() can mmgrab(vma->vm_mm) before calling
__refcount_dec_and_test() to stabilize that mm, and then mmdrop() it
after calling rcuwait_wake_up(). What do you think about this
approach, folks?
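
Roughly what I have in mind, as a userspace model rather than a patch.
Everything prefixed with `model_` is a hypothetical stand-in I made up for
illustration (the real structures and helpers look different), the wake-up
is done unconditionally to keep the sketch short, and the `race_window`
callback exists only so a test can simulate the writer dropping its own mm
reference at the worst possible moment.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Userspace stand-ins; these are NOT the kernel structures. */
struct model_mm {
	atomic_int mm_count;	/* models the count taken by mmgrab()/mmdrop() */
	bool freed;		/* set when the last pin goes away */
	int wakeups;		/* counts modelled rcuwait_wake_up() calls */
};

struct model_vma {
	atomic_int vm_refcnt;
	struct model_mm *vm_mm;
};

static void model_mmgrab(struct model_mm *mm)
{
	atomic_fetch_add(&mm->mm_count, 1);
}

static void model_mmdrop(struct model_mm *mm)
{
	/* fetch_sub returns the old value: 1 means this was the last pin. */
	if (atomic_fetch_sub(&mm->mm_count, 1) == 1)
		mm->freed = true;
}

static void model_wake_writer(struct model_mm *mm)
{
	assert(!mm->freed);	/* touching a freed mm would be the UAF */
	mm->wakeups++;
}

/*
 * Pin the mm before dropping the vma reference, wake the writer, then
 * unpin. race_window (may be NULL) runs between the decrement and the
 * wake-up, which is where the mm could otherwise disappear.
 */
void model_vma_refcount_put(struct model_vma *vma,
			    void (*race_window)(struct model_mm *))
{
	struct model_mm *mm = vma->vm_mm;	/* read before dropping our ref */

	model_mmgrab(mm);			/* stabilize the mm */
	atomic_fetch_sub(&vma->vm_refcnt, 1);	/* drop the reader reference */
	if (race_window)
		race_window(mm);
	model_wake_writer(mm);			/* safe: our pin keeps mm alive */
	model_mmdrop(mm);			/* may be the final drop */
}
```

Passing `model_mmdrop` as `race_window` simulates the other side releasing
its reference inside the window: the wake-up still sees a live mm, and the
mm is only marked freed by our own final drop.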

>
> Vlastimil is right that if A1 was able to successfully elevate vma's
> refcount then:
> 1. vma must be attached to some valid mm. This is true because if the
> vma is detached, vma_start_read() would not be able to elevate its
> refcount. Once vma_start_read() elevates the refcount, vma will not
> detach from under us because vma_mark_detached() will block until no
> readers are using the vma.
> 2. vma->mm can't be destroyed from under us because of that
> exit_mmap()->vma_mark_detached() which again will ensure no readers
> are holding a reference to the vmas of that mm.
>
> So, a special version of vma_refcount_put() that takes mm as a
> parameter and does mmgrab/mmdrop before using that mm might work. I'll
> do some more digging and maybe test this solution with your reproducer
> to see if that works as I would expect.
> Thanks,
> Suren.