Message-ID: <28e1fc65-995c-4185-b5c2-7eb001312698@lucifer.local>
Date: Wed, 23 Jul 2025 19:39:10 +0100
From: Lorenzo Stoakes <lorenzo.stoakes@...cle.com>
To: Jann Horn <jannh@...gle.com>
Cc: Vlastimil Babka <vbabka@...e.cz>,
Andrew Morton <akpm@...ux-foundation.org>,
"Liam R. Howlett" <Liam.Howlett@...cle.com>,
Suren Baghdasaryan <surenb@...gle.com>,
Pedro Falcato <pfalcato@...e.de>, Linux-MM <linux-mm@...ck.org>,
kernel list <linux-kernel@...r.kernel.org>
Subject: Re: [BUG] hard-to-hit mm_struct UAF due to insufficiently careful
vma_refcount_put() wrt SLAB_TYPESAFE_BY_RCU
On Wed, Jul 23, 2025 at 08:19:09PM +0200, Jann Horn wrote:
> On Wed, Jul 23, 2025 at 8:10 PM Vlastimil Babka <vbabka@...e.cz> wrote:
> > On 7/23/25 19:49, Jann Horn wrote:
> > > On Wed, Jul 23, 2025 at 7:32 PM Vlastimil Babka <vbabka@...e.cz> wrote:
> > >> On 7/23/25 18:26, Jann Horn wrote:
> > >> > There's a racy UAF in `vma_refcount_put()` when called on the
> > >> > `lock_vma_under_rcu()` path because `SLAB_TYPESAFE_BY_RCU` is used
> > >> > without sufficient protection against concurrent object reuse:
> > >>
> > >> Oof.
> > >>
> > >> > I'm not sure what the right fix is; I guess one approach would be to
> > >> > have a special version of vma_refcount_put() for cases where the VMA
> > >> > has been recycled by another MM that grabs an extra reference to the
> > >> > MM? But then dropping a reference to the MM afterwards might be a bit
> > >> > annoying and might require something like mmdrop_async()...
> > >>
> > >> Would we need mmdrop_async()? Isn't this the case for mmget_not_zero() and
> > >> mmput_async()?
> > >
> > > Now I'm not sure anymore if either of those approaches would work,
> > > because they rely on the task that's removing the VMA to wait until we
> > > do __refcount_dec_and_test() before deleting the MM... but I don't
> > > think we have any such guarantee...
> >
> > I think it would be waiting in exit_mmap->vma_mark_detached(), but then
> > AFAIU you're right and we'd really need to work with mmgrab/mmdrop because
> > at that point the mmget_not_zero() would already be failing...
>
> Ah, I see! vma_mark_detached() drops its reference, then does
> __vma_enter_locked() to bump the refcount by VMA_LOCK_OFFSET again
> (after which the reader path can't acquire it anymore), then waits
> until the refcount drops to VMA_LOCK_OFFSET, and then decrements it
> down to 0 from there. Makes sense.

Sorry, this is really my fault, as I didn't follow the reimplementation of
the VMA locks closely enough and so am a little behind here (I'll fix this,
probably by documenting them fully in the relevant doc page).

So forgive me if I'm asking stupid questions.

What exactly is the issue with the waiter not being triggered?

I see in vma_mark_detached():
/*
* We are the only writer, so no need to use vma_refcount_put().
* The condition below is unlikely because the vma has been already
* write-locked and readers can increment vm_refcnt only temporarily
* before they check vm_lock_seq, realize the vma is locked and drop
* back the vm_refcnt. That is a narrow window for observing a raised
* vm_refcnt.
*/

So, if this is happening at the point of the unmap, and we're unlucky
enough to have some readers spuriously increment the refcnt before they
check vm_lock_seq, we trigger __vma_enter_locked() and wait for those
readers to vma_refcount_put(), which wakes the waiter via
rcuwait_wake_up() if the refcount is still raised (which it should be,
right?)
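
For reference, this is roughly how I'm reading vma_refcount_put()
(paraphrased from memory with my own annotations, so apologies if I've
mangled any details):

	static inline void vma_refcount_put(struct vm_area_struct *vma)
	{
		/* vma may by now belong to a completely different mm if
		 * the object was freed and reused (SLAB_TYPESAFE_BY_RCU). */
		struct mm_struct *mm = vma->vm_mm;
		int oldcnt;

		rwsem_release(&vma->vmlock_dep_map, _RET_IP_);
		if (!__refcount_dec_and_test(&vma->vm_refcnt, &oldcnt)) {
			/* Only the writer's VMA_LOCK_OFFSET is left, so wake
			 * the waiter in __vma_enter_locked()... on an mm we
			 * hold no reference to. */
			if (is_vma_writer_only(oldcnt))
				rcuwait_wake_up(&mm->vma_writer_wait);
		}
	}

i.e. nothing pins the mm between the __refcount_dec_and_test() and the
rcuwait_wake_up().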

So actually, if the scenario mentioned happens, are we going to be left
with readers sat around waiting forever?

If we make the rando mm we are now referencing stick around, aren't we just
spuriously triggering this thing while potentially leaving actual waiters
waiting?
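
To spell out the interleaving I have in mind (entirely hypothetical, so
please correct me if I have the ordering wrong):

reader (under RCU)                     writer / reuse
------------------                     --------------
lock_vma_under_rcu() finds a vma
belonging to mm A
                                       vma freed and reused for mm B
                                       (SLAB_TYPESAFE_BY_RCU)
vm_refcnt bumped on the recycled vma
vm_lock_seq check fails, so
vma_refcount_put():
  mm = vma->vm_mm          /* mm B */
  __refcount_dec_and_test()
                                       mm B's writer sees the refcount
                                       drop, finishes the detach, and
                                       mm B ends up freed
  rcuwait_wake_up(&mm->vma_writer_wait)
                                       ^ UAF on mm B
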
I may be completely misunderstanding here...