Message-ID: <CAG48ez0KE9bfFJ+yAMPUAj=_hcNRfes3=nfQd7Rv95TSnHtUCg@mail.gmail.com>
Date: Wed, 23 Jul 2025 21:52:41 +0200
From: Jann Horn <jannh@...gle.com>
To: Lorenzo Stoakes <lorenzo.stoakes@...cle.com>
Cc: Vlastimil Babka <vbabka@...e.cz>, Andrew Morton <akpm@...ux-foundation.org>,
"Liam R. Howlett" <Liam.Howlett@...cle.com>, Suren Baghdasaryan <surenb@...gle.com>,
Pedro Falcato <pfalcato@...e.de>, Linux-MM <linux-mm@...ck.org>,
kernel list <linux-kernel@...r.kernel.org>
Subject: Re: [BUG] hard-to-hit mm_struct UAF due to insufficiently careful
vma_refcount_put() wrt SLAB_TYPESAFE_BY_RCU
On Wed, Jul 23, 2025 at 8:39 PM Lorenzo Stoakes
<lorenzo.stoakes@...cle.com> wrote:
> On Wed, Jul 23, 2025 at 08:19:09PM +0200, Jann Horn wrote:
> > On Wed, Jul 23, 2025 at 8:10 PM Vlastimil Babka <vbabka@...e.cz> wrote:
> > > On 7/23/25 19:49, Jann Horn wrote:
> > > > On Wed, Jul 23, 2025 at 7:32 PM Vlastimil Babka <vbabka@...e.cz> wrote:
> > > >> On 7/23/25 18:26, Jann Horn wrote:
> > > >> > There's a racy UAF in `vma_refcount_put()` when called on the
> > > >> > `lock_vma_under_rcu()` path because `SLAB_TYPESAFE_BY_RCU` is used
> > > >> > without sufficient protection against concurrent object reuse:
> > > >>
> > > >> Oof.
> > > >>
> > > >> > I'm not sure what the right fix is; I guess one approach would be to
> > > >> > have a special version of vma_refcount_put() for cases where the VMA
> > > >> > has been recycled by another MM that grabs an extra reference to the
> > > >> > MM? But then dropping a reference to the MM afterwards might be a bit
> > > >> > annoying and might require something like mmdrop_async()...
> > > >>
> > > >> Would we need mmdrop_async()? Isn't this the case for mmget_not_zero() and
> > > >> mmput_async()?
> > > >
> > > > Now I'm not sure anymore if either of those approaches would work,
> > > > because they rely on the task that's removing the VMA to wait until we
> > > > do __refcount_dec_and_test() before deleting the MM... but I don't
> > > > think we have any such guarantee...
> > >
> > > I think it would be waiting in exit_mmap->vma_mark_detached(), but then
> > > AFAIU you're right and we'd really need to work with mmgrab/mmdrop because
> > > at that point the mmget_not_zero() would already be failing...
> >
> > Ah, I see! vma_mark_detached() drops its reference, then does
> > __vma_enter_locked() to bump the refcount by VMA_LOCK_OFFSET again
> > (after which the reader path can't acquire it anymore), then waits
> > until the refcount drops to VMA_LOCK_OFFSET, and then decrements it
> > down to 0 from there. Makes sense.
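The detach sequence described above could be modeled in userspace roughly like this (a hedged sketch: `vma_sim`, `reader_tryacquire`, `reader_release`, and `writer_detach` are hypothetical names; only `VMA_LOCK_OFFSET` and the state transitions are taken from the thread):

```c
/*
 * Hedged single-threaded model of the vm_refcnt transitions described
 * above. This is an illustration, not the kernel implementation.
 */
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define VMA_LOCK_OFFSET 0x40000000

struct vma_sim {
	atomic_int vm_refcnt;	/* 1 = attached; low bits count readers */
};

/* Reader side: increment, then back off if detached or writer-locked. */
static bool reader_tryacquire(struct vma_sim *v)
{
	int old = atomic_fetch_add(&v->vm_refcnt, 1);

	if (old == 0 || old >= VMA_LOCK_OFFSET) {
		/* detached, or writer present: drop back (~vma_refcount_put()) */
		atomic_fetch_sub(&v->vm_refcnt, 1);
		return false;
	}
	return true;
}

static void reader_release(struct vma_sim *v)
{
	atomic_fetch_sub(&v->vm_refcnt, 1);
}

/*
 * Detach side, per the description above: drop the attached reference;
 * if readers were observed, bump by VMA_LOCK_OFFSET so new readers bail
 * out, wait until only VMA_LOCK_OFFSET remains, then decrement to 0.
 */
static void writer_detach(struct vma_sim *v)
{
	if (atomic_fetch_sub(&v->vm_refcnt, 1) == 1)
		return;				/* no readers: straight to 0 */
	atomic_fetch_add(&v->vm_refcnt, VMA_LOCK_OFFSET);
	while (atomic_load(&v->vm_refcnt) != VMA_LOCK_OFFSET)
		;				/* kernel: rcuwait, woken by readers */
	atomic_fetch_sub(&v->vm_refcnt, VMA_LOCK_OFFSET);
}
```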
>
> Sorry, this is really my fault because I didn't follow the
> reimplementation of the VMA locks closely enough and so am a little behind
> here (I'll fix this, probably by documenting them fully in the relevant doc
> page).
>
> So forgive me if I"m asking stupid questions.
>
> What exactly is the issue with the waiter not being triggered?
>
> I see in vma_mark_detached():
>
> /*
> * We are the only writer, so no need to use vma_refcount_put().
> * The condition below is unlikely because the vma has been already
> * write-locked and readers can increment vm_refcnt only temporarily
> * before they check vm_lock_seq, realize the vma is locked and drop
> * back the vm_refcnt. That is a narrow window for observing a raised
> * vm_refcnt.
> */
>
> So, if this is happening at the point of the unmap, and we're unlucky enough to
> have some readers that have spuriously incremented the refcnt before they check
> vm_lock_seq, we trigger __vma_enter_locked() and wait for those readers to call
> vma_refcount_put(), which wakes us via rcuwait_wake_up() if the refcount is
> still raised (which it should be, right?)
I'm not sure if I'm understanding you correctly; but yes,
__vma_enter_locked() waits for all the readers to drop their
"refcounts". (It's not really a refcount; you can also think of it as
a sleepable read-write lock where the low bits are the number of
readers.)
> So are we actually going to be left with readers sitting around waiting
> forever, if the scenario mentioned happens?
I'm not sure I understand the question. Readers don't wait, they bail
out if they hit contention and retry with the mmap lock. As far as VMA
locks are concerned, readers basically always trylock, only writers
can wait.
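That trylock-or-fall-back discipline could be sketched in userspace roughly like this (a hedged model: `mm_model`, `vma_model`, `handle_fault`, and the boolean locks are hypothetical stand-ins, not the kernel API):

```c
/*
 * Hedged sketch of the reader discipline described above: readers only
 * ever trylock the per-VMA lock, and on contention they bail out and
 * retry under the coarse mmap lock. Only writers ever sleep on a VMA.
 */
#include <assert.h>
#include <stdbool.h>

struct vma_model { bool writer_holds_lock; };
struct mm_model  { struct vma_model vma; bool took_mmap_lock; };

/* Readers never sleep on the per-VMA lock: acquire or fail, nothing else. */
static bool vma_read_trylock_model(struct vma_model *v)
{
	return !v->writer_holds_lock;
}

/* Returns true if the per-VMA fast path was taken. */
static bool handle_fault(struct mm_model *mm)
{
	if (vma_read_trylock_model(&mm->vma))
		return true;		/* fast path, mmap lock untouched */
	mm->took_mmap_lock = true;	/* contention: retry under mmap lock */
	return false;
}
```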
> If we make the rando mm we are now referencing stick around, aren't we just
> spuriously triggering this thing while potentially leaving actual waiters
> waiting?
In that case, the thing we've read-locked is not part of the MM we
were trying to operate on; it is part of the rando other MM. So the
writers we've blocked are also part of the rando other MM, and the
rando other MM is where we have to do a wakeup.
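For context, the shape of the put path under discussion, and why the wakeup necessarily targets whatever MM the VMA currently belongs to, could be sketched like this (a hedged userspace model: `refcount_put_model` and the struct fields are illustrative; only the load-mm, then decrement, then wake ordering is taken from the thread):

```c
/*
 * Hedged model of the hazard: the mm pointer is loaded *before* the
 * decrement and used for the wakeup *after* it. Under
 * SLAB_TYPESAFE_BY_RCU the VMA may meanwhile be recycled into a
 * different ("rando") mm, and nothing pins the mm we loaded, so the
 * wakeup can touch a freed mm_struct. Since the rcuwait lives in
 * vma->vm_mm, the wakeup always goes to the mm the VMA belongs to.
 */
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define VMA_LOCK_OFFSET 0x40000000

struct mm_model  { bool woken; };
struct vma_model {
	struct mm_model *vm_mm;
	atomic_int vm_refcnt;
};

static void refcount_put_model(struct vma_model *vma)
{
	struct mm_model *mm = vma->vm_mm;	/* read before the decrement */
	int oldcnt = atomic_fetch_sub(&vma->vm_refcnt, 1);

	/*
	 * <-- window: once we've decremented, the detacher can finish and
	 *     the mm can be freed; "mm" may then be dangling.
	 */
	if (oldcnt & VMA_LOCK_OFFSET)
		mm->woken = true;		/* models rcuwait_wake_up() */
}
```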
> I may be completely misunderstanding here...