Message-ID: <ceeb423c-ce98-4ef8-9dbb-24522f88de43@lucifer.local>
Date: Thu, 24 Jul 2025 06:23:26 +0100
From: Lorenzo Stoakes <lorenzo.stoakes@...cle.com>
To: Jann Horn <jannh@...gle.com>
Cc: Vlastimil Babka <vbabka@...e.cz>,
        Andrew Morton <akpm@...ux-foundation.org>,
        "Liam R. Howlett" <Liam.Howlett@...cle.com>,
        Suren Baghdasaryan <surenb@...gle.com>,
        Pedro Falcato <pfalcato@...e.de>, Linux-MM <linux-mm@...ck.org>,
        kernel list <linux-kernel@...r.kernel.org>
Subject: Re: [BUG] hard-to-hit mm_struct UAF due to insufficiently careful
 vma_refcount_put() wrt SLAB_TYPESAFE_BY_RCU

On Wed, Jul 23, 2025 at 09:52:41PM +0200, Jann Horn wrote:
> On Wed, Jul 23, 2025 at 8:39 PM Lorenzo Stoakes
> <lorenzo.stoakes@...cle.com> wrote:
> > On Wed, Jul 23, 2025 at 08:19:09PM +0200, Jann Horn wrote:
> > > On Wed, Jul 23, 2025 at 8:10 PM Vlastimil Babka <vbabka@...e.cz> wrote:
> > > > On 7/23/25 19:49, Jann Horn wrote:
> > > > > On Wed, Jul 23, 2025 at 7:32 PM Vlastimil Babka <vbabka@...e.cz> wrote:
> > > > >> On 7/23/25 18:26, Jann Horn wrote:
> > > > >> > There's a racy UAF in `vma_refcount_put()` when called on the
> > > > >> > `lock_vma_under_rcu()` path because `SLAB_TYPESAFE_BY_RCU` is used
> > > > >> > without sufficient protection against concurrent object reuse:
> > > > >>
> > > > >> Oof.
> > > > >>
> > > > >> > I'm not sure what the right fix is; I guess one approach would be to
> > > > >> > have a special version of vma_refcount_put() for cases where the VMA
> > > > >> > has been recycled by another MM that grabs an extra reference to the
> > > > >> > MM? But then dropping a reference to the MM afterwards might be a bit
> > > > >> > annoying and might require something like mmdrop_async()...
> > > > >>
> > > > >> Would we need mmdrop_async()? Isn't this the case for mmget_not_zero() and
> > > > >> mmput_async()?
> > > > >
> > > > > Now I'm not sure anymore if either of those approaches would work,
> > > > > because they rely on the task that's removing the VMA to wait until we
> > > > > do __refcount_dec_and_test() before deleting the MM... but I don't
> > > > > think we have any such guarantee...
> > > >
> > > > I think it would be waiting in exit_mmap->vma_mark_detached(), but then
> > > > AFAIU you're right and we'd really need to work with mmgrab/mmdrop because
> > > > at that point the mmget_not_zero() would already be failing...
> > >
> > > Ah, I see! vma_mark_detached() drops its reference, then does
> > > __vma_enter_locked() to bump the refcount by VMA_LOCK_OFFSET again
> > > (after which the reader path can't acquire it anymore), then waits
> > > until the refcount drops to VMA_LOCK_OFFSET, and then decrements it
> > > down to 0 from there. Makes sense.
> >
> > Sorry, this is really my fault - I didn't follow the reimplementation of
> > the VMA locks closely enough and so am a little behind here (I'll fix this,
> > probably by documenting them fully in the relevant doc page).
> >
> > So forgive me if I'm asking stupid questions.
> >
> > What exactly is the issue with the waiter not being triggered?
> >
> > I see in vma_mark_detached():
> >
> >         /*
> >          * We are the only writer, so no need to use vma_refcount_put().
> >          * The condition below is unlikely because the vma has been already
> >          * write-locked and readers can increment vm_refcnt only temporarily
> >          * before they check vm_lock_seq, realize the vma is locked and drop
> >          * back the vm_refcnt. That is a narrow window for observing a raised
> >          * vm_refcnt.
> >          */
> >
> > So, if this is happening at the point of the unmap, and we're unlucky enough
> > to have some readers that have spuriously incremented the refcnt before they
> > check vm_lock_seq, we trigger __vma_enter_locked() and wait for those readers
> > to vma_refcount_put(), which wakes us up via rcuwait_wake_up() if the
> > refcount is still raised (which it should be, right?)
>
> I'm not sure if I'm understanding you correctly; but yes,
> __vma_enter_locked() waits for all the readers to drop their
> "refcounts". (It's not really a refcount; you can also think of it as
> a sleepable read-write lock where the low bits are the number of
> readers.)

Yes I understand this bit.
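
To check my mental model, the reader side behaves something like this toy
userspace model (not the kernel code - reader_trylock() is a made-up name,
and the VMA_LOCK_OFFSET value is from my reading of mm_types.h):

#include <stdatomic.h>
#include <stdbool.h>

#define VMA_LOCK_OFFSET  0x40000000
#define VMA_REF_LIMIT    (VMA_LOCK_OFFSET - 1)

/* Reader side: pure trylock, never waits. Fails if the vma is detached
 * (refcnt == 0) or once a writer has added VMA_LOCK_OFFSET. */
static bool reader_trylock(atomic_int *refcnt)
{
	int old = atomic_load(refcnt);

	do {
		if (old <= 0 || old > VMA_REF_LIMIT)
			return false;
	} while (!atomic_compare_exchange_weak(refcnt, &old, old + 1));

	return true;
}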

>
> > So actually, if the scenario mentioned happens, are we going to be left
> > with readers sitting around waiting forever?
>
> I'm not sure I understand the question. Readers don't wait, they bail
> out if they hit contention and retry with the mmap lock. As far as VMA
> locks are concerned, readers basically always trylock, only writers
> can wait.

No, I understand that bit - I'm not putting this clearly. I meant to say that
__vma_enter_locked() rcuwait_wait_event()'s until:

refcount_read(&vma->vm_refcnt) == tgt_refcnt

But it won't be the reader doing the waiting, will it... it'll be the writer,
waiting for any readers that obtained a spurious refcount to drop it.
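
i.e. the pairing, as I read it, is roughly the below (condensed from memory,
so the details may be slightly off):

/* writer side, in __vma_enter_locked(), after adding VMA_LOCK_OFFSET: */
rcuwait_wait_event(&mm->vma_writer_wait,
		   refcount_read(&vma->vm_refcnt) == tgt_refcnt,
		   TASK_UNINTERRUPTIBLE);

/* reader side, in vma_refcount_put(), condensed: */
struct mm_struct *mm = vma->vm_mm;	/* may be the rando other mm */
int oldcnt;

if (!__refcount_dec_and_test(&vma->vm_refcnt, &oldcnt)) {
	if (is_vma_writer_only(oldcnt - 1))
		rcuwait_wake_up(&mm->vma_writer_wait);
}

And it's that rcuwait_wake_up() on mm->vma_writer_wait, touched after we've
already dropped our refcount, where the freed mm can bite us.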

>
> > If we make the rando mm we are now referencing stick around, aren't we just
> > spuriously triggering this thing while potentially leaving actual waiters
> > waiting?
>
> In that case, the thing we've read-locked is not part of the MM we
> were trying to operate on, it is part of the rando other VM, so the
> writers we've blocked are also part of the rando other VM, and so the
> rando other VM is where we have to do a wakeup.

I am glad you perpetuated my use of 'rando' :P

And yes, this is the crux of what I found confusing, as mentioned in the other
thread - so it's this rando other mm that we need to keep around...
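
So presumably the shape of the fix is something like the below? Entirely
untested sketch just to check my understanding (vma_refcount_put_mmsafe() is
a made-up name):

static inline void vma_refcount_put_mmsafe(struct vm_area_struct *vma)
{
	struct mm_struct *mm = vma->vm_mm;
	int oldcnt;

	/*
	 * Pin the mm - possibly a rando other process's, if the vma was
	 * recycled under us - so its rcuwait can't be freed before the
	 * wakeup below. Per the above, vma_mark_detached() waits for our
	 * reference to drop, so the mm must still be live at this point.
	 */
	mmgrab(mm);

	if (!__refcount_dec_and_test(&vma->vm_refcnt, &oldcnt)) {
		if (is_vma_writer_only(oldcnt - 1))
			rcuwait_wake_up(&mm->vma_writer_wait);
	}

	/* Might need to be mmdrop_async() depending on calling context. */
	mmdrop(mm);
}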
