linux-kernel - Re: [PATCH] mm/rmap: Add anon

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <3cfc1146-1b62-4b04-a2e5-997d10ba4124@lucifer.local>
Date: Fri, 25 Jul 2025 16:07:00 +0100
From: Lorenzo Stoakes <lorenzo.stoakes@...cle.com>
To: Jann Horn <jannh@...gle.com>
Cc: Andrew Morton <akpm@...ux-foundation.org>,
        David Hildenbrand <david@...hat.com>, Rik van Riel <riel@...riel.com>,
        "Liam R. Howlett" <Liam.Howlett@...cle.com>,
        Vlastimil Babka <vbabka@...e.cz>, Harry Yoo <harry.yoo@...cle.com>,
        linux-mm@...ck.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH] mm/rmap: Add anon_vma lifetime debug check

On Fri, Jul 25, 2025 at 04:48:09PM +0200, Jann Horn wrote:
> On Fri, Jul 25, 2025 at 3:49 PM Lorenzo Stoakes
> <lorenzo.stoakes@...cle.com> wrote:
> > On Fri, Jul 25, 2025 at 02:00:18PM +0200, Jann Horn wrote:
> > > An anon folio may outlive the VMAs it comes from, so it may also
> > > outlive its associated anon_vma.
> >
> > Yes, I will share some diagrams I did a while ago to outline this. They're
> > ASCII and make you want to cry! :)
> >
> > Hmm, if non-root, I wonder if we
>
> (looks like you stopped typing mid-sentence)

Yup,

>
> > > 2. Removing an anon folio mapping reduces the anon folio's mapcount
> > > before the VMA can go away.
> >
> > the anon folio's mapcount? You mean the VMA's? :P
>
> I mean folio_mapcount(folio), which reads folio->_mapcount and
> folio->_large_mapcount.

Yup sorry, brain fart.

>
> > > 4. If the anon-exclusive bit is set, the folio is only mapped in a
> > > single place (otherwise swapout+swapin could erroneously set
> > > RMAP_EXCLUSIVE, causing the swapped-in folio to be associated with the
> > > wrong anon_vma).
> >
> > I believe (David?) swapin can cause this not to be the case?
> >
> > > 5. When a VMA is associated with an anon_vma, it is always also
> > > associated with the corresponding root anon_vma (necessary because
> > > non-RMAP_EXCLUSIVE swapin falls back to associating the folio with the
> > > root anon_vma).
> >
> > OK but we know for sure the UAF is not on a root anon_vma so it's not some
> > weirdness with trying to access anon_vma->root
>
> Ah, right.
>
> > > 6. If two VMAs in the same process have the same ->anon_vma, their
> > > anonvma chains must be the same (otherwise VMA merging can break
> > > stuff).
> > >
> >
> > What do you mean the same?
> >
> > If you mean they both must have AVC's which ponit to the individual VMAs
> > and each point to the same anon_vma, yes.
>
> Yeah, that.
>
> > God simple isn't it? ;)
>
> Yeah, I prefer to think of this at the slightly higher abstraction
> layer of "which VMAs are tied to which anon_vmas via AVC" and "which
> VMAs use which anon_vmas as their primary anon_vma"; to me, AVCs being
> separate objects is a minor implementation detail caused by the kernel
> only using intrusive lists instead of the kinds of data structures
> that you'd use in almost any other environment.

To me (a simple man :) there's a many to many relationship so it makes
sense _on one level_ to have connecting AVC's.

I however feel like we can probably do better on the data structures here.

I alos think:

- The reuse logic can be improved
- The semantics or at least logic around logic can be improved


> (Like, you wouldn't need AVC objects if the references between VMAs
> and anon_vmas were formed with things like maple trees or xarrays, but
> I guess they wouldn't give you the interval tree semantics you want.)

Well, we'll see about that :)

>
> > I verified these numbers with drgn, interesting add a new child doesn't
> > increment refcount...
>
> Yeah - AFAIU a single reference is shared by all the VMAs that are
> tied to an anon_vma via AVC nodes, and a child anon_vma can't be
> associated with a VMA without its parent also being associated with
> the VMA...

You see this is also some of the complexity I dislike.

I think when you constrain yourself to the design currently existing, you
sort of necessarily need to add these things.

But a better thought out design might avoid them.

>
> > > > We're sort of relying on this
> > > >
> > > > a. being a UAF
> > > >
> > > > b. the thing we're UAF-ing not either corrupting this field or (if that
> > > >     memory is actually reused as an anon_vma - I'm not familiar with slab
> > > >     caches - so maybe it's quite likely - getting its refcount incremented.
> > >
> > > KASAN sees the memory read I'm doing with this atomic_read(), so in
> > > KASAN builds, if this is a UAF, it should trigger a KASAN splat
> > > (modulo KASAN limitations around when UAF can be detected). Basically,
> > > in KASAN builds, the actual explicit check I'm doing here is only
> > > relevant if the object has not yet been freed. That's why I wrote the
> > > comment "Part of the purpose of the atomic_read() is to make KASAN
> > > check that the anon_vma is still alive.".
> >
> > Hm, I'm confused, how can you detect a UAF if the object cannot yet be
> > freed? :P
> >
> > or would that be the case were it not an atomic_read()?
> >
> > I guess this permits this to be detected in a timely manner.
>
> If the anon_vma hasn't yet been freed, but its refcount is 0, then
> that's still a bug because we rely on the anon_vma to have a nonzero
> refcount as long as there are folios with a nonzero mapcount that are
> tied to it, and it is likely to allow UAF at a later point.

But how is this happening?

The only places where we might explicitly manipulate anon_vma->refcount
are:

- anon_vma_ctor() -> set to 0 on construction used by slab.
- folio_lock_anon_vma_read() / put_anon_vma() - both cases call
  __put_anon_vma() when 0 to free the anon_vma.

So how could we get to a refcount of 0 but the anon_vma still be hanging
around, except if it's freshly allocated?

It's surely only UAF?