lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <50502c3b-903f-4018-b796-4a158f939593@lucifer.local>
Date: Fri, 25 Jul 2025 14:48:56 +0100
From: Lorenzo Stoakes <lorenzo.stoakes@...cle.com>
To: Jann Horn <jannh@...gle.com>
Cc: Andrew Morton <akpm@...ux-foundation.org>,
        David Hildenbrand <david@...hat.com>, Rik van Riel <riel@...riel.com>,
        "Liam R. Howlett" <Liam.Howlett@...cle.com>,
        Vlastimil Babka <vbabka@...e.cz>, Harry Yoo <harry.yoo@...cle.com>,
        linux-mm@...ck.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH] mm/rmap: Add anon_vma lifetime debug check

On Fri, Jul 25, 2025 at 02:00:18PM +0200, Jann Horn wrote:
> On Fri, Jul 25, 2025 at 1:32 PM Lorenzo Stoakes
> <lorenzo.stoakes@...cle.com> wrote:
> > On Thu, Jul 24, 2025 at 09:13:50PM +0200, Jann Horn wrote:
> > > If an anon page is mapped into userspace, its anon_vma must be alive,
> > > otherwise rmap walks can hit UAF.
> >
> > This is extremely weird. I guess it's a UAF of vma->anon_vma
> >
> > Because we:
> >
> > put_anon_vma()
> > ->__put_anon_vma()
> > ->anon_vma_free() (also the root anon_vma... interestingly!)
>
> FWIW, the thing we're trying to access can't be the root anon_vma,
> since KASAN says the UAF'd object was allocated from anon_vma_fork.

OK that's interesting, this is data.

I wonder if related to anon_vma reuse somehow...

>
> > But none of this obviously updates the vma to set vma->anon_vma to NULL.
> >
> > So yeah god almighty. To get this, we must be dereffing vma->anon_vma
> > somewhere unexpected.
>
> I don't see how vma->anon_vma is directly relevant. I think the
> relevant things are:

Yeah sorry, I'm reflexively assuming it's a VMA that's the issue here. I'll
dig into the syzkallers when I can to get more insights if there are any to
be had (maybe this patch is the difference on this).

>
> An anon_vma is only kept alive as long as at least one associated VMA
> is still alive.

Well, including child VMA's yes.

> An anon folio may outlive the VMAs it comes from, so it may also
> outlive its associated anon_vma.

Yes, I will share some diagrams I did a while ago to outline this. They're
ASCII and make you want to cry! :)

Hmm, if non-root, I wonder if we

> When an anon_vma goes away, it does not clear the ->mapping of
> associated folios (this is an intentional design choice).

Well it can't without having to do an expensive page table walk, so
obviously by design yes.

> When an rmap traversal wants to go from a folio to the associated
> anon_vma, it (under RCU) checks that the mapcount is non-zero, which
> implies that there must still be associated VMAs, which implies that
> the associated anon_vma must still be alive; and then it dereferences
> the ->mapping.



>
> I think the ASAN splat indicates that syzkaller must fairly often get
> into situations where the ->mapping pointer is dangling despite the
> mapcount being non-zero, but syzkaller likely only actually hits the
> UAF when the kernel gets memory pressure by chance and does rmap walks
> on the reclaim path.

Yup.

>
> So there are several invariants we're relying on here, and breaking
> any one of these could allow an rmap traversal to cause UAF; a
> non-exhaustive list:

This is good, because we can start to really lay out these things. This is
also vital to my anon_vma work.

>
> 1. An anon folio is only mapped into VMAs that are associated with the
> folio's anon_vma.

Correct.

> 2. Removing an anon folio mapping reduces the anon folio's mapcount
> before the VMA can go away.

the anon folio's mapcount? You mean the VMA's? :P

> 3. VMA splitting/merging/forking properly preserves the anon_vma
  association.

I had slides for this for version 1 of my LSF presentation, not sure if I
kept...

But yes it does fundamentally, in a slightly typically overwrought way.

> 4. If the anon-exclusive bit is set, the folio is only mapped in a
> single place (otherwise swapout+swapin could erroneously set
> RMAP_EXCLUSIVE, causing the swapped-in folio to be associated with the
> wrong anon_vma).

I believe (David?) swapin can cause this not to be the case?

> 5. When a VMA is associated with an anon_vma, it is always also
> associated with the corresponding root anon_vma (necessary because
> non-RMAP_EXCLUSIVE swapin falls back to associating the folio with the
> root anon_vma).

OK but we know for sure the UAF is not on a root anon_vma so it's not some
weirdness with trying to access anon_vma->root

> 6. If two VMAs in the same process have the same ->anon_vma, their
> anonvma chains must be the same (otherwise VMA merging can break
> stuff).
>

What do you mean the same?

If you mean they both must have AVC's which ponit to the individual VMAs
and each point to the same anon_vma, yes.

Then we see e.g. after split anon_vma->num_active_vmas being incremented.

So in relation to this, as to assignment of vma->anon_vma or not:

1. Where vma->anon_vma == anon_vma A.

2. Where we fork the process and the new vma->anon_vma allocates a new
   anon_vma to vma->anon_vma B, whose parent, root fields reference
   anon_vma A.

3. We can end up with anon_vma's where no vma->anon_vma points to them, and
   folio->mapping (masking lower bits) == anon_vma, but then the refcount
   should be 1. Also I can't see how you can do that without a
   folio->mapping pointing at it:

Process 1

     |-------------|
     | avc         v
  |-----|   vma |-----|
  |  A  |<------| avc |
  |-----|       |-----|        |--------------------------|
     | anon_vma    ^           |                          |
     |             |--------------------|                 |
     v     rb_root |                    |                 |
  |---------------------|               |                 |
  | refcount = 3        |               |                 |
  | num_children = 2    |<--------------x----|            |
  | num_active_vmas = 1 |               |    |            |
  |---------------------|               |    |            |
                           |------------|    |            |
Process 2                  |                 |            |
                           |                 |            |
     |-------------|       |                 |            |
     | avc         v       v                 |            |
  |-----|   vma |-----| |-----|              |            |
  |  B  |<------| avc |.| avc |              |            |
  |-----|       |-----| |-----|              |            |
     | anon_vma    ^                         |            |
     |             |-------------------------x--------|   |
     v     rb_root |                         |        |   |
  |---------------------|                    |        |   |
  | refcount = 1        |<-------------------x----|   |   |
  | num_children = 1    |--------------------|    |   |   |
  | num_active_vmas = 1 | parent                  |   |   |
  |---------------------|                         |   |   |
                                                  |   |   |
Process 3                                         |   |   |
                           |----------------------x---|   |
     |-------------|       |       |--------------x-------|
     | avc         v       v       v              |
  |-----|   vma |-----| |-----| |-----|           |
  |  C  |<------| avc |.| avc |.| avc |           |
  |-----|       |-----| |-----| |-----|           |
     | anon_vma    ^                              |
     |             |                              |
     v     rb_root |                              |
  |---------------------|                         |
  | refcount = 1        |                         |
  | num_children = 0    |-------------------------|
  | num_active_vmas = 1 | parent
  |---------------------|

UNMAP B

Process 1

     |-------------|
     | avc         v
  |-----|   vma |-----|
  |  A  |<------| avc |
  |-----|       |-----|        |--------------------------|
     | anon_vma    ^           |                          |
     |             |------------                          |
     v     rb_root |                                      |
  |---------------------|                                 |
  | refcount = 3        |                                 |
  | num_children = 2    |<-------------------|            |
  | num_active_vmas = 1 |                    |            |
  |---------------------|                    |            |
                                             |            |
Process 2                                    |            |
                                             |            |
                   |-------------------------x--------|   |
           rb_root |                         |        |   |
  |---------------------|                    |        |   |
  | refcount = 1        |<-------------------x----|   |   | We keep empty
  | num_children = 1    |--------------------|    |   |   | anon_vma round.
  | num_active_vmas = 0 | parent                  |   |   |
  |---------------------|                         |   |   |
                                                  |   |   |
Process 3                                         |   |   |
                           |----------------------x---|   |
     |-------------|       |       |--------------x-------|
     | avc         v       v       v              |
  |-----|   vma |-----| |-----| |-----|           |
  |  C  |<------| avc |.| avc |.| avc |           |
  |-----|       |-----| |-----| |-----|           |
     | anon_vma    ^                              |
     |             |                              |
     v     rb_root |                              |
  |---------------------|                         |
  | refcount = 1        |                         |
  | num_children = 0    |-------------------------|
  | num_active_vmas = 1 | parent
  |---------------------|

4. But they can get reused through the anon_vma reuse logic:

FORK

Process 1

     |-------------|
     | avc         v
  |-----|   vma |-----|             |-----------------------------|
  |  A  |<------| avc |             |                             |
  |-----|       |-----|        |--------------------------|       |
     | anon_vma    ^           |                          |       |
     |             |-----------|                          |       |
     v     rb_root |                                      |       |
  |---------------------|                                 |       |
  | refcount = 3        | (self-parent)                   |       |
  | num_children = 2    |<-------------------|            |       |
  | num_active_vmas = 1 |                    |            |       |
  |---------------------|                    |            |       |
                                             |            |       |
Process 2               |--------------------x------------x-------x-------|
                        |                    |            |       |       |
                   |-------------------------x--------|   |       |       |
           rb_root |                         |        |   |       |       |
  |---------------------|<-------------------x--------x---x-------x---|   |
  | refcount = 1        |<-------------------x----|   |   |       |   |   |
  | num_children = 1    |--------------------|    |   |   |       |   |   |
  | num_active_vmas = 1 | parent                  |   |   |       |   |   |
  |---------------------|                         |   |   |       |   |   |
                                                  |   |   |       |   |   |
Process 3                                         |   |   |       |   |   |
                           |----------------------x---|   |       |   |   |
     |-------------|       |       |--------------x-------|       |   |   |
     | avc         v       v       v              |               |   |   |
  |-----|   vma |-----| |-----| |-----|           |               |   |   |
  |  C  |<------| avc |.| avc |.| avc |           |               |   |   |
  |-----|       |-----| |-----| |-----|           |               |   |   |
     | anon_vma    ^                              |               |   |   |
     |             |------------------------------x-----------|   |   |   |
     v     rb_root |                              |           |   |   |   |
  |---------------------|                         |           |   |   |   |
  | refcount = 1        |                         |           |   |   |   |
  | num_children = 0    |-------------------------|           |   |   |   |
  | num_active_vmas = 1 | parent                              |   |   |   |
  |---------------------|                                     |   |   |   |
                                                              |   |   |   |
Process 4                                                     |   |   |   |
                           |----------------------------------|   |   |   |
     |-------------|       |       |------------------------------|   |   |
     | avc         v       v       v                                  |   |
  |-----|   vma |-----| |-----| |-----|                               |   |
  |  D  |<------| avc |.| avc |.| avc |                               |   |
  |-----|       |-----| |-----| |-----|                               |   |
     | anon_vma    ^                                                  |   |
     |             |--------------------------------------------------x---|
     |                                                                |
     |----------------------------------------------------------------|

God simple isn't it? ;)

I verified these numbers with drgn, interesting add a new child doesn't
increment refcount...

> > Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@...cle.com>
>
> Thanks!

No problem, thanks for doing this!

>
> > > ---
> > >  include/linux/rmap.h | 13 +++++++++++++
> > >  1 file changed, 13 insertions(+)
> > >
> > > diff --git a/include/linux/rmap.h b/include/linux/rmap.h
> > > index c4f4903b1088..ba694c436f59 100644
> > > --- a/include/linux/rmap.h
> > > +++ b/include/linux/rmap.h
> > > @@ -449,6 +449,19 @@ static inline void __folio_rmap_sanity_checks(const struct folio *folio,
> > >       default:
> > >               VM_WARN_ON_ONCE(true);
> > >       }
> > > +
> > > +     /*
> > > +      * Anon folios must have an associated live anon_vma as long as they're
> > > +      * mapped into userspace.
> > > +      * Part of the purpose of the atomic_read() is to make KASAN check that
> > > +      * the anon_vma is still alive.
> > > +      */
> > > +     if (IS_ENABLED(CONFIG_DEBUG_VM) && PageAnonNotKsm(page)) {
> > > +             unsigned long mapping = (unsigned long)folio->mapping;
> > > +             struct anon_vma *anon_vma = (void *)(mapping - PAGE_MAPPING_ANON);
> > > +
> > > +             VM_WARN_ON_FOLIO(atomic_read(&anon_vma->refcount) == 0, folio);
> >
> > Maybe stupid question, but wouldn't we pick this up with KASAN? Or would we
> > pick it up far too late I guess?
>
> I mean, it depends on the exact nature of the issue. If the issue is
> that we somehow end up with anon folios mapped into VMAs that are not
> associated with the anon folio's anon_vma, then I think the only time
> we'd notice would be if we randomly end up doing rmap walks of the
> folios, as syzkaller did above.

Yup. This will trigger on __folio_remove_rmap() and __folio_add_rmap():

do_swap_page() / remove_migration_pte() / remove_migration_pmd()
-> folio_add_anon_rmap_ptes() / folio_add_anon_rmap_pmd()
-> __folio_add_anon_rmap()
-> __folio_add_rmap()


vmf_insert_folio_pud() / set_pte_range() / insert_page_into_pte_locked() / mfill_atomic_install_pte() / vmf_insert_folio_pmd() / remove_migration_pmd() / remove_migration_pte()
-> folio_add_file_rmap_ptes() / folio_add_file_rmap_pmd() / folio_add_file_rmap_pud()
-> __folio_add_file_rmap()
-> __folio_add_rmap()

I guess we aren't interested in folio_add_new_anon_rmap() as we won't have
an existing anon_vma there

>
> > We're sort of relying on this
> >
> > a. being a UAF
> >
> > b. the thing we're UAF-ing not either corrupting this field or (if that
> >     memory is actually reused as an anon_vma - I'm not familiar with slab
> >     caches - so maybe it's quite likely - getting its refcount incremented.
>
> KASAN sees the memory read I'm doing with this atomic_read(), so in
> KASAN builds, if this is a UAF, it should trigger a KASAN splat
> (modulo KASAN limitations around when UAF can be detected). Basically,
> in KASAN builds, the actual explicit check I'm doing here is only
> relevant if the object has not yet been freed. That's why I wrote the
> comment "Part of the purpose of the atomic_read() is to make KASAN
> check that the anon_vma is still alive.".

Hm, I'm confused, how can you detect a UAF if the object cannot yet be
freed? :P

or would that be the case were it not an atomic_read()?

I guess this permits this to be detected in a timely manner.

>
> > Which is fine - because hey this is the only way we can get a hint that
> > this is happening, but I think we should describe what we're doing.
>
> Sure, I can make the comment more verbose.

Thanks!

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ