[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <50502c3b-903f-4018-b796-4a158f939593@lucifer.local>
Date: Fri, 25 Jul 2025 14:48:56 +0100
From: Lorenzo Stoakes <lorenzo.stoakes@...cle.com>
To: Jann Horn <jannh@...gle.com>
Cc: Andrew Morton <akpm@...ux-foundation.org>,
David Hildenbrand <david@...hat.com>, Rik van Riel <riel@...riel.com>,
"Liam R. Howlett" <Liam.Howlett@...cle.com>,
Vlastimil Babka <vbabka@...e.cz>, Harry Yoo <harry.yoo@...cle.com>,
linux-mm@...ck.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH] mm/rmap: Add anon_vma lifetime debug check
On Fri, Jul 25, 2025 at 02:00:18PM +0200, Jann Horn wrote:
> On Fri, Jul 25, 2025 at 1:32 PM Lorenzo Stoakes
> <lorenzo.stoakes@...cle.com> wrote:
> > On Thu, Jul 24, 2025 at 09:13:50PM +0200, Jann Horn wrote:
> > > If an anon page is mapped into userspace, its anon_vma must be alive,
> > > otherwise rmap walks can hit UAF.
> >
> > This is extremely weird. I guess it's a UAF of vma->anon_vma
> >
> > Because we:
> >
> > put_anon_vma()
> > ->__put_anon_vma()
> > ->anon_vma_free() (also the root anon_vma... interestingly!)
>
> FWIW, the thing we're trying to access can't be the root anon_vma,
> since KASAN says the UAF'd object was allocated from anon_vma_fork.
OK that's interesting, this is data.
I wonder if related to anon_vma reuse somehow...
>
> > But none of this obviously updates the vma to set vma->anon_vma to NULL.
> >
> > So yeah god almighty. To get this, we must be dereffing vma->anon_vma
> > somewhere unexpected.
>
> I don't see how vma->anon_vma is directly relevant. I think the
> relevant things are:
Yeah sorry, I'm reflexively assuming it's a VMA that's the issue here. I'll
dig into the syzkallers when I can to get more insights if there are any to
be had (maybe this patch is the difference on this).
>
> An anon_vma is only kept alive as long as at least one associated VMA
> is still alive.
Well, including child VMA's yes.
> An anon folio may outlive the VMAs it comes from, so it may also
> outlive its associated anon_vma.
Yes, I will share some diagrams I did a while ago to outline this. They're
ASCII and make you want to cry! :)
Hmm, if non-root, I wonder if we
> When an anon_vma goes away, it does not clear the ->mapping of
> associated folios (this is an intentional design choice).
Well it can't without having to do an expensive page table walk, so
obviously by design yes.
> When an rmap traversal wants to go from a folio to the associated
> anon_vma, it (under RCU) checks that the mapcount is non-zero, which
> implies that there must still be associated VMAs, which implies that
> the associated anon_vma must still be alive; and then it dereferences
> the ->mapping.
>
> I think the ASAN splat indicates that syzkaller must fairly often get
> into situations where the ->mapping pointer is dangling despite the
> mapcount being non-zero, but syzkaller likely only actually hits the
> UAF when the kernel gets memory pressure by chance and does rmap walks
> on the reclaim path.
Yup.
>
> So there are several invariants we're relying on here, and breaking
> any one of these could allow an rmap traversal to cause UAF; a
> non-exhaustive list:
This is good, because we can start to really lay out these things. This is
also vital to my anon_vma work.
>
> 1. An anon folio is only mapped into VMAs that are associated with the
> folio's anon_vma.
Correct.
> 2. Removing an anon folio mapping reduces the anon folio's mapcount
> before the VMA can go away.
the anon folio's mapcount? You mean the VMA's? :P
> 3. VMA splitting/merging/forking properly preserves the anon_vma
association.
I had slides for this for version 1 of my LSF presentation, not sure if I
kept...
But yes it does fundamentally, in a slightly typically overwrought way.
> 4. If the anon-exclusive bit is set, the folio is only mapped in a
> single place (otherwise swapout+swapin could erroneously set
> RMAP_EXCLUSIVE, causing the swapped-in folio to be associated with the
> wrong anon_vma).
I believe (David?) swapin can cause this not to be the case?
> 5. When a VMA is associated with an anon_vma, it is always also
> associated with the corresponding root anon_vma (necessary because
> non-RMAP_EXCLUSIVE swapin falls back to associating the folio with the
> root anon_vma).
OK but we know for sure the UAF is not on a root anon_vma so it's not some
weirdness with trying to access anon_vma->root
> 6. If two VMAs in the same process have the same ->anon_vma, their
> anonvma chains must be the same (otherwise VMA merging can break
> stuff).
>
What do you mean the same?
If you mean they both must have AVC's which ponit to the individual VMAs
and each point to the same anon_vma, yes.
Then we see e.g. after split anon_vma->num_active_vmas being incremented.
So in relation to this, as to assignment of vma->anon_vma or not:
1. Where vma->anon_vma == anon_vma A.
2. Where we fork the process and the new vma->anon_vma allocates a new
anon_vma to vma->anon_vma B, whose parent, root fields reference
anon_vma A.
3. We can end up with anon_vma's where no vma->anon_vma points to them, and
folio->mapping (masking lower bits) == anon_vma, but then the refcount
should be 1. Also I can't see how you can do that without a
folio->mapping pointing at it:
Process 1
|-------------|
| avc v
|-----| vma |-----|
| A |<------| avc |
|-----| |-----| |--------------------------|
| anon_vma ^ | |
| |--------------------| |
v rb_root | | |
|---------------------| | |
| refcount = 3 | | |
| num_children = 2 |<--------------x----| |
| num_active_vmas = 1 | | | |
|---------------------| | | |
|------------| | |
Process 2 | | |
| | |
|-------------| | | |
| avc v v | |
|-----| vma |-----| |-----| | |
| B |<------| avc |.| avc | | |
|-----| |-----| |-----| | |
| anon_vma ^ | |
| |-------------------------x--------| |
v rb_root | | | |
|---------------------| | | |
| refcount = 1 |<-------------------x----| | |
| num_children = 1 |--------------------| | | |
| num_active_vmas = 1 | parent | | |
|---------------------| | | |
| | |
Process 3 | | |
|----------------------x---| |
|-------------| | |--------------x-------|
| avc v v v |
|-----| vma |-----| |-----| |-----| |
| C |<------| avc |.| avc |.| avc | |
|-----| |-----| |-----| |-----| |
| anon_vma ^ |
| | |
v rb_root | |
|---------------------| |
| refcount = 1 | |
| num_children = 0 |-------------------------|
| num_active_vmas = 1 | parent
|---------------------|
UNMAP B
Process 1
|-------------|
| avc v
|-----| vma |-----|
| A |<------| avc |
|-----| |-----| |--------------------------|
| anon_vma ^ | |
| |------------ |
v rb_root | |
|---------------------| |
| refcount = 3 | |
| num_children = 2 |<-------------------| |
| num_active_vmas = 1 | | |
|---------------------| | |
| |
Process 2 | |
| |
|-------------------------x--------| |
rb_root | | | |
|---------------------| | | |
| refcount = 1 |<-------------------x----| | | We keep empty
| num_children = 1 |--------------------| | | | anon_vma round.
| num_active_vmas = 0 | parent | | |
|---------------------| | | |
| | |
Process 3 | | |
|----------------------x---| |
|-------------| | |--------------x-------|
| avc v v v |
|-----| vma |-----| |-----| |-----| |
| C |<------| avc |.| avc |.| avc | |
|-----| |-----| |-----| |-----| |
| anon_vma ^ |
| | |
v rb_root | |
|---------------------| |
| refcount = 1 | |
| num_children = 0 |-------------------------|
| num_active_vmas = 1 | parent
|---------------------|
4. But they can get reused through the anon_vma reuse logic:
FORK
Process 1
|-------------|
| avc v
|-----| vma |-----| |-----------------------------|
| A |<------| avc | | |
|-----| |-----| |--------------------------| |
| anon_vma ^ | | |
| |-----------| | |
v rb_root | | |
|---------------------| | |
| refcount = 3 | (self-parent) | |
| num_children = 2 |<-------------------| | |
| num_active_vmas = 1 | | | |
|---------------------| | | |
| | |
Process 2 |--------------------x------------x-------x-------|
| | | | |
|-------------------------x--------| | | |
rb_root | | | | | |
|---------------------|<-------------------x--------x---x-------x---| |
| refcount = 1 |<-------------------x----| | | | | |
| num_children = 1 |--------------------| | | | | | |
| num_active_vmas = 1 | parent | | | | | |
|---------------------| | | | | | |
| | | | | |
Process 3 | | | | | |
|----------------------x---| | | | |
|-------------| | |--------------x-------| | | |
| avc v v v | | | |
|-----| vma |-----| |-----| |-----| | | | |
| C |<------| avc |.| avc |.| avc | | | | |
|-----| |-----| |-----| |-----| | | | |
| anon_vma ^ | | | |
| |------------------------------x-----------| | | |
v rb_root | | | | | |
|---------------------| | | | | |
| refcount = 1 | | | | | |
| num_children = 0 |-------------------------| | | | |
| num_active_vmas = 1 | parent | | | |
|---------------------| | | | |
| | | |
Process 4 | | | |
|----------------------------------| | | |
|-------------| | |------------------------------| | |
| avc v v v | |
|-----| vma |-----| |-----| |-----| | |
| D |<------| avc |.| avc |.| avc | | |
|-----| |-----| |-----| |-----| | |
| anon_vma ^ | |
| |--------------------------------------------------x---|
| |
|----------------------------------------------------------------|
God simple isn't it? ;)
I verified these numbers with drgn, interesting add a new child doesn't
increment refcount...
> > Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@...cle.com>
>
> Thanks!
No problem, thanks for doing this!
>
> > > ---
> > > include/linux/rmap.h | 13 +++++++++++++
> > > 1 file changed, 13 insertions(+)
> > >
> > > diff --git a/include/linux/rmap.h b/include/linux/rmap.h
> > > index c4f4903b1088..ba694c436f59 100644
> > > --- a/include/linux/rmap.h
> > > +++ b/include/linux/rmap.h
> > > @@ -449,6 +449,19 @@ static inline void __folio_rmap_sanity_checks(const struct folio *folio,
> > > default:
> > > VM_WARN_ON_ONCE(true);
> > > }
> > > +
> > > + /*
> > > + * Anon folios must have an associated live anon_vma as long as they're
> > > + * mapped into userspace.
> > > + * Part of the purpose of the atomic_read() is to make KASAN check that
> > > + * the anon_vma is still alive.
> > > + */
> > > + if (IS_ENABLED(CONFIG_DEBUG_VM) && PageAnonNotKsm(page)) {
> > > + unsigned long mapping = (unsigned long)folio->mapping;
> > > + struct anon_vma *anon_vma = (void *)(mapping - PAGE_MAPPING_ANON);
> > > +
> > > + VM_WARN_ON_FOLIO(atomic_read(&anon_vma->refcount) == 0, folio);
> >
> > Maybe stupid question, but wouldn't we pick this up with KASAN? Or would we
> > pick it up far too late I guess?
>
> I mean, it depends on the exact nature of the issue. If the issue is
> that we somehow end up with anon folios mapped into VMAs that are not
> associated with the anon folio's anon_vma, then I think the only time
> we'd notice would be if we randomly end up doing rmap walks of the
> folios, as syzkaller did above.
Yup. This will trigger on __folio_remove_rmap() and __folio_add_rmap():
do_swap_page() / remove_migration_pte() / remove_migration_pmd()
-> folio_add_anon_rmap_ptes() / folio_add_anon_rmap_pmd()
-> __folio_add_anon_rmap()
-> __folio_add_rmap()
vmf_insert_folio_pud() / set_pte_range() / insert_page_into_pte_locked() / mfill_atomic_install_pte() / vmf_insert_folio_pmd() / remove_migration_pmd() / remove_migration_pte()
-> folio_add_file_rmap_ptes() / folio_add_file_rmap_pmd() / folio_add_file_rmap_pud()
-> __folio_add_file_rmap()
-> __folio_add_rmap()
I guess we aren't interested in folio_add_new_anon_rmap() as we won't have
an existing anon_vma there
>
> > We're sort of relying on this
> >
> > a. being a UAF
> >
> > b. the thing we're UAF-ing not either corrupting this field or (if that
> > memory is actually reused as an anon_vma - I'm not familiar with slab
> > caches - so maybe it's quite likely - getting its refcount incremented.
>
> KASAN sees the memory read I'm doing with this atomic_read(), so in
> KASAN builds, if this is a UAF, it should trigger a KASAN splat
> (modulo KASAN limitations around when UAF can be detected). Basically,
> in KASAN builds, the actual explicit check I'm doing here is only
> relevant if the object has not yet been freed. That's why I wrote the
> comment "Part of the purpose of the atomic_read() is to make KASAN
> check that the anon_vma is still alive.".
Hm, I'm confused, how can you detect a UAF if the object cannot yet be
freed? :P
or would that be the case were it not an atomic_read()?
I guess this permits this to be detected in a timely manner.
>
> > Which is fine - because hey this is the only way we can get a hint that
> > this is happening, but I think we should describe what we're doing.
>
> Sure, I can make the comment more verbose.
Thanks!
Powered by blists - more mailing lists