
Message-ID: <CACT4Y+Z24oSCssMnYwtCkGHpCKTOX2J4x+8HNoGovVxeZ5_TzQ@mail.gmail.com>
Date: Wed, 14 Jan 2026 18:05:45 +0100
From: Dmitry Vyukov <dvyukov@...gle.com>
To: Jann Horn <jannh@...gle.com>
Cc: syzbot <syzbot+f5d897f5194d92aa1769@...kaller.appspotmail.com>, 
	Liam.Howlett@...cle.com, akpm@...ux-foundation.org, david@...nel.org, 
	harry.yoo@...cle.com, linux-kernel@...r.kernel.org, linux-mm@...ck.org, 
	lorenzo.stoakes@...cle.com, riel@...riel.com, syzkaller-bugs@...glegroups.com, 
	vbabka@...e.cz
Subject: Re: [syzbot] [mm?] KCSAN: data-race in __anon_vma_prepare / __vmf_anon_prepare

On Wed, 14 Jan 2026 at 18:00, Jann Horn <jannh@...gle.com> wrote:
>
> On Wed, Jan 14, 2026 at 5:43 PM Dmitry Vyukov <dvyukov@...gle.com> wrote:
> > On Wed, 14 Jan 2026 at 17:32, syzbot
> > <syzbot+f5d897f5194d92aa1769@...kaller.appspotmail.com> wrote:
> > > ==================================================================
> > > BUG: KCSAN: data-race in __anon_vma_prepare / __vmf_anon_prepare
> > >
> > > write to 0xffff88811c751e80 of 8 bytes by task 13471 on cpu 1:
> > >  __anon_vma_prepare+0x172/0x2f0 mm/rmap.c:212
> > >  __vmf_anon_prepare+0x91/0x100 mm/memory.c:3673
> > >  hugetlb_no_page+0x1c4/0x10d0 mm/hugetlb.c:5782
> > >  hugetlb_fault+0x4cf/0xce0 mm/hugetlb.c:-1
> > >  handle_mm_fault+0x1894/0x2c60 mm/memory.c:6578
> [...]
> > > read to 0xffff88811c751e80 of 8 bytes by task 13473 on cpu 0:
> > >  __vmf_anon_prepare+0x26/0x100 mm/memory.c:3667
> > >  hugetlb_no_page+0x1c4/0x10d0 mm/hugetlb.c:5782
> > >  hugetlb_fault+0x4cf/0xce0 mm/hugetlb.c:-1
> > >  handle_mm_fault+0x1894/0x2c60 mm/memory.c:6578
> [...]
> > >
> > > value changed: 0x0000000000000000 -> 0xffff888104ecca28
> > >
> > > Reported by Kernel Concurrency Sanitizer on:
> > > CPU: 0 UID: 0 PID: 13473 Comm: syz.2.3219 Tainted: G        W           syzkaller #0 PREEMPT(voluntary)
> > > Tainted: [W]=WARN
> > > Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 10/25/2025
> > > ==================================================================
> >
> > Hi Harry,
> >
> > I see you've been debugging:
> > KASAN: slab-use-after-free Read in folio_remove_rmap_ptes
> > https://lore.kernel.org/all/694e3dc6.050a0220.35954c.0066.GAE@google.com/T/
> >
> > Can that bug be caused by this data race?
> > Below is an explanation by Gemini LLM as to why this race is harmful.
> > Obviously take it with a grain of salt, but with my limited mm
> > knowledge it does not look immediately wrong (re rmap invariant).
> >
> > However, now digging into details, I see that this patch of Lorenzo's
> > is also marked as fixing "KASAN: slab-use-after-free Read in
> > folio_remove_rmap_ptes":
> >
> > mm/vma: fix anon_vma UAF on mremap() faulted, unfaulted merge
> > https://lore.kernel.org/all/b7930ad2b1503a657e29fe928eb33061d7eadf5b.1767638272.git.lorenzo.stoakes@oracle.com/T/
> >
> > So perhaps the race is still benign (or points to another issue?)
> >
> > Here is what LLM said about the race:
> > -----
> >
> > The bug report is actionable and points to a harmful data race in the Linux
> > kernel's memory management subsystem, specifically in the handling of
> > anonymous `hugetlb` mappings.
>
> This data race is not specific to hugetlb at all, and it isn't caused
> by any recent changes. It's a longstanding thing in core MM, but it's
> pretty benign as far as I know.
>
> Fundamentally, the field vma->anon_vma can be read while only holding
> the mmap lock in read mode; and it can concurrently be changed from
> NULL to non-NULL.
>
> One scenario to cause such a data race is to create a new anonymous
> VMA, then trigger two concurrent page faults inside this VMA. Assume a
> configuration with VMA locking disabled for simplicity, so that both
> faults happen under the mmap lock in read mode. This will lead to two
> concurrent calls to __vmf_anon_prepare()
> (https://elixir.bootlin.com/linux/v6.18.5/source/mm/memory.c#L3623),
> both threads only holding the mmap_lock in read mode.
> __vmf_anon_prepare() is essentially this (from
> https://elixir.bootlin.com/linux/v6.18.5/source/mm/memory.c#L3623,
> with VMA locking code removed):
>
> vm_fault_t __vmf_anon_prepare(struct vm_fault *vmf)
> {
>         struct vm_area_struct *vma = vmf->vma;
>         vm_fault_t ret = 0;
>
>         if (likely(vma->anon_vma))
>                 return 0;
>         [...]
>         if (__anon_vma_prepare(vma))
>                 ret = VM_FAULT_OOM;
>         [...]
>         return ret;
> }
>
> int __anon_vma_prepare(struct vm_area_struct *vma)
> {
>         struct mm_struct *mm = vma->vm_mm;
>         struct anon_vma *anon_vma, *allocated;
>         struct anon_vma_chain *avc;
>
>         [...]
>
>         [... allocate stuff ...]
>
>         anon_vma_lock_write(anon_vma);
>         /* page_table_lock to protect against threads */
>         spin_lock(&mm->page_table_lock);
>         if (likely(!vma->anon_vma)) {
>                 vma->anon_vma = anon_vma;
>                 [...]
>         }
>         spin_unlock(&mm->page_table_lock);
>         anon_vma_unlock_write(anon_vma);
>
>         [... cleanup ...]
>
>         return 0;
>
>         [... error handling ...]
> }
>
> So if one thread reaches the "vma->anon_vma = anon_vma" assignment
> while the other thread is running the "if (likely(vma->anon_vma))"
> check, you get a (AFAIK benign) data race.
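
The pattern Jann describes (an unlocked fast-path check, then allocate, take page_table_lock, recheck, publish, and free the allocation if another fault won) can be modeled in user space. This is a minimal sketch under stated assumptions: pthreads stand in for the concurrent faults and page_table_lock, C11 relaxed atomics stand in (roughly) for marked kernel accesses such as READ_ONCE()/WRITE_ONCE(), and all names (fake_vma, fake_anon_vma, anon_prepare, simulate_faults) are hypothetical. It only mirrors the shape of __vmf_anon_prepare()/__anon_vma_prepare(), not the real rmap code.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct anon_vma. */
struct fake_anon_vma {
    int id;
};

/* Hypothetical stand-in for struct vm_area_struct plus mm->page_table_lock. */
struct fake_vma {
    _Atomic(struct fake_anon_vma *) anon_vma; /* NULL until the first fault */
    pthread_mutex_t page_table_lock;
};

static atomic_int allocations; /* anon_vmas allocated by any thread */
static atomic_int discarded;   /* allocations freed after losing the race */

/* Models __vmf_anon_prepare() -> __anon_vma_prepare(); 0 on success, -1 on OOM. */
static int anon_prepare(struct fake_vma *vma)
{
    /* Unlocked fast path: in the kernel this plain read is what KCSAN flags. */
    if (atomic_load_explicit(&vma->anon_vma, memory_order_relaxed))
        return 0;

    struct fake_anon_vma *allocated = malloc(sizeof(*allocated));
    if (!allocated)
        return -1;
    allocated->id = atomic_fetch_add(&allocations, 1);

    pthread_mutex_lock(&vma->page_table_lock);
    /* Recheck under the lock: a concurrent fault may already have published. */
    if (!atomic_load_explicit(&vma->anon_vma, memory_order_relaxed)) {
        atomic_store_explicit(&vma->anon_vma, allocated, memory_order_relaxed);
        allocated = NULL; /* ownership moved to the vma */
    }
    pthread_mutex_unlock(&vma->page_table_lock);

    if (allocated) { /* lost the race: drop our unused copy */
        free(allocated);
        atomic_fetch_add(&discarded, 1);
    }
    return 0;
}

static void *fault_thread(void *arg)
{
    return (void *)(long)anon_prepare(arg);
}

/* Runs nthreads concurrent "faults" on one fresh vma and returns how many
 * anon_vmas ended up published (allocations minus discarded). */
static int simulate_faults(int nthreads)
{
    struct fake_vma vma = { .anon_vma = NULL };
    pthread_t tids[64];

    assert(nthreads > 0 && nthreads <= 64);
    atomic_store(&allocations, 0);
    atomic_store(&discarded, 0);
    pthread_mutex_init(&vma.page_table_lock, NULL);

    for (int i = 0; i < nthreads; i++)
        pthread_create(&tids[i], NULL, fault_thread, &vma);
    for (int i = 0; i < nthreads; i++)
        pthread_join(tids[i], NULL);

    /* Barring OOM, every thread returned only once anon_vma was non-NULL. */
    assert(atomic_load(&vma.anon_vma) != NULL);
    free(atomic_load(&vma.anon_vma));
    pthread_mutex_destroy(&vma.page_table_lock);
    return atomic_load(&allocations) - atomic_load(&discarded);
}
```

Calling simulate_faults(8) from a test main() should return 1 however the threads interleave: only the first thread to take the lock while anon_vma is still NULL publishes, and every other thread that got past the fast path frees its copy.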

Thanks for checking, Jann.

To double check:

The "vma->anon_vma = anon_vma" store is done without store-release, so
lockless readers can't safely read the anon_vma's contents; is that
correct? So none of them actually dereferences anon_vma, right?
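
For contrast, here is a sketch of what a lockless reader would need if it *did* dereference the published object. The writer initializes the contents and then publishes the pointer with a release store, and the reader uses an acquire load before touching any field. C11 atomics stand in for kernel primitives here; the kernel analogues would be along the lines of smp_store_release()/smp_load_acquire() or rcu_assign_pointer()/rcu_dereference(). The names (payload, publisher, read_magic, is_published, run_publish_demo) are hypothetical. A reader that only tests the pointer against NULL and never dereferences it does not depend on this ordering.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical object whose contents a lockless reader wants to use. */
struct payload {
    int magic;
};

static struct payload storage;
static _Atomic(struct payload *) published; /* NULL until published */

static void *publisher(void *unused)
{
    (void)unused;
    storage.magic = 42; /* initialize the contents first... */
    /* ...then publish; release orders the write above before the pointer. */
    atomic_store_explicit(&published, &storage, memory_order_release);
    return NULL;
}

/* Lockless reader that dereferences: its acquire load pairs with the release. */
static int read_magic(void)
{
    struct payload *p;

    while (!(p = atomic_load_explicit(&published, memory_order_acquire)))
        ; /* spin until the pointer is published */
    return p->magic; /* release/acquire guarantees this sees 42 */
}

/* Lockless reader that only tests for NULL: relaxed is sufficient. */
static int is_published(void)
{
    return atomic_load_explicit(&published, memory_order_relaxed) != NULL;
}

static int run_publish_demo(void)
{
    pthread_t t;

    atomic_store(&published, NULL);
    storage.magic = 0;
    pthread_create(&t, NULL, publisher, NULL);
    int magic = read_magic();
    pthread_join(t, NULL);
    assert(is_published());
    return magic;
}
```

run_publish_demo() returns 42 on every run. If the release/acquire pair were weakened to relaxed, the read of storage.magic would become a data race under the C11 model, even if it happened to pass on x86.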

Also, anon_vma_chain_link() and num_active_vmas++ do indeed happen after
the assignment to vma->anon_vma:

    /* page_table_lock to protect against threads */
    spin_lock(&mm->page_table_lock);
    if (likely(!vma->anon_vma)) {
        vma->anon_vma = anon_vma;
        anon_vma_chain_link(vma, avc, anon_vma);
        anon_vma->num_active_vmas++;
        allocated = NULL;
        avc = NULL;
    }
    spin_unlock(&mm->page_table_lock);

So lockless readers that observe vma->anon_vma != NULL won't rely on
these invariants, right?
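
The ordering question can be modeled as follows. This is a sketch of the reasoning only, under stated assumptions (pthreads and C11 atomics as stand-ins, hypothetical names, and no model of the anon_vma lock that the real code also takes). The assignment, the chain link, and the counter update all happen inside one page_table_lock section, so a checker holding that lock sees either none or all of them. A lockless reader may only use the pointer's NULL-ness.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the state updated inside the locked section. */
struct fake_anon_vma {
    int num_active_vmas;
};

struct fake_vma {
    _Atomic(struct fake_anon_vma *) anon_vma; /* NULL until the first fault */
    bool chain_linked;                        /* models anon_vma_chain_link() */
    pthread_mutex_t page_table_lock;
};

static struct fake_anon_vma the_anon_vma;

/* Without the lock, only NULL-ness may be used; reading chain_linked or
 * num_active_vmas here would itself be an unsynchronized access. */
static bool has_anon_vma_lockless(struct fake_vma *vma)
{
    return atomic_load_explicit(&vma->anon_vma, memory_order_relaxed) != NULL;
}

static void *prepare(void *arg)
{
    struct fake_vma *vma = arg;

    if (has_anon_vma_lockless(vma)) /* fast path: relies on NULL-ness only */
        return NULL;

    pthread_mutex_lock(&vma->page_table_lock);
    if (!atomic_load_explicit(&vma->anon_vma, memory_order_relaxed)) {
        /* Same order as the quoted code: assign, link, then count. */
        atomic_store_explicit(&vma->anon_vma, &the_anon_vma, memory_order_relaxed);
        vma->chain_linked = true;
        the_anon_vma.num_active_vmas++;
    }
    pthread_mutex_unlock(&vma->page_table_lock);
    return NULL;
}

/* Holding the lock, the three updates are observed all-or-nothing. */
static bool invariant_holds_locked(struct fake_vma *vma)
{
    pthread_mutex_lock(&vma->page_table_lock);
    bool set = atomic_load_explicit(&vma->anon_vma, memory_order_relaxed) != NULL;
    bool ok = (set == vma->chain_linked) &&
              (the_anon_vma.num_active_vmas == (set ? 1 : 0));
    pthread_mutex_unlock(&vma->page_table_lock);
    return ok;
}

/* Runs nthreads "faults" while sampling the invariant under the lock. */
static int count_locked_violations(int nthreads)
{
    struct fake_vma vma = { .anon_vma = NULL, .chain_linked = false };
    pthread_t tids[16];
    int violations = 0;

    assert(nthreads > 0 && nthreads <= 16);
    the_anon_vma.num_active_vmas = 0;
    pthread_mutex_init(&vma.page_table_lock, NULL);
    for (int i = 0; i < nthreads; i++)
        pthread_create(&tids[i], NULL, prepare, &vma);
    for (int i = 0; i < 1000; i++) /* sample while faults may still run */
        violations += !invariant_holds_locked(&vma);
    for (int i = 0; i < nthreads; i++)
        pthread_join(tids[i], NULL);
    violations += !invariant_holds_locked(&vma) || !has_anon_vma_lockless(&vma);
    pthread_mutex_destroy(&vma.page_table_lock);
    return violations;
}
```

count_locked_violations(8) is expected to return 0. Reading chain_linked or num_active_vmas from the lockless path instead would be exactly the kind of reliance on these invariants that the question asks about.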