Message-ID: <CAG48ez2_4D17XMrEb7+5fwq0RFDFDCsY5OjTB7uaXEzdybxshA@mail.gmail.com>
Date: Wed, 2 Jul 2025 18:53:04 +0200
From: Jann Horn <jannh@...gle.com>
To: Rik van Riel <riel@...riel.com>
Cc: syzbot <syzbot+084b6e5bc1016723a9c4@...kaller.appspotmail.com>, bp@...en8.de, 
	dave.hansen@...ux.intel.com, hpa@...or.com, linux-kernel@...r.kernel.org, 
	luto@...nel.org, mingo@...hat.com, neeraj.upadhyay@...nel.org, 
	paulmck@...nel.org, peterz@...radead.org, syzkaller-bugs@...glegroups.com, 
	tglx@...utronix.de, x86@...nel.org, yury.norov@...il.com, 
	kernel-team <kernel-team@...a.com>, David Hildenbrand <david@...hat.com>
Subject: Re: [syzbot] [kernel?] KASAN: slab-use-after-free Write in flush_tlb_func

On Wed, Jul 2, 2025 at 5:24 PM Rik van Riel <riel@...riel.com> wrote:
>
> On Wed, 2025-07-02 at 06:50 -0700, syzbot wrote:
> >
> > The issue was bisected to:
> >
> > commit a12a498a9738db65152203467820bb15b6102bd2
> > Author: Yury Norov [NVIDIA] <yury.norov@...il.com>
> > Date:   Mon Jun 23 00:00:08 2025 +0000
> >
> >     smp: Don't wait for remote work done if not needed in smp_call_function_many_cond()
>
> While that change looks like it would increase the
> likelihood of hitting this issue, it does not look
> like the root cause.
>
> Instead, the stack traces below show that the
> TLB flush code is being asked to flush the TLB
> for an mm that is exiting.
>
> One CPU is running the TLB flush handler, while
> another CPU is freeing the mm_struct.
>
> The CPU that sent the simultaneous TLB flush
> is not visible in the stack traces below,
> but we seem to have various places around the
> MM where we flush the TLB for another mm,
> without taking any measures to protect against
> that mm being freed while the flush is ongoing.

TLB flushes via IPIs on x86 are always synchronous, right?
flush_tlb_func is only referenced from native_flush_tlb_multi() in
calls to on_each_cpu_mask() (with wait=true) or
on_each_cpu_cond_mask() (with wait=1).
So I think this is not an issue, unless you're claiming that we call
native_flush_tlb_multi() with an already-freed info->mm?

And I think the bisected commit really is the buggy one: It looks at
"nr_cpus", which tracks *how many CPUs we have to IPI*, but assumes
that "nr_cpus" tracks *how many CPUs we posted work to*. Those numbers
are not the same: If we post work to a CPU that already had IPI work
pending, we just add a list entry without sending another IPI.
