[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <aGVwAtUi8eKNT8Jy@yury>
Date: Wed, 2 Jul 2025 13:44:34 -0400
From: Yury Norov <yury.norov@...il.com>
To: Thomas Gleixner <tglx@...utronix.de>
Cc: Jann Horn <jannh@...gle.com>, Rik van Riel <riel@...riel.com>,
syzbot <syzbot+084b6e5bc1016723a9c4@...kaller.appspotmail.com>,
bp@...en8.de, dave.hansen@...ux.intel.com, hpa@...or.com,
linux-kernel@...r.kernel.org, luto@...nel.org, mingo@...hat.com,
neeraj.upadhyay@...nel.org, paulmck@...nel.org,
peterz@...radead.org, syzkaller-bugs@...glegroups.com,
x86@...nel.org, kernel-team <kernel-team@...a.com>,
David Hildenbrand <david@...hat.com>
Subject: Re: [syzbot] [kernel?] KASAN: slab-use-after-free Write in
flush_tlb_func
On Wed, Jul 02, 2025 at 07:12:31PM +0200, Thomas Gleixner wrote:
> On Wed, Jul 02 2025 at 19:00, Jann Horn wrote:
> > On Wed, Jul 2, 2025 at 6:53 PM Jann Horn <jannh@...gle.com> wrote:
> >> TLB flushes via IPIs on x86 are always synchronous, right?
> >> flush_tlb_func is only referenced from native_flush_tlb_multi() in
> >> calls to on_each_cpu_mask() (with wait=true) or
> >> on_each_cpu_cond_mask() (with wait=1).
> >> So I think this is not an issue, unless you're claiming that we call
> >> native_flush_tlb_multi() with an already-freed info->mm?
> >>
> >> And I think the bisected commit really is the buggy one: It looks at
> >> "nr_cpus", which tracks *how many CPUs we have to IPI*, but assumes
> >> that "nr_cpus" tracks *how many CPUs we posted work to*. Those numbers
> >> are not the same: If we post work to a CPU that already had IPI work
> >> pending, we just add a list entry without sending another IPI.
> >
> > Or in other words: After that blamed commit, if CPU 1 posts a TLB
> > flush to CPU 3, and then CPU 2 also quickly posts a TLB flush to CPU
> > 3, then CPU 2 will erroneously not wait for the TLB flush to complete
> > before reporting flush completion, which AFAICS means we can get both
> > stale TLB entries and (less often) UAF.
>
> Right you are. Well analyzed and I missed it when taking the lot.
>
> > I think the correct version of that commit would be to revert that
> > commit and instead just move the "run_remote = true;" line down, below
> > the cond_func() check.
>
> I remove it from the relevant tip branch
Thank you guys for explaining that and sorry for the buggy patch.
I was actually under impression that run_remote duplicates nr_cpus !=0,
and even have a patch that removes run_remote.
Maybe worth to add a comment on what run_remote and nr_cpus track?
Thanks,
Yury
Powered by blists - more mailing lists