Message-ID: <CAD=FV=U6sM71UuPbYZRWV87=p1ZO8-gpv3yzK8eMEv3dRNVgdA@mail.gmail.com>
Date: Tue, 27 Jan 2026 13:37:28 -0800
From: Doug Anderson <dianders@...omium.org>
To: Qiliang Yuan <realwujing@...il.com>
Cc: akpm@...ux-foundation.org, lihuafei1@...wei.com,
linux-kernel@...r.kernel.org, mingo@...nel.org, mm-commits@...r.kernel.org,
song@...nel.org, stable@...r.kernel.org, sunshx@...natelecom.cn,
thorsten.blum@...ux.dev, wangjinchao600@...il.com, yangyicong@...ilicon.com,
yuanql9@...natelecom.cn, zhangjn11@...natelecom.cn
Subject: Re: [PATCH v4] watchdog/hardlockup: Fix UAF in perf event cleanup due
to migration race
Hi,
On Mon, Jan 26, 2026 at 6:17 PM Qiliang Yuan <realwujing@...il.com> wrote:
>
> Hi Doug,
>
> Thanks for your insightful follow-up! It's great to have the openEuler vs. Mainline
> timing differences clarified—it definitely explains why we hit this so reliably
> in our downstream environment.
>
> On Mon, Jan 26, 2026 at 5:14 PM Doug Anderson <dianders@...omium.org> wrote:
> > OK, so I think the answer is: you haven't actually seen the problem
> > (or the WARN_ON) on a mainline kernel, only on the openEuler 4.19
> > kernel...
> >
> > ...actually, I looked and now think the problem doesn't exist on a
> > mainline kernel. Specifically, when we run lockup_detector_retry_init()
> > we call schedule_work() to do the work. That schedules work on the
> > "system_percpu_wq". While the work ends up being queued with
> > "WORK_CPU_UNBOUND", I believe that we still end up running on a thread
> > that's bound to just one CPU in the end. This is presumably why
> > nobody has reported the "WARN_ON(!is_percpu_thread())" actually
> > hitting on mainline.
>
> You are right that in the latest mainline, schedule_work() has been updated
> to use 'system_percpu_wq'. However, in many LTS kernels (including 4.19),
> schedule_work() still submits to 'system_wq', which lacks the per-cpu
> guarantee.
Really, what matters is what schedule_work() does on any kernel that
has commit 930d8f8dbab9 ("watchdog/perf: adapt the watchdog_perf
interface for async model")... While I can sympathize with supporting
older kernels and doing backports, we have to focus on supporting the
mainline kernel here. If we're claiming that we're fixing a bug (and
even your newest CL says it fixes a UAF and carries a Fixes tag) then
the bug has to actually be there.
> More importantly, even on 'system_percpu_wq', the worker threads do not
> carry the PF_PERCPU_THREAD flag. is_percpu_thread() specifically checks
> (current->flags & PF_PERCPU_THREAD), which is reserved for kthreads
> specifically pinned via kthread_create_on_cpu().
I think we need to keep the focus on mainline, or at least the kernel
as of commit 930d8f8dbab9. A grep for "PF_PERCPU_THREAD" has no hits
in either. In both cases, is_percpu_thread() is:

    return (current->flags & PF_NO_SETAFFINITY) &&
           (current->nr_cpus_allowed == 1);
> Therefore, the
> WARN_ON(!is_percpu_thread()) in hardlockup_detector_event_create() is
> still violated in the retry path even on mainline.
To ask directly: have you seen this WARN_ON in mainline, or is this
all speculative?
I'm going to assert that the WARN_ON is _not_ seen on mainline and
wasn't firing as of commit 930d8f8dbab9. Specifically, the same set of
patches that added the "retry" for the hardlockup detector had the
WARN_ON(). It feels highly unlikely the WARN_ON was firing at that
point in time. You can see the whole series of patches at:
https://lore.kernel.org/linux-arm-kernel/20220903093415.15850-1-lecopzer.chen@mediatek.com/
...yes, I ended up rebasing those patches and including them when I
landed the buddy lockup detector, but they should have been equivalent
to Lecopzer's originals.
> The UAF risk stems from the fact that preemption is enabled during the
> probe. If the worker thread (even if on a per-cpu wq) is preempted or
> if the logic assumes the task cannot migrate (which is_percpu_thread
> usually guarantees), we have a logical gap. By making the probe path
> stateless and using cpu_hotplug_disable(), we eliminate this dependency
> entirely.
>
> > If that's the case, we'd definitely want to at least change the
> > description and presumably _remove_ the Fixes tag? I actually still
> > think the code looks nicer after your CL and (maybe?) we could even
> > remove the whole schedule_work() for running this code? Maybe it was
> > only added to deal with this exact problem? ...but the CL description
> > would definitely need to be updated.
>
> The schedule_work() in lockup_detector_retry_init() (added by 930d8f8dbab9)
> is necessary for platforms where the PMU or other dependencies aren't ready
> during early init.
>
> I agree that the commit description should be updated to clarify that
> while the issue was caught in a downstream kernel with shifted init timings,
> it identifies a latent race condition in the mainline retry path.
>
> Regarding the 'Fixes' tag, since 930d8f8dbab9 introduced the asynchronous
> retry path which calls the probe logic from a non-percpu-thread context,
> it still seems like the appropriate target for the "root cause" of the
> vulnerability.
>
> I'll refactor the commit message in V5 to better reflect this context
> and remove the emphasis on ToT being "broken" out-of-the-box (since early
> init is indeed safe there).
>
> How does that sound to you?
I'm still not convinced that there was ever a UAF in mainline nor that
this actually "Fixes" anything in mainline. I do agree that the code
is better by not having it write the per-cpu variable at probe time,
but unless you can say that you've actually tested _on mainline_ and
demonstrated that the WARN_ON() is truly hitting _on mainline_ by
providing a printout of it happening _on mainline_ or somehow shown
the UAF actually happening _on mainline_ then we simply can't claim
that this is a Fix. Although I suppose I'd also be OK with
demonstrating any of the above on any pure upstream kernel after
commit 930d8f8dbab9.
-Doug