Message-ID: <20260127022238.1182079-1-realwujing@gmail.com>
Date: Mon, 26 Jan 2026 21:22:24 -0500
From: Qiliang Yuan <realwujing@...il.com>
To: Ingo Molnar <mingo@...nel.org>,
Qiliang Yuan <realwujing@...il.com>,
Andrew Morton <akpm@...ux-foundation.org>,
Li Huafei <lihuafei1@...wei.com>,
Thorsten Blum <thorsten.blum@...ux.dev>,
Jinchao Wang <wangjinchao600@...il.com>,
Yicong Yang <yangyicong@...ilicon.com>,
Petr Mladek <pmladek@...e.com>,
Pingfan Liu <kernelfans@...il.com>,
Lecopzer Chen <lecopzer.chen@...iatek.com>,
Douglas Anderson <dianders@...omium.org>
Cc: linux-watchdog@...r.kernel.org,
mm-commits@...r.kernel.org,
Shouxin Sun <sunshx@...natelecom.cn>,
Junnan Zhang <zhangjn11@...natelecom.cn>,
Qiliang Yuan <yuanql9@...natelecom.cn>,
Song Liu <song@...nel.org>,
stable@...r.kernel.org,
"Yury Norov (NVIDIA)" <yury.norov@...il.com>,
linux-kernel@...r.kernel.org
Subject: [PATCH v5] watchdog/hardlockup: Fix UAF in perf event cleanup due to migration race

The hardlockup detector's probe path (watchdog_hardlockup_probe()) can
run in a non-pinned context, for example via the asynchronous retry
mechanism (lockup_detector_delay_init()), which executes on an unbound
workqueue.
In this context, the existing implementation of
hardlockup_detector_event_create() suffers from a race condition caused
by task migration. It relies on is_percpu_thread() to ensure CPU
locality, but workers of an unbound workqueue are not per-CPU kthreads,
so the check fails: the WARN_ON() fires and the assumption of stable
per-CPU access is violated.
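For reference, is_percpu_thread() reduces to the following check (a
simplified sketch of the mainline helper from include/linux/sched.h;
details vary by kernel version):

    static inline bool is_percpu_thread(void)
    {
    #ifdef CONFIG_SMP
    	/* pinned to a single CPU and affinity cannot be changed */
    	return (current->flags & PF_NO_SETAFFINITY) &&
    	       (current->nr_cpus_allowed == 1);
    #else
    	return true;
    #endif
    }

An unbound worker may run on more than one CPU (nr_cpus_allowed > 1),
so the check fails even though the probe assumed per-CPU execution.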
If the task migrates during the probe (see the sketch below):

1. It may set 'watchdog_ev' on one CPU but fail to clear it, because
   the subsequent migration makes the cleanup logic run on a different
   CPU.
2. This leaves a stale pointer to a freed perf_event in the original
   CPU's 'watchdog_ev' slot, leading to a use-after-free (UAF) when the
   watchdog is later enabled or reconfigured.
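Concretely, one bad interleaving of the old probe cleanup (a condensed
sketch of the pre-patch code; "CPU A"/"CPU B" are illustrative):

    /* worker starts on CPU A */
    hardlockup_detector_event_create();   /* this_cpu_write(): slot A = evt */
    perf_event_release_kernel(this_cpu_read(watchdog_ev)); /* frees evt */
    /* ... worker migrates from CPU A to CPU B ... */
    this_cpu_write(watchdog_ev, NULL);    /* clears slot B, slot A untouched */
    /* slot A still holds the freed event -> UAF on the next enable */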
While this issue was prominently observed in downstream kernels (such
as openEuler 4.19), where initialization timing is shifted to a
post-SMP phase, it is also a latent bug in the mainline asynchronous
initialization path.

Refactor hardlockup_detector_event_create() to be stateless: return the
created perf_event pointer instead of directly modifying the per-CPU
'watchdog_ev' variable. This allows the probe logic to safely manage
the temporary event. Use cpu_hotplug_disable() during the probe to
ensure the target CPU remains valid throughout the check.
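Reduced to its core, the probe then follows this pattern (sketch of the
change in the diff below, not a drop-in replacement):

    cpu_hotplug_disable();            /* chosen CPU cannot go offline */
    cpu = raw_smp_processor_id();
    evt = hardlockup_detector_event_create(cpu);
    if (!IS_ERR(evt))
    	perf_event_release_kernel(evt); /* probe only; release at once */
    cpu_hotplug_enable();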
Fixes: 930d8f8dbab9 ("watchdog/perf: adapt the watchdog_perf interface for async model")
Signed-off-by: Shouxin Sun <sunshx@...natelecom.cn>
Signed-off-by: Junnan Zhang <zhangjn11@...natelecom.cn>
Signed-off-by: Qiliang Yuan <realwujing@...il.com>
Signed-off-by: Qiliang Yuan <yuanql9@...natelecom.cn>
Cc: Song Liu <song@...nel.org>
Cc: Douglas Anderson <dianders@...omium.org>
Cc: Jinchao Wang <wangjinchao600@...il.com>
Cc: <stable@...r.kernel.org>
---
v5:
- Refine description: clarify it identifies a latent bug in the mainline
asynchronous retry path where worker threads lack PF_PERCPU_THREAD.
v4:
- Add cpu_hotplug_disable() in watchdog_hardlockup_probe() to stabilize
the probe CPU.
- Update description to explain the relevance of 4.19 logs.
v3:
- Refactor hardlockup_detector_event_create() to be stateless.
v2:
- Add Cc stable.
 kernel/watchdog_perf.c | 56 +++++++++++++++++++++++++-----------------
 1 file changed, 34 insertions(+), 22 deletions(-)

diff --git a/kernel/watchdog_perf.c b/kernel/watchdog_perf.c
index d3ca70e3c256..887b61c65c1b 100644
--- a/kernel/watchdog_perf.c
+++ b/kernel/watchdog_perf.c
@@ -17,6 +17,7 @@
 #include <linux/atomic.h>
 #include <linux/module.h>
 #include <linux/sched/debug.h>
+#include <linux/cpu.h>
 
 #include <asm/irq_regs.h>
 #include <linux/perf_event.h>
@@ -118,18 +119,11 @@ static void watchdog_overflow_callback(struct perf_event *event,
 	watchdog_hardlockup_check(smp_processor_id(), regs);
 }
 
-static int hardlockup_detector_event_create(void)
+static struct perf_event *hardlockup_detector_event_create(unsigned int cpu)
 {
-	unsigned int cpu;
 	struct perf_event_attr *wd_attr;
 	struct perf_event *evt;
 
-	/*
-	 * Preemption is not disabled because memory will be allocated.
-	 * Ensure CPU-locality by calling this in per-CPU kthread.
-	 */
-	WARN_ON(!is_percpu_thread());
-	cpu = raw_smp_processor_id();
 	wd_attr = &wd_hw_attr;
 	wd_attr->sample_period = hw_nmi_get_sample_period(watchdog_thresh);
 
@@ -143,14 +137,7 @@
 						       watchdog_overflow_callback, NULL);
 	}
 
-	if (IS_ERR(evt)) {
-		pr_debug("Perf event create on CPU %d failed with %ld\n", cpu,
-			 PTR_ERR(evt));
-		return PTR_ERR(evt);
-	}
-	WARN_ONCE(this_cpu_read(watchdog_ev), "unexpected watchdog_ev leak");
-	this_cpu_write(watchdog_ev, evt);
-	return 0;
+	return evt;
 }
 
 /**
@@ -159,17 +146,26 @@ static int hardlockup_detector_event_create(void)
  */
 void watchdog_hardlockup_enable(unsigned int cpu)
 {
+	struct perf_event *evt;
+
 	WARN_ON_ONCE(cpu != smp_processor_id());
 
-	if (hardlockup_detector_event_create())
+	evt = hardlockup_detector_event_create(cpu);
+	if (IS_ERR(evt)) {
+		pr_debug("Perf event create on CPU %d failed with %ld\n", cpu,
+			 PTR_ERR(evt));
 		return;
+	}
 
 	/* use original value for check */
 	if (!atomic_fetch_inc(&watchdog_cpus))
 		pr_info("Enabled. Permanently consumes one hw-PMU counter.\n");
 
+	WARN_ONCE(this_cpu_read(watchdog_ev), "unexpected watchdog_ev leak");
+	this_cpu_write(watchdog_ev, evt);
+
 	watchdog_init_timestamp();
-	perf_event_enable(this_cpu_read(watchdog_ev));
+	perf_event_enable(evt);
 }
 
 /**
@@ -263,19 +259,35 @@ bool __weak __init arch_perf_nmi_is_available(void)
  */
 int __init watchdog_hardlockup_probe(void)
 {
+	struct perf_event *evt;
+	unsigned int cpu;
 	int ret;
 
 	if (!arch_perf_nmi_is_available())
 		return -ENODEV;
 
-	ret = hardlockup_detector_event_create();
+	if (!hw_nmi_get_sample_period(watchdog_thresh))
+		return -EINVAL;
 
-	if (ret) {
+	/*
+	 * Test hardware PMU availability by creating a temporary perf event.
+	 * The requested CPU is arbitrary; preemption is not disabled, so
+	 * raw_smp_processor_id() is used. Surround with cpu_hotplug_disable()
+	 * to ensure the arbitrarily chosen CPU remains online during the check.
+	 * The event is released immediately.
+	 */
+	cpu_hotplug_disable();
+	cpu = raw_smp_processor_id();
+	evt = hardlockup_detector_event_create(cpu);
+	if (IS_ERR(evt)) {
 		pr_info("Perf NMI watchdog permanently disabled\n");
+		ret = PTR_ERR(evt);
 	} else {
-		perf_event_release_kernel(this_cpu_read(watchdog_ev));
-		this_cpu_write(watchdog_ev, NULL);
+		perf_event_release_kernel(evt);
+		ret = 0;
 	}
+	cpu_hotplug_enable();
+
 	return ret;
 }
--
2.51.0