Message-ID: <20260124070814.806828-1-realwujing@gmail.com>
Date: Sat, 24 Jan 2026 02:08:14 -0500
From: Qiliang Yuan <realwujing@...il.com>
To: dianders@...omium.org
Cc: akpm@...ux-foundation.org,
lihuafei1@...wei.com,
linux-kernel@...r.kernel.org,
mingo@...nel.org,
mm-commits@...r.kernel.org,
realwujing@...il.com,
song@...nel.org,
stable@...r.kernel.org,
sunshx@...natelecom.cn,
thorsten.blum@...ux.dev,
wangjinchao600@...il.com,
yangyicong@...ilicon.com,
yuanql9@...natelecom.cn,
zhangjn11@...natelecom.cn,
linux-watchdog@...r.kernel.org
Subject: [PATCH v4] watchdog/hardlockup: Fix UAF in perf event cleanup due to migration race
Original analysis on Linux 4.19 showed a race condition in the hardlockup
detector's initialization phase. Specifically, during the early probe
phase, hardlockup_detector_perf_init() (renamed to
watchdog_hardlockup_probe() in newer kernels via commit d9b3629ade8e)
interacted with the per-cpu 'watchdog_ev' variable.
If the initializing task migrates to another CPU during this probe phase,
two issues arise:
1. The 'watchdog_ev' pointer on the original CPU is set but not cleared,
leaving a stale pointer to a freed perf event.
2. The 'watchdog_ev' pointer on the new CPU might be incorrectly cleared.
Note: Although the logs below reference hardlockup_detector_perf_init(),
the same logic persists in the current watchdog_hardlockup_probe()
implementation.
This race condition was observed in console logs:
[23.038376] hardlockup_detector_perf_init 313 cur_cpu=2
...
[23.076385] hardlockup_detector_event_create 203 cpu(cur)=2 set watchdog_ev
...
[23.095788] perf_event_release_kernel 4623 cur_cpu=2
...
[23.116963] lockup_detector_reconfigure 577 cur_cpu=3
The log shows the task started on CPU 2, set watchdog_ev on CPU 2,
released the event on CPU 2, but then migrated to CPU 3 before the
cleanup logic could run. This left watchdog_ev on CPU 2 pointing to a
freed event, resulting in a UAF when later accessed:
[26.540732] BUG: KASAN: use-after-free in perf_event_ctx_lock_nested.isra.72+0x6b/0x140
[26.542442] Read of size 8 at addr ff110006b360d718 by task kworker/2:1/94
Fix this by refactoring hardlockup_detector_event_create() to return the
created perf event instead of directly assigning it to the per-cpu variable.
In the probe function, use an arbitrary CPU but ensure it remains
online via cpu_hotplug_disable() during the check.
Fixes: 930d8f8dbab9 ("watchdog/perf: adapt the watchdog_perf interface for async model")
Signed-off-by: Shouxin Sun <sunshx@...natelecom.cn>
Signed-off-by: Junnan Zhang <zhangjn11@...natelecom.cn>
Signed-off-by: Qiliang Yuan <realwujing@...il.com>
Signed-off-by: Qiliang Yuan <yuanql9@...natelecom.cn>
Cc: Song Liu <song@...nel.org>
Cc: Douglas Anderson <dianders@...omium.org>
Cc: Jinchao Wang <wangjinchao600@...il.com>
Cc: <stable@...r.kernel.org>
---
v4:
- Add cpu_hotplug_disable() in watchdog_hardlockup_probe() to ensure the
sampled CPU remains online during probing.
- Update commit message to explain the relevance of 4.19 logs even
though functions were renamed in modern kernels.
v3:
- Refactor hardlockup_detector_event_create() to return the event pointer
instead of directly assigning to per-cpu variables to fix the UAF.
- Restore PMU cycle fallback and unify the enable/probe paths.
v2:
- Add Cc: <stable@...r.kernel.org>.
v1:
- Avoid 'watchdog_ev' in probe path by manually creating and releasing a
local perf event.
kernel/watchdog_perf.c | 56 +++++++++++++++++++++++++-----------------
1 file changed, 34 insertions(+), 22 deletions(-)
diff --git a/kernel/watchdog_perf.c b/kernel/watchdog_perf.c
index d3ca70e3c256..887b61c65c1b 100644
--- a/kernel/watchdog_perf.c
+++ b/kernel/watchdog_perf.c
@@ -17,6 +17,7 @@
#include <linux/atomic.h>
#include <linux/module.h>
#include <linux/sched/debug.h>
+#include <linux/cpu.h>
#include <asm/irq_regs.h>
#include <linux/perf_event.h>
@@ -118,18 +119,11 @@ static void watchdog_overflow_callback(struct perf_event *event,
watchdog_hardlockup_check(smp_processor_id(), regs);
}
-static int hardlockup_detector_event_create(void)
+static struct perf_event *hardlockup_detector_event_create(unsigned int cpu)
{
- unsigned int cpu;
struct perf_event_attr *wd_attr;
struct perf_event *evt;
- /*
- * Preemption is not disabled because memory will be allocated.
- * Ensure CPU-locality by calling this in per-CPU kthread.
- */
- WARN_ON(!is_percpu_thread());
- cpu = raw_smp_processor_id();
wd_attr = &wd_hw_attr;
wd_attr->sample_period = hw_nmi_get_sample_period(watchdog_thresh);
@@ -143,14 +137,7 @@ static int hardlockup_detector_event_create(void)
watchdog_overflow_callback, NULL);
}
- if (IS_ERR(evt)) {
- pr_debug("Perf event create on CPU %d failed with %ld\n", cpu,
- PTR_ERR(evt));
- return PTR_ERR(evt);
- }
- WARN_ONCE(this_cpu_read(watchdog_ev), "unexpected watchdog_ev leak");
- this_cpu_write(watchdog_ev, evt);
- return 0;
+ return evt;
}
/**
@@ -159,17 +146,26 @@ static int hardlockup_detector_event_create(void)
*/
void watchdog_hardlockup_enable(unsigned int cpu)
{
+ struct perf_event *evt;
+
WARN_ON_ONCE(cpu != smp_processor_id());
- if (hardlockup_detector_event_create())
+ evt = hardlockup_detector_event_create(cpu);
+ if (IS_ERR(evt)) {
+ pr_debug("Perf event create on CPU %d failed with %ld\n", cpu,
+ PTR_ERR(evt));
return;
+ }
/* use original value for check */
if (!atomic_fetch_inc(&watchdog_cpus))
pr_info("Enabled. Permanently consumes one hw-PMU counter.\n");
+ WARN_ONCE(this_cpu_read(watchdog_ev), "unexpected watchdog_ev leak");
+ this_cpu_write(watchdog_ev, evt);
+
watchdog_init_timestamp();
- perf_event_enable(this_cpu_read(watchdog_ev));
+ perf_event_enable(evt);
}
/**
@@ -263,19 +259,35 @@ bool __weak __init arch_perf_nmi_is_available(void)
*/
int __init watchdog_hardlockup_probe(void)
{
+ struct perf_event *evt;
+ unsigned int cpu;
int ret;
if (!arch_perf_nmi_is_available())
return -ENODEV;
- ret = hardlockup_detector_event_create();
+ if (!hw_nmi_get_sample_period(watchdog_thresh))
+ return -EINVAL;
- if (ret) {
+ /*
+ * Test hardware PMU availability by creating a temporary perf event.
+ * The requested CPU is arbitrary; preemption is not disabled, so
+ * raw_smp_processor_id() is used. Surround with cpu_hotplug_disable()
+ * to ensure the arbitrarily chosen CPU remains online during the check.
+ * The event is released immediately.
+ */
+ cpu_hotplug_disable();
+ cpu = raw_smp_processor_id();
+ evt = hardlockup_detector_event_create(cpu);
+ if (IS_ERR(evt)) {
pr_info("Perf NMI watchdog permanently disabled\n");
+ ret = PTR_ERR(evt);
} else {
- perf_event_release_kernel(this_cpu_read(watchdog_ev));
- this_cpu_write(watchdog_ev, NULL);
+ perf_event_release_kernel(evt);
+ ret = 0;
}
+ cpu_hotplug_enable();
+
return ret;
}
--
2.51.0