[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20250929044836.7169d5be@batman.local.home>
Date: Mon, 29 Sep 2025 04:48:36 -0400
From: Steven Rostedt <rostedt@...dmis.org>
To: chenyuan_fl@....com
Cc: mhiramat@...nel.org, mathieu.desnoyers@...icios.com,
linux-kernel@...r.kernel.org, linux-trace-kernel@...r.kernel.org, Yuan Chen
<chenyuan@...inos.cn>, Peter Zijlstra <peterz@...radead.org>, Sebastian
Andrzej Siewior <bigeasy@...utronix.de>, John Ogness
<john.ogness@...utronix.de>
Subject: Re: [PATCH v2] tracing: Fix race condition in kprobe initialization
causing NULL pointer dereference
On Mon, 29 Sep 2025 07:57:31 +0100
chenyuan_fl@....com wrote:
> From: Yuan Chen <chenyuan@...inos.cn>
>
> There is a critical race condition in kprobe initialization that can lead to
> NULL pointer dereference and kernel crash.
>
> [1135630.084782] Unable to handle kernel paging request at virtual address 0000710a04630000
> ...
> [1135630.260314] pstate: 404003c9 (nZcv DAIF +PAN -UAO)
> [1135630.269239] pc : kprobe_perf_func+0x30/0x260
> [1135630.277643] lr : kprobe_dispatcher+0x44/0x60
> [1135630.286041] sp : ffffaeff4977fa40
> [1135630.293441] x29: ffffaeff4977fa40 x28: ffffaf015340e400
> [1135630.302837] x27: 0000000000000000 x26: 0000000000000000
> [1135630.312257] x25: ffffaf029ed108a8 x24: ffffaf015340e528
> [1135630.321705] x23: ffffaeff4977fc50 x22: ffffaeff4977fc50
> [1135630.331154] x21: 0000000000000000 x20: ffffaeff4977fc50
> [1135630.340586] x19: ffffaf015340e400 x18: 0000000000000000
> [1135630.349985] x17: 0000000000000000 x16: 0000000000000000
> [1135630.359285] x15: 0000000000000000 x14: 0000000000000000
> [1135630.368445] x13: 0000000000000000 x12: 0000000000000000
> [1135630.377473] x11: 0000000000000000 x10: 0000000000000000
> [1135630.386411] x9 : 0000000000000000 x8 : 0000000000000000
> [1135630.395252] x7 : 0000000000000000 x6 : 0000000000000000
> [1135630.403963] x5 : 0000000000000000 x4 : 0000000000000000
> [1135630.412545] x3 : 0000710a04630000 x2 : 0000000000000006
> [1135630.421021] x1 : ffffaeff4977fc50 x0 : 0000710a04630000
> [1135630.429410] Call trace:
> [1135630.434828] kprobe_perf_func+0x30/0x260
> [1135630.441661] kprobe_dispatcher+0x44/0x60
> [1135630.448396] aggr_pre_handler+0x70/0xc8
> [1135630.454959] kprobe_breakpoint_handler+0x140/0x1e0
> [1135630.462435] brk_handler+0xbc/0xd8
> [1135630.468437] do_debug_exception+0x84/0x138
> [1135630.475074] el1_dbg+0x18/0x8c
> [1135630.480582] security_file_permission+0x0/0xd0
> [1135630.487426] vfs_write+0x70/0x1c0
> [1135630.493059] ksys_write+0x5c/0xc8
> [1135630.498638] __arm64_sys_write+0x24/0x30
> [1135630.504821] el0_svc_common+0x78/0x130
> [1135630.510838] el0_svc_handler+0x38/0x78
> [1135630.516834] el0_svc+0x8/0x1b0
>
> kernel/trace/trace_kprobe.c: 1308
> 0xffff3df8995039ec <kprobe_perf_func+0x2c>: ldr x21, [x24,#120]
> include/linux/compiler.h: 294
> 0xffff3df8995039f0 <kprobe_perf_func+0x30>: ldr x1, [x21,x0]
>
> kernel/trace/trace_kprobe.c
> 1308: head = this_cpu_ptr(call->perf_events);
> 1309: if (hlist_empty(head))
> 1310: return 0;
>
> crash> struct trace_event_call -o
> struct trace_event_call {
> ...
> [120] struct hlist_head *perf_events; //(call->perf_event)
> ...
> }
>
> crash> struct trace_event_call ffffaf015340e528
> struct trace_event_call {
> ...
> perf_events = 0xffff0ad5fa89f088, //this value is correct, but x21 = 0
> ...
> }
>
> Race Condition Analysis:
>
> The race occurs between kprobe activation and perf_events initialization:
>
> CPU0 CPU1
> ==== ====
> perf_kprobe_init
> perf_trace_event_init
> tp_event->perf_events = list;(1)
> tp_event->class->reg (2)← KPROBE ACTIVE
> Debug exception triggers
> ...
> kprobe_dispatcher
> kprobe_perf_func (tk->tp.flags & TP_FLAG_PROFILE)
> head = this_cpu_ptr(call->perf_events)(3)
> (perf_events is still NULL)
>
> Problem:
> 1. CPU0 executes (1) assigning tp_event->perf_events = list
> 2. CPU0 executes (2) enabling kprobe functionality via class->reg()
> 3. CPU1 triggers and reaches kprobe_dispatcher
> 4. CPU1 checks TP_FLAG_PROFILE - condition passes (step 2 completed)
> 5. CPU1 calls kprobe_perf_func() and crashes at (3) because
> call->perf_events is still NULL
>
> The issue: Assignment in step 1 may not be visible to CPU1 due to
> missing memory barriers before step 2 sets TP_FLAG_PROFILE flag.
>
> Add smp_mb() barrier between perf_events assignment and enabling
> profile functionality to ensure visibility ordering across CPUs.
>
> v1->v2: Fix race analysis (per Masami) - kprobe arms in class->reg().
>
> Signed-off-by: Yuan Chen <chenyuan@...inos.cn>
> ---
> kernel/trace/trace_event_perf.c | 5 +++++
> 1 file changed, 5 insertions(+)
>
> diff --git a/kernel/trace/trace_event_perf.c b/kernel/trace/trace_event_perf.c
> index a6bb7577e8c5..6eff8c9d6bae 100644
> --- a/kernel/trace/trace_event_perf.c
> +++ b/kernel/trace/trace_event_perf.c
> @@ -113,6 +113,11 @@ static int perf_trace_event_reg(struct trace_event_call *tp_event,
>
> tp_event->perf_events = list;
>
> + /* Ensure perf_events assignment is visible to all CPUs before enabling
> + * profile functionality
> + */
> + smp_mb();
So from other discussions I had with John and Sebastian (both Cc'd),
memory barriers are not for "making memory visible", but instead are
for interactions between memory and two different CPUs, where both CPUs
have memory barriers.
John or Sebastian, thoughts?
-- Steve
> +
> if (!total_ref_count) {
> char __percpu *buf;
> int i;
Powered by blists - more mailing lists