Message-ID: <20250501154503.2308f177@gandalf.local.home>
Date: Thu, 1 May 2025 15:45:03 -0400
From: Steven Rostedt <rostedt@...dmis.org>
To: Paul Cacheux via B4 Relay <devnull+paulcacheux.gmail.com@...nel.org>
Cc: paulcacheux@...il.com, Masami Hiramatsu <mhiramat@...nel.org>, Mathieu
Desnoyers <mathieu.desnoyers@...icios.com>, linux-kernel@...r.kernel.org,
linux-trace-kernel@...r.kernel.org
Subject: Re: [PATCH] tracing: fix race when creating trace probe log error message
On Tue, 22 Apr 2025 20:33:13 +0200
Paul Cacheux via B4 Relay <devnull+paulcacheux.gmail.com@...nel.org> wrote:
> From: Paul Cacheux <paulcacheux@...il.com>
Sorry for the late reply, I just noticed this patch.
>
> When creating a trace probe a global variable is modified and this
> data used when an error is raised and the error message generated.
>
> Modification of this global variable is done without any lock and
> multiple trace operations will race, causing some potential issues
> when generating the error.
>
> This commit moves away from the global variable and passes the
> error context as a regular function argument.
>
> Fixes: ab105a4fb894 ("tracing: Use tracing error_log with probe events")
>
> Signed-off-by: Paul Cacheux <paulcacheux@...il.com>
> ---
> As reported in [1] a race exists in the shared trace probe log
> used to build error messages. This can cause kernel crashes
> when building the actual error message, but the race happens
> even for non-error tracefs uses, it's just not visible.
>
> Reproducer first reported that is still crashing:
>
> # 'p4' is invalid command which make kernel run into trace_probe_log_err()
> cd /sys/kernel/debug/tracing
> while true; do
> echo 'p4:myprobe1 do_sys_openat2 dfd=%ax filename=%dx flags=%cx mode=+4($stack)' >> kprobe_events &
> echo 'p4:myprobe2 do_sys_openat2' >> kprobe_events &
> echo 'p4:myprobe3 do_sys_openat2 dfd=%ax filename=%dx' >> kprobe_events &
> done;
>
> The original email suggested to use a mutex or to allocate the
> trace_probe_log on the stack. The mutex can cause performance
> issues, and require high confidence in the correctness of the
> current trace_probe_log_clear calls. This patch implements
> the stack solution instead and passes a pointer to using
> functions.
>
> [1] https://lore.kernel.org/all/20221121081103.3070449-1-zhengyejian1@huawei.com/T/
Honestly, I don't like either approach.
What could be done is wrap the internals of the function in a mutex so they
are not re-entrant (using guard(mutex)). If two error codes are happening
together, just let it get corrupted. There should never be two additions at
the same time, and if the admin is doing that then they deserve what they
get.
I don't care if the error log gets garbage if there are multiple accesses at
the same time. The fix should only prevent it from crashing.
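To illustrate the idea, here is a minimal userspace sketch, not the actual kernel change. It assumes a single shared log structure and emulates guard(mutex) with a pthread mutex plus the GCC/Clang cleanup attribute. The names probe_log, probe_log_init(), probe_log_clear(), probe_log_err() and GUARD_MUTEX() are hypothetical stand-ins chosen for this example. Each function holds the lock only for its own body, so two writers can still overwrite each other's context, but a reader never walks argv while another thread is resetting it.

```c
#include <pthread.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for the shared trace probe log state. */
struct probe_log {
	const char *subsystem;
	int argc;
	const char **argv;
	int index;
};

static struct probe_log probe_log;
static pthread_mutex_t probe_log_lock = PTHREAD_MUTEX_INITIALIZER;

/* Emulates guard(mutex): the lock is dropped when the guard leaves scope. */
static void unlock_cleanup(pthread_mutex_t **m)
{
	pthread_mutex_unlock(*m);
}
#define GUARD_MUTEX(m) \
	pthread_mutex_t *guard_ __attribute__((cleanup(unlock_cleanup), unused)) = \
		(pthread_mutex_lock(m), (m))

static void probe_log_init(const char *subsystem, int argc, const char **argv)
{
	GUARD_MUTEX(&probe_log_lock);

	probe_log.subsystem = subsystem;
	probe_log.argc = argc;
	probe_log.argv = argv;
	probe_log.index = 0;
}

static void probe_log_clear(void)
{
	GUARD_MUTEX(&probe_log_lock);

	memset(&probe_log, 0, sizeof(probe_log));
}

/*
 * Rebuild the command string from the logged argv.
 * Returns the number of bytes written, or -1 if the log is empty
 * (for example, cleared by another thread) or buf is too small.
 */
static int probe_log_err(char *buf, size_t len)
{
	GUARD_MUTEX(&probe_log_lock);
	size_t used = 0;

	if (!probe_log.argv || probe_log.argc <= 0 || len == 0)
		return -1;

	buf[0] = '\0';
	for (int i = 0; i < probe_log.argc; i++) {
		int n = snprintf(buf + used, len - used, "%s%s",
				 i ? " " : "", probe_log.argv[i]);

		if (n < 0 || (size_t)n >= len - used)
			return -1;
		used += (size_t)n;
	}
	return (int)used;
}
```

With this shape the message may describe the wrong command if two writers interleave, but the reader either sees a consistent argc/argv pair or an empty log, so it cannot dereference a half-reset context.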
-- Steve