Message-ID: <0b13bbd0001c41ca1866ccf58f6e9de6ca2c24d9.camel@gmail.com>
Date: Thu, 01 May 2025 22:56:44 +0200
From: Paul Cacheux <paulcacheux@...il.com>
To: Steven Rostedt <rostedt@...dmis.org>, Paul Cacheux via B4 Relay
<devnull+paulcacheux.gmail.com@...nel.org>
Cc: Masami Hiramatsu <mhiramat@...nel.org>, Mathieu Desnoyers
<mathieu.desnoyers@...icios.com>, linux-kernel@...r.kernel.org,
linux-trace-kernel@...r.kernel.org
Subject: Re: [PATCH] tracing: fix race when creating trace probe log error message
On Thu, 2025-05-01 at 15:45 -0400, Steven Rostedt wrote:
> On Tue, 22 Apr 2025 20:33:13 +0200
> Paul Cacheux via B4 Relay <devnull+paulcacheux.gmail.com@...nel.org>
> wrote:
>
> > From: Paul Cacheux <paulcacheux@...il.com>
>
> Sorry for the late reply, I just noticed this patch.
No problem at all, thanks for looking at my patch.
>
> >
> > When creating a trace probe, a global variable is modified, and this
> > data is used when an error is raised and the error message is generated.
> >
> > Modification of this global variable is done without any lock and
> > multiple trace operations will race, causing some potential issues
> > when generating the error.
> >
> > This commit moves away from the global variable and passes the
> > error context as a regular function argument.
> >
> > Fixes: ab105a4fb894 ("tracing: Use tracing error_log with probe events")
> >
> > Signed-off-by: Paul Cacheux <paulcacheux@...il.com>
> > ---
> > As reported in [1] a race exists in the shared trace probe log
> > used to build error messages. This can cause kernel crashes
> > when building the actual error message, but the race happens
> > even for non-error tracefs uses, it's just not visible.
> >
> > Reproducer first reported that is still crashing:
> >
> > # 'p4' is an invalid command, which makes the kernel run into trace_probe_log_err()
> > cd /sys/kernel/debug/tracing
> > while true; do
> >   echo 'p4:myprobe1 do_sys_openat2 dfd=%ax filename=%dx flags=%cx mode=+4($stack)' >> kprobe_events &
> >   echo 'p4:myprobe2 do_sys_openat2' >> kprobe_events &
> >   echo 'p4:myprobe3 do_sys_openat2 dfd=%ax filename=%dx' >> kprobe_events &
> > done;
> >
> > The original email suggested using a mutex or allocating the
> > trace_probe_log on the stack. The mutex can cause performance
> > issues and requires high confidence in the correctness of the
> > current trace_probe_log_clear calls. This patch implements
> > the stack solution instead and passes a pointer to the functions
> > that use it.
> >
> > [1]
> > https://lore.kernel.org/all/20221121081103.3070449-1-zhengyejian1@huawei.com/T/
>
> Honestly, I don't like either approach.
>
> What could be done is wrap the internals of the function in a mutex so they
> are not re-entrant (using guard(mutex)). If two error codes are happening
> together, just let it get corrupted. There should never be two additions at
> the same time, and if the admin is doing that then they deserve what they
> get.
Just to double check, what you are suggesting here is to include a
mutex in the shared trace_probe_log entry, and to lock it in all
accessor functions (trace_probe_log_{init,set_index,clear,err})?
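
To make the question concrete, here is a rough userspace sketch of that
reading. It is only an illustration under assumptions: pthread_mutex_t
stands in for the kernel mutex and guard(mutex), and the struct fields
and probe_log_* names are simplified and hypothetical, not the actual
trace_probe code.

```c
#include <pthread.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for the shared probe log. */
struct probe_log {
	const char *subsystem;
	const char **argv;
	int argc;
	int index;
};

static struct probe_log probe_log;
static pthread_mutex_t probe_log_lock = PTHREAD_MUTEX_INITIALIZER;

/* Each accessor takes the lock only around its own body. */
void probe_log_init(const char *subsystem, int argc, const char **argv)
{
	pthread_mutex_lock(&probe_log_lock);
	probe_log.subsystem = subsystem;
	probe_log.argc = argc;
	probe_log.argv = argv;
	probe_log.index = 0;
	pthread_mutex_unlock(&probe_log_lock);
}

void probe_log_set_index(int index)
{
	pthread_mutex_lock(&probe_log_lock);
	probe_log.index = index;
	pthread_mutex_unlock(&probe_log_lock);
}

void probe_log_clear(void)
{
	pthread_mutex_lock(&probe_log_lock);
	probe_log = (struct probe_log){ 0 };
	pthread_mutex_unlock(&probe_log_lock);
}

/*
 * Looks up the argument the error refers to. Returns 0 and stores it in
 * *bad_arg on success, or -1 if the log is empty or the index is out of
 * range (for example because another writer cleared it in the meantime).
 */
int probe_log_err(const char **bad_arg)
{
	int ret = -1;

	pthread_mutex_lock(&probe_log_lock);
	if (probe_log.argv && probe_log.index >= 0 &&
	    probe_log.index < probe_log.argc) {
		*bad_arg = probe_log.argv[probe_log.index];
		ret = 0;
	}
	pthread_mutex_unlock(&probe_log_lock);
	return ret;
}
```

With this shape, two concurrent writers can still interleave their
init/set_index/err calls and produce a message about the wrong command,
but argc, argv and index are always read as a consistent set inside one
accessor.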
>
> I don't care if the error log gets garbage if there's multiple accesses at
> the same time. The fix should only prevent it from crashing.
>
> -- Steve
>
Thanks for the feedback,
Paul Cacheux