linux-kernel - Re: [PATCH] tracing: fix race when creating trace probe log error message

Message-ID: <0b13bbd0001c41ca1866ccf58f6e9de6ca2c24d9.camel@gmail.com>
Date: Thu, 01 May 2025 22:56:44 +0200
From: Paul Cacheux <paulcacheux@...il.com>
To: Steven Rostedt <rostedt@...dmis.org>, Paul Cacheux via B4 Relay
	 <devnull+paulcacheux.gmail.com@...nel.org>
Cc: Masami Hiramatsu <mhiramat@...nel.org>, Mathieu Desnoyers
	 <mathieu.desnoyers@...icios.com>, linux-kernel@...r.kernel.org, 
	linux-trace-kernel@...r.kernel.org
Subject: Re: [PATCH] tracing: fix race when creating trace probe log error
 message

On Thu, 2025-05-01 at 15:45 -0400, Steven Rostedt wrote:
> On Tue, 22 Apr 2025 20:33:13 +0200
> Paul Cacheux via B4 Relay <devnull+paulcacheux.gmail.com@...nel.org>
> wrote:
> 
> > From: Paul Cacheux <paulcacheux@...il.com>
> 
> Sorry for the late reply, I just noticed this patch.

No problem at all, thanks for looking at my patch.

> 
> > 
> > When creating a trace probe, a global variable is modified, and this
> > data is used when an error is raised and the error message generated.
> > 
> > Modification of this global variable is done without any lock and
> > multiple trace operations will race, causing some potential issues
> > when generating the error.
> > 
> > This commit moves away from the global variable and passes the
> > error context as a regular function argument.
> > 
> > Fixes: ab105a4fb894 ("tracing: Use tracing error_log with probe events")
> > 
> > Signed-off-by: Paul Cacheux <paulcacheux@...il.com>
> > ---
> > As reported in [1] a race exists in the shared trace probe log
> > used to build error messages. This can cause kernel crashes
> > when building the actual error message, but the race happens
> > even for non-error tracefs uses, it's just not visible.
> > 
> > The originally reported reproducer, which still crashes:
> > 
> >   # 'p4' is an invalid command, which makes the kernel run into trace_probe_log_err()
> >   cd /sys/kernel/debug/tracing
> >   while true; do
> >     echo 'p4:myprobe1 do_sys_openat2 dfd=%ax filename=%dx flags=%cx mode=+4($stack)' >> kprobe_events &
> >     echo 'p4:myprobe2 do_sys_openat2' >> kprobe_events &
> >     echo 'p4:myprobe3 do_sys_openat2 dfd=%ax filename=%dx' >> kprobe_events &
> >   done;
> > 
> > The original email suggested using a mutex or allocating the
> > trace_probe_log on the stack. The mutex can cause performance
> > issues, and requires high confidence in the correctness of the
> > current trace_probe_log_clear calls. This patch implements
> > the stack solution instead and passes a pointer to the functions
> > that use it.
> > 
> > [1] https://lore.kernel.org/all/20221121081103.3070449-1-zhengyejian1@huawei.com/T/
> 
> Honestly, I don't like either approach.
> 
> What could be done is wrap the internals of the function in a mutex so
> they are not re-entrant (using guard(mutex)). If two error codes are
> happening together, just let it get corrupted. There should never be two
> additions at the same time, and if the admin is doing that then they
> deserve what they get.

Just to double check, what you are suggesting here is to include a
mutex in the shared trace_probe_log entry, and to lock it in all
accessor functions (trace_probe_log_{init,set_index,clear,err})?
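
For illustration, the "lock every accessor" idea could be sketched roughly as
below. This is a userspace approximation under stated assumptions, not the
kernel implementation: pthread_mutex_t stands in for the kernel's struct
mutex, explicit lock/unlock pairs stand in for guard(mutex), and every
sketch_* name and the struct layout are hypothetical. Each accessor is
individually serialized, so two concurrent users can still overwrite each
other's state and produce a garbled message, but the error path validates the
shared state under the lock before dereferencing it.

```c
/*
 * Userspace sketch only. pthread_mutex_t stands in for the kernel's
 * struct mutex, and lock/unlock pairs stand in for guard(mutex).
 * All sketch_* names and the struct layout are hypothetical.
 */
#include <pthread.h>
#include <stdio.h>
#include <string.h>

struct sketch_probe_log {
	const char *subsystem;
	const char **argv;
	int argc;
	int index;
};

static struct sketch_probe_log sketch_log;
static pthread_mutex_t sketch_log_lock = PTHREAD_MUTEX_INITIALIZER;

void sketch_log_init(const char *subsystem, int argc, const char **argv)
{
	pthread_mutex_lock(&sketch_log_lock);
	sketch_log.subsystem = subsystem;
	sketch_log.argc = argc;
	sketch_log.argv = argv;
	sketch_log.index = 0;
	pthread_mutex_unlock(&sketch_log_lock);
}

void sketch_log_set_index(int index)
{
	pthread_mutex_lock(&sketch_log_lock);
	sketch_log.index = index;
	pthread_mutex_unlock(&sketch_log_lock);
}

void sketch_log_clear(void)
{
	pthread_mutex_lock(&sketch_log_lock);
	memset(&sketch_log, 0, sizeof(sketch_log));
	pthread_mutex_unlock(&sketch_log_lock);
}

/*
 * Format an error message into buf. Returns the snprintf() result, or -1
 * when the shared log is empty or its index is out of range.
 */
int sketch_log_err(char *buf, size_t len, const char *msg)
{
	int ret = -1;

	pthread_mutex_lock(&sketch_log_lock);
	/* Validate under the lock: a concurrent clear may have emptied the log. */
	if (sketch_log.subsystem && sketch_log.argv &&
	    sketch_log.index >= 0 && sketch_log.index < sketch_log.argc)
		ret = snprintf(buf, len, "%s: %s at '%s'",
			       sketch_log.subsystem, msg,
			       sketch_log.argv[sketch_log.index]);
	pthread_mutex_unlock(&sketch_log_lock);
	return ret;
}
```

In this sketch the lock protects only the log structure itself. The strings
behind argv still belong to the caller, so the approach relies on each caller
clearing the log before freeing its arguments.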

> 
> I don't care if the error log gets garbage if there's multiple accesses
> at the same time. The fix should only prevent it from crashing.
> 
> -- Steve

Thanks for the feedback,
Paul Cacheux