[<prev] [next>] [thread-next>] [day] [month] [year] [list]
Message-ID: <20251009152359.604267051@kernel.org>
Date: Thu, 09 Oct 2025 11:23:59 -0400
From: Steven Rostedt <rostedt@...nel.org>
To: linux-kernel@...r.kernel.org
Cc: Masami Hiramatsu <mhiramat@...nel.org>,
Mark Rutland <mark.rutland@....com>,
Mathieu Desnoyers <mathieu.desnoyers@...icios.com>,
Andrew Morton <akpm@...ux-foundation.org>
Subject: [for-linus][PATCH 0/5] tracing: Clean up and fixes or v6.18
tracing clean up and fixes for v6.18:
- Have osnoise tracer use memdup_user_nul()
The function osnoise_cpus_write() open codes a kmalloc() and then
a copy_from_user() and then adds a nul byte at the end which is the
same as simply using memdup_user_nul().
- Fix wakeup and irq tracers when failing to acquire calltime
When the wakeup and irq tracers use the function graph tracer for
tracing function times, it saves a timestamp into the fgraph shadow
stack. It is possible that this could fail to be stored. If that
happens, it exits the routine early. These functions also disable
nesting of the operations by incremeting the data "disable" counter.
But if the calltime exits out early, it never increments the counter
back to what it needs to be.
Since there's only a couple of lines of code that does work after
acquiring the calltime, instead of exiting out early, reverse the
if statement to be true if calltime is acquired, and place the code
that is to be done within that if block. The clean up will always
be done after that.
- Fix ring_buffer_map() return value on failure of __rb_map_vma()
If __rb_map_vma() fails in ring_buffer_map(), it does not return
an error. This means the caller will be working against a bad vma
mapping. Have ring_buffer_map() return an error when __rb_map_vma()
fails.
- Fix regression of writing to the trace_marker file
A bug fix was made to change __copy_from_user_inatomic() to
copy_from_user_nofault() in the trace_marker write function.
The trace_marker file is used by applications to write into
it (usually with a file descriptor opened at the start of the
program) to record into the tracing system. It's usually used
in critical sections so the write to trace_marker is highly
optimized.
The reason for copying in an atomic section is that the write
reserves space on the ring buffer and then writes directly into
it. After it writes, it commits the event. The time between
reserve and commit must have preemption disabled.
The trace marker write does not have any locking nor can it
allocate due to the nature of it being a critical path.
Unfortunately, converting __copy_from_user_inatomic() to
copy_from_user_nofault() caused a regression in Android.
Now all the writes from its applications trigger the fault that
is rejected by the _nofault() version that wasn't rejected by
the _inatomic() version. Instead of getting data, it now just
gets a trace buffer filled with:
tracing_mark_write: <faulted>
To fix this, on opening of the trace_marker file, allocate
per CPU buffers that can be used by the write call. Then
when entering the write call, do the following:
preempt_disable();
cpu = smp_processor_id();
do {
cnt = nr_context_switches_cpu(cpu);
migrate_disable();
preempt_enable();
ret = copy_from_user(buffer, ptr, size);
preempt_disable();
migrate_enable();
} while (!ret && cnt != nr_context_switches_cpu(cpu));
if (!ret)
ring_buffer_write(buffer);
preempt_enable();
This works similarly to seqcount. As it must enabled preemption
to do a copy_from_user() into a per CPU buffer, if it gets
preempted, the buffer could be corrupted by another task.
To handle this, read the number of context switches of the current
CPU, disable migration, enable preemption, copy the data from
user space, then immediately disable preemption again.
If the number of context switches is the same, the buffer
is still valid. Otherwise it must be assumed that the buffer may
have been corrupted and it needs to try again.
Now the trace_marker write can get the user data even if it has
to fault it in, and still not grab any locks of its own.
git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace.git
trace/fixes
Head SHA1: 64cf7d058a005c5c31eb8a0b741f35dc12915d18
Ankit Khushwaha (1):
ring buffer: Propagate __rb_map_vma return value to caller
Steven Rostedt (3):
tracing: Fix wakeup tracers on failure of acquiring calltime
tracing: Fix irqoff tracers on failure of acquiring calltime
tracing: Have trace_marker use per-cpu data to read user space
Thorsten Blum (1):
tracing/osnoise: Replace kmalloc + copy_from_user with memdup_user_nul
----
kernel/trace/ring_buffer.c | 2 +-
kernel/trace/trace.c | 268 +++++++++++++++++++++++++++++++-------
kernel/trace/trace_irqsoff.c | 23 ++--
kernel/trace/trace_osnoise.c | 11 +-
kernel/trace/trace_sched_wakeup.c | 16 +--
5 files changed, 241 insertions(+), 79 deletions(-)
Powered by blists - more mailing lists