Message-ID: <20251202181335.48c00a8c@gandalf.local.home>
Date: Tue, 2 Dec 2025 18:13:35 -0500
From: Steven Rostedt <rostedt@...dmis.org>
To: Linus Torvalds <torvalds@...ux-foundation.org>
Cc: LKML <linux-kernel@...r.kernel.org>, Masami Hiramatsu
<mhiramat@...nel.org>, Mathieu Desnoyers <mathieu.desnoyers@...icios.com>,
Mark Rutland <mark.rutland@....com>, Andrew Morton
<akpm@...ux-foundation.org>, Menglong Dong <menglong8.dong@...il.com>,
Yongliang Gao <leonylgao@...cent.com>, pengdonglin <pengdonglin@...omi.com>
Subject: [GIT PULL] tracing: Updates for v6.19
Linus,
tracing updates for v6.19:
- Merge branch shared with kprobes on extending trace options
The trace options were defined by a 32-bit variable. This limits tracing
instances to a total of 32 different options. As that limit has been hit,
and more options are being added, increase the option mask to a 64-bit
number, doubling the number of options available.
As this is required for the kprobe topic branches as well as the tracing
topic branch, a separate branch was created and merged into both.
- Make trace_user_fault_read() available for the rest of tracing
The function trace_user_fault_read() is used by the trace_marker file to
read user space quickly, without locking or allocations. Make it available
so that the system call trace events can use it too.
- Have system call trace events read user space values
Now that the system call trace event callbacks are called in a faultable
context, take advantage of this and read the user space buffers for
various system calls. For example, show the path name of the openat system
call instead of just the pointer to that path name in user space, and
show the contents of the buffer passed to the write system call. Several
system call trace events are updated, turning tracing into a lightweight
strace tool for all applications in the system.
- Update perf system call tracing to do the same
- Add a config and syscall_user_buf_size file to control the size of the buffer
Limit the amount of data that can be read from user space. The default
size is 63 bytes, but it can be increased up to 165 bytes.
- Allow the persistent ring buffer to print system calls normally
The persistent ring buffer prints trace events by their type and ignores
the print_fmt. This is because the print_fmt may change from kernel to
kernel. As the system call output is fixed by the system call ABI itself,
there's no reason to limit that. This makes reading the system call events
in the persistent ring buffer much nicer and easier to understand.
- Add an option to show _text+offset in the function profiler
The function profiler, which counts the number of times each function is
hit, currently lists all functions by their name and offset. But this
becomes ambiguous when there are several functions with the same name. Add
a tracing option that changes the output to _text+offset instead. A user
space tool can then use this information to map each _text+offset to the
unique function being counted.
- Report bad dynamic event command
If a bad command is passed to the dynamic_events file, report it properly
in the error log.
- Clean up tracer options
Clean up the tracer option code a bit by removing some useless code and
using switch statements instead of a series of if statements.
- Have tracing options be instance specific
Tracers can have their own options (function tracer, irqsoff tracer,
function graph tracer, etc). The same tracer can now be enabled in
multiple trace instances, but its options are still global. The API is
per instance, so changing an option in one instance affects the others.
This isn't even consistent, as an option takes effect differently
depending on when the tracer was started in an instance. Make an option
set in an instance affect only that instance.
- Optimize pid_list lock contention
Whenever the pid_list is read, a spin lock is taken. This happens at every
sched switch. Taking the lock at sched switch can be avoided by using a
seqlock counter instead.
- Clean up the trace trigger structures
The trigger code uses two different structures to implement a single
trigger. This was due to trying to reuse code for the two different types
of triggers (always-on triggers and count-limited triggers). But by adding
a single field to one structure, the other structure could be absorbed
into it, making the code easier to understand.
- Create a bulk garbage collector for trace triggers
If user space has triggers on several hundred events and then removes
them, it can take several seconds to complete. This is because each
removal calls the slow tracepoint_synchronize_unregister(), which can take
hundreds of milliseconds to complete. Instead, create a helper thread that
does the cleanup. When a trigger is removed, the kthread is created if it
doesn't already exist, and the trigger is added to an llist.
The kthread takes the items off the llist, calls
tracepoint_synchronize_unregister(), and then frees the items it took
off. It then checks whether there are more items to free before sleeping.
This lets user space remove all these triggers in less than a second.
- Allow function tracing of some of the tracing infrastructure code
Because the tracing code can cause recursion issues if it is traced by the
function tracer, function tracing is disabled for the entire tracing
directory. But not all of the tracing code causes issues when traced,
namely the event tracing code. Add a config that allows some of the
tracing code to be traced to help in debugging it. Note, when this is
enabled, it does add noise to general function tracing, especially if
events are enabled as well (which is a common case).
- Add boot-time backup instance for persistent buffer
The persistent ring buffer is used mostly for kernel crash analysis in the
field. One issue is that if there's a crash, the data in the persistent
ring buffer must be read before tracing can begin using it. This slows
down the boot process. Once tracing starts in the persistent ring buffer,
the old data must be freed: the addresses no longer match, and old events
can't be in the buffer with new events.
Add a way to create a backup buffer that copies the persistent ring
buffer at boot up. Then after a crash, the always-on tracer can start
immediately, along with the normal boot process, while the crash analysis
tooling uses the backup buffer. After the backup buffer has been read, it
can be removed.
- Enable function graph args and return address options at the same time
Currently, when reading of arguments in the function graph tracer is
enabled, the option to record the parent function in the entry event
cannot be enabled. Update the code so that both can be enabled at once.
- Add new struct_offset() helper macro
Add a new macro that takes a pointer to a structure and the name of one of
its members, and returns the offset of that member. This allows the
ring buffer code to simplify the following:
From: size = struct_size(entry, buf, cnt - sizeof(entry->id));
To: size = struct_offset(entry, id) + cnt;
There should be other simplifications that this macro can help out with as
well.
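The struct_offset() simplification can be illustrated with a small userspace sketch. The macro below is a stand-in built on offsetof() and the GCC/Clang __typeof__ extension, not a copy of the definition added to include/linux/overflow.h, and struct raw_entry is a hypothetical layout (header, id, flexible payload). The old struct_size()-style arithmetic is written out by hand here, without the overflow checks the kernel helper performs.

```c
#include <stddef.h>

/* Hypothetical userspace stand-in for struct_offset(); assumes GCC/Clang. */
#define struct_offset(p, member) offsetof(__typeof__(*(p)), member)

/* Hypothetical entry layout: fixed header, then an id, then raw payload. */
struct hdr {
	unsigned short type;
	unsigned char flags;
	unsigned char preempt;
	int pid;
};

struct raw_entry {
	struct hdr ent;
	unsigned int id;
	char buf[];
};

/*
 * Old style: size of the whole struct plus the payload, minus the id bytes
 * that cnt already includes (assumes cnt >= sizeof(id); no overflow checks).
 */
static size_t raw_size_old(struct raw_entry *entry, size_t cnt)
{
	return sizeof(*entry) + (cnt - sizeof(entry->id)) * sizeof(entry->buf[0]);
}

/* New style: everything before id, plus cnt bytes that begin with the id. */
static size_t raw_size_new(struct raw_entry *entry, size_t cnt)
{
	return struct_offset(entry, id) + cnt;
}
```

The two forms agree for this layout because struct raw_entry has no tail padding after id; the struct_offset() form states the intent directly and does not rely on that.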
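The first item, widening the trace option mask from 32 to 64 bits, comes down to doing every mask operation in 64-bit arithmetic. This is a minimal sketch under that assumption; the enum values and helper names are hypothetical and do not match the kernel's option list.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical option bit numbers; bit 40 only fits in a 64-bit mask. */
enum trace_opt_bit {
	OPT_PRINT_PARENT = 0,
	OPT_SYM_OFFSET   = 1,
	OPT_NEW_OPTION   = 40,
};

/* Shift a 64-bit one: shifting a 32-bit 1U by 32 or more is undefined. */
static inline uint64_t opt_bit(unsigned int bit)
{
	return UINT64_C(1) << bit;
}

static inline bool opt_enabled(uint64_t flags, unsigned int bit)
{
	return (flags & opt_bit(bit)) != 0;
}

/* Return @flags with @bit set or cleared. */
static inline uint64_t opt_update(uint64_t flags, unsigned int bit, bool on)
{
	return on ? (flags | opt_bit(bit)) : (flags & ~opt_bit(bit));
}
```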
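To make the syscall_user_buf_size limit concrete, here is a hedged sketch of how a captured write() buffer might be clamped and rendered as text. Only the 63-byte default and the 165-byte maximum come from this pull request; render_user_buf(), the '.' substitution for non-printable bytes, and the "..." truncation marker are illustrative assumptions, not the kernel's output format.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Limits quoted in the pull request; the macro names are hypothetical. */
#define SYSCALL_BUF_DEFAULT 63
#define SYSCALL_BUF_MAX     165

/*
 * Render at most @limit bytes (capped at SYSCALL_BUF_MAX) of @src into @dst.
 * Non-printable bytes become '.', and "..." marks a truncated buffer.
 * @dst must hold at least SYSCALL_BUF_MAX + 4 bytes. Returns the text length.
 */
static size_t render_user_buf(char *dst, const unsigned char *src,
			      size_t len, size_t limit)
{
	size_t n, i;

	if (limit > SYSCALL_BUF_MAX)
		limit = SYSCALL_BUF_MAX;
	n = len < limit ? len : limit;

	for (i = 0; i < n; i++)
		dst[i] = isprint(src[i]) ? (char)src[i] : '.';

	if (n < len) {
		memcpy(dst + n, "...", 3);
		n += 3;
	}
	dst[n] = '\0';
	return n;
}
```

Capping the copy keeps the per-event cost bounded no matter how large a buffer an application passes to write().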
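For the function profiler's _text+offset output, a user space tool still has to turn each offset back into one specific function. A sorted table plus a binary search is one plausible way to do that. The sketch assumes a hypothetical symbol table (struct sym, sym_lookup()) that the tool builds itself, for example from the kernel image; it is not part of any existing tool.

```c
#include <stddef.h>

/* Hypothetical symbol table entry, with addresses relative to _text. */
struct sym {
	unsigned long off;   /* start of the function, relative to _text */
	const char *name;    /* may repeat, e.g. two static functions */
	int id;              /* unique identity the tool counts against */
};

/*
 * Return the entry whose start is the last one <= @off in a table sorted
 * by ->off, or NULL if @off lies before the first entry.
 */
static const struct sym *sym_lookup(const struct sym *tab, size_t n,
				    unsigned long off)
{
	size_t lo = 0, hi = n;

	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;

		if (tab[mid].off <= off)
			lo = mid + 1;
		else
			hi = mid;
	}
	return lo ? &tab[lo - 1] : NULL;
}
```

Two functions that share a name still have different offsets, so the lookup resolves each hit to a distinct entry.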
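The pid_list change replaces a lock on the read side with a sequence counter. Below is a userspace analogue using C11 atomics, following the usual seqlock pattern: the writer makes the counter odd while it updates, and readers retry if the counter was odd or changed. This is not the kernel's pid_list.c code. The kernel has its own seqcount primitives, and struct pid_set, its size, and the helper names are hypothetical. Writers must still be serialized by the caller.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define PID_WORDS 4 /* hypothetical tiny set covering PIDs 0..255 */

struct pid_set {
	atomic_uint seq;                  /* even: stable, odd: update in progress */
	_Atomic uint64_t bits[PID_WORDS]; /* relaxed atomics keep readers race-free */
};

static void pid_set_init(struct pid_set *s)
{
	atomic_init(&s->seq, 0);
	for (int i = 0; i < PID_WORDS; i++)
		atomic_init(&s->bits[i], 0);
}

/* Writer side: the caller must serialize writers (e.g. with a mutex). */
static void pid_set_update(struct pid_set *s, unsigned int pid, bool on)
{
	unsigned int seq;
	uint64_t word, mask;

	if (pid >= PID_WORDS * 64)
		return;
	seq = atomic_load_explicit(&s->seq, memory_order_relaxed);
	atomic_store_explicit(&s->seq, seq + 1, memory_order_relaxed);
	atomic_thread_fence(memory_order_release);

	mask = UINT64_C(1) << (pid % 64);
	word = atomic_load_explicit(&s->bits[pid / 64], memory_order_relaxed);
	word = on ? (word | mask) : (word & ~mask);
	atomic_store_explicit(&s->bits[pid / 64], word, memory_order_relaxed);

	atomic_store_explicit(&s->seq, seq + 2, memory_order_release);
}

/* Reader side (the hot path, e.g. sched switch): no lock, retry on overlap. */
static bool pid_set_test(struct pid_set *s, unsigned int pid)
{
	unsigned int start, end;
	uint64_t word;

	if (pid >= PID_WORDS * 64)
		return false;
	do {
		start = atomic_load_explicit(&s->seq, memory_order_acquire);
		word = atomic_load_explicit(&s->bits[pid / 64], memory_order_relaxed);
		atomic_thread_fence(memory_order_acquire);
		end = atomic_load_explicit(&s->seq, memory_order_relaxed);
	} while ((start & 1) || start != end);

	return (word >> (pid % 64)) & 1;
}
```

Readers never write shared state, so many CPUs checking the set at once do not bounce a lock cache line between them. The cost is an occasional retry when an update overlaps a read.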
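The bulk garbage collection of trigger data batches many frees behind a single slow synchronization. The sketch below models that idea in userspace: a lock-free singly linked list stands in for the kernel llist, expensive_sync() is a hypothetical stand-in for tracepoint_synchronize_unregister(), and gc_run() is the body a worker thread would run before going back to sleep. Thread creation and wakeup are left out.

```c
#include <stdatomic.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical node standing in for a removed trigger's data. */
struct gc_node {
	struct gc_node *next;
};

struct gc_list {
	_Atomic(struct gc_node *) first;
};

static unsigned int sync_calls; /* how many times the slow barrier ran */

/* Hypothetical stand-in for tracepoint_synchronize_unregister(). */
static void expensive_sync(void)
{
	sync_calls++;
}

/* Producer: lock-free push, safe against concurrent pushers. */
static void gc_push(struct gc_list *l, struct gc_node *n)
{
	struct gc_node *old = atomic_load_explicit(&l->first, memory_order_relaxed);

	do {
		n->next = old;
	} while (!atomic_compare_exchange_weak_explicit(&l->first, &old, n,
							memory_order_release,
							memory_order_relaxed));
}

/*
 * Worker body: take everything queued so far, pay for one sync per batch,
 * free the batch, and repeat until the list is empty (then it would sleep).
 */
static size_t gc_run(struct gc_list *l)
{
	struct gc_node *batch;
	size_t freed = 0;

	while ((batch = atomic_exchange_explicit(&l->first, NULL,
						 memory_order_acquire)) != NULL) {
		expensive_sync();
		while (batch) {
			struct gc_node *next = batch->next;

			free(batch);
			batch = next;
			freed++;
		}
	}
	return freed;
}
```

With 300 queued removals, this performs one synchronization instead of 300, which is where the drop from seconds to under a second comes from.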
Please pull the latest trace-v6.19 tree, which can be found at:
git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace.git
trace-v6.19
Tag SHA1: ef15b617da8e889ab6a114081f6157096866703b
Head SHA1: f6ed9c5d3190cf18382ee75e0420602101f53586
Masami Hiramatsu (Google) (5):
tracing: Allow tracer to add more than 32 options
tracing: Add an option to show symbols in _text+offset for function profiler
tracing: Report wrong dynamic event command
tracing: Show the tracer options in boot-time created instance
tracing: Add boot-time backup of persistent ring buffer
Menglong Dong (1):
ftrace: Avoid redundant initialization in register_ftrace_direct
Steven Rostedt (35):
tracing: Make trace_user_fault_read() exposed to rest of tracing
tracing: Have syscall trace events read user space string
perf: tracing: Simplify perf_sysenter_enable/disable() with guards
perf: tracing: Have perf system calls read user space
tracing: Have system call events record user array data
tracing: Display some syscall arrays as strings
tracing: Allow syscall trace events to read more than one user parameter
tracing: Add a config and syscall_user_buf_size file to limit amount written
tracing: Show printable characters in syscall arrays
tracing: Add trace_seq_pop() and seq_buf_pop()
tracing: Add parsing of flags to the sys_enter_openat trace event
tracing: Check for printable characters when printing field dyn strings
tracing: Have persistent ring buffer print syscalls normally
Merge branch 'topic/func-profiler-offset' of git://git.kernel.org/pub/scm/linux/kernel/git/mhiramat/linux into trace/trace/core
tracing: Hide __NR_utimensat and _NR_mq_timedsend when not defined
tracing: Remove dummy options and flags
tracing: Have add_tracer_options() error pass up to callers
tracing: Exit out immediately after update_marker_trace()
tracing: Use switch statement instead of ifs in set_tracer_flag()
tracing: Have tracer option be instance specific
tracing: Have function tracer define options per instance
tracing: Have function graph tracer define options per instance
tracing: Have function graph tracer option funcgraph-irqs be per instance
tracing: Move graph-time out of function graph options
tracing: Have function graph tracer option sleep-time be per instance
tracing: Convert function graph set_flags() to use a switch() statement
fgraph: Make fgraph_no_sleep_time signed
tracing: Remove unused variable in tracing_trace_options_show()
tracing: Remove get_trigger_ops() and add count_func() from trigger ops
tracing: Merge struct event_trigger_ops into struct event_command
tracing: Remove unneeded event_mutex lock in event_trigger_regex_release()
tracing: Add bulk garbage collection of freeing event_trigger_data
tracing: Use strim() in trigger_process_regex() instead of skip_spaces()
ftrace: Allow tracing of some of the tracing code
overflow: Introduce struct_offset() to get offset of member
Yongliang Gao (1):
trace/pid_list: optimize pid_list->lock contention
pengdonglin (1):
function_graph: Enable funcgraph-args and funcgraph-retaddr to work simultaneously
----
Documentation/trace/ftrace.rst | 8 +
include/linux/ftrace.h | 7 +-
include/linux/overflow.h | 12 +
include/linux/seq_buf.h | 17 +
include/linux/trace_seq.h | 13 +
include/trace/syscall.h | 8 +-
kernel/trace/Kconfig | 28 ++
kernel/trace/Makefile | 17 +
kernel/trace/blktrace.c | 6 +-
kernel/trace/fgraph.c | 10 +-
kernel/trace/ftrace.c | 32 +-
kernel/trace/pid_list.c | 30 +-
kernel/trace/pid_list.h | 1 +
kernel/trace/trace.c | 893 +++++++++++++++++++++++----------
kernel/trace/trace.h | 230 +++++----
kernel/trace/trace_dynevent.c | 11 +-
kernel/trace/trace_entries.h | 15 +-
kernel/trace/trace_eprobe.c | 19 +-
kernel/trace/trace_events.c | 4 +-
kernel/trace/trace_events_hist.c | 143 ++----
kernel/trace/trace_events_synth.c | 2 +-
kernel/trace/trace_events_trigger.c | 408 +++++++--------
kernel/trace/trace_fprobe.c | 6 +-
kernel/trace/trace_functions.c | 10 +-
kernel/trace/trace_functions_graph.c | 220 ++++++---
kernel/trace/trace_irqsoff.c | 30 +-
kernel/trace/trace_kdb.c | 2 +-
kernel/trace/trace_kprobe.c | 6 +-
kernel/trace/trace_output.c | 45 +-
kernel/trace/trace_output.h | 11 +
kernel/trace/trace_sched_wakeup.c | 24 +-
kernel/trace/trace_syscalls.c | 935 +++++++++++++++++++++++++++++++++--
32 files changed, 2301 insertions(+), 902 deletions(-)