Message-ID: <20250501013202.997535180@goodmis.org>
Date: Wed, 30 Apr 2025 21:32:02 -0400
From: Steven Rostedt <rostedt@...dmis.org>
To: linux-kernel@...r.kernel.org,
linux-trace-kernel@...r.kernel.org
Cc: Masami Hiramatsu <mhiramat@...nel.org>,
Mark Rutland <mark.rutland@....com>,
Mathieu Desnoyers <mathieu.desnoyers@...icios.com>,
Andrew Morton <akpm@...ux-foundation.org>,
Josh Poimboeuf <jpoimboe@...nel.org>,
x86@...nel.org,
Peter Zijlstra <peterz@...radead.org>,
Ingo Molnar <mingo@...nel.org>,
Arnaldo Carvalho de Melo <acme@...nel.org>,
Indu Bhagat <indu.bhagat@...cle.com>,
Alexander Shishkin <alexander.shishkin@...ux.intel.com>,
Jiri Olsa <jolsa@...nel.org>,
Namhyung Kim <namhyung@...nel.org>,
Ian Rogers <irogers@...gle.com>,
Adrian Hunter <adrian.hunter@...el.com>,
linux-perf-users@...r.kernel.org,
Mark Brown <broonie@...nel.org>,
linux-toolchains@...r.kernel.org,
Jordan Rome <jordalgo@...a.com>,
Sam James <sam@...too.org>,
Andrii Nakryiko <andrii.nakryiko@...il.com>,
Jens Remus <jremus@...ux.ibm.com>,
Florian Weimer <fweimer@...hat.com>,
Andy Lutomirski <luto@...nel.org>,
Weinan Liu <wnliu@...gle.com>,
Blake Jones <blakejones@...gle.com>,
Beau Belgrave <beaub@...ux.microsoft.com>,
"Jose E. Marchesi" <jemarch@....org>,
Alexander Aring <aahringo@...hat.com>
Subject: [PATCH v6 0/5] perf: Deferred unwinding of user space stack traces for per CPU events
This is v6 of:
https://lore.kernel.org/linux-trace-kernel/20250424192456.851953422@goodmis.org/
But this version adds only the unwind deferred interface, and not the
ftrace code, so that perf can use it.
This series is based on top of:
https://lore.kernel.org/linux-trace-kernel/20250430195746.827125963@goodmis.org/
The above patch series adds deferred unwinding for task events, but not
for per CPU events. A task event traces only a single task, so it can
use a task_work to trigger its own callbacks. Per CPU events do not have
that luxury: a single per CPU event can request a deferred user space
stacktrace for several tasks before receiving any of the deferred
stacktraces.
To solve this, per CPU events will use the extended interface of the
deferred unwinder that ftrace will also use. This includes the new API:
unwind_deferred_init()
unwind_deferred_request()
unwind_deferred_cancel()
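As a rough sketch of how a tracer would use this interface (the exact
prototypes live in the patches; the signatures and callback arguments
below are assumptions based on this description):

	/* Sketch only: unwind_work/unwind_stacktrace layouts are assumed. */
	static void my_unwind_cb(struct unwind_work *work,
				 struct unwind_stacktrace *trace, u64 cookie)
	{
		/* Runs via task_work when the task returns to user space. */
	}

	static struct unwind_work my_work;

	static int my_tracer_start(void)
	{
		/* Register the callback with the deferred unwinder. */
		return unwind_deferred_init(&my_work, my_unwind_cb);
	}

	/* Called from interrupt (or NMI) context: */
	static void my_tracer_event(void)
	{
		u64 cookie;

		/*
		 * Ask for a user space stacktrace when the current task
		 * heads back to user space. A positive return means a
		 * request was already queued for this task.
		 */
		unwind_deferred_request(&my_work, &cookie);
	}

	static void my_tracer_stop(void)
	{
		unwind_deferred_cancel(&my_work);
	}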
What perf now does is:

When a new per CPU event is created, perf searches a global list of
descriptors that map to the group_leader of the tasks that create these
events. The PID of current's group_leader is used to find this
descriptor. If one is found, the event is simply added to it. If one is
not found, a new descriptor is created.
This descriptor has an array, sized by the number of possible CPUs, that
holds per CPU descriptors. Each of these CPU descriptors has a linked
list that holds the per CPU events that were created and want deferred
unwinding.

The group_leader descriptor has an unwind_work descriptor that it
registers with the unwind deferred infrastructure via
unwind_deferred_init(). Each event within this descriptor has a pointer
back to the descriptor.
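Roughly, the layout described above could look like the following (the
struct and field names are illustrative only, not the identifiers used
in the patch):

	/* Illustrative sketch; names are made up for this description. */
	struct perf_unwind_cpu {
		struct list_head	events;	/* events wanting deferred unwinds */
	};

	struct perf_unwind_deferred {
		struct list_head	list;		/* on the global descriptor list */
		pid_t			leader_pid;	/* PID of the creator's group_leader */
		struct unwind_work	unwind_work;	/* registered via unwind_deferred_init() */
		struct perf_unwind_cpu	*cpus;		/* array sized by possible CPUs */
		int			nr_events;	/* descriptor freed when this hits zero */
	};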
When a request is made from interrupt context to have a deferred unwind
happen, the event calls unwind_deferred_request(), passing it the
group_leader descriptor.
When the task returns to user space, it calls the callback associated
with the group_leader descriptor, and that callback passes the user
space stacktrace to the events attached to the current CPU from the
descriptor's CPU array.
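That is, the registered callback just fans the stacktrace out to the
events hanging off the current CPU's descriptor, along these lines
(again a sketch with assumed names: unwind_list is an assumed list node
in the event, and perf_event_callchain_deferred() is a placeholder for
whatever emits the callchain into the event's buffer; locking elided):

	static void perf_unwind_deferred_cb(struct unwind_work *work,
					    struct unwind_stacktrace *trace,
					    u64 cookie)
	{
		struct perf_unwind_deferred *defer =
			container_of(work, struct perf_unwind_deferred, unwind_work);
		struct perf_unwind_cpu *cpu_desc;
		struct perf_event *event;

		/* Runs in task context on the CPU the task is returning on. */
		cpu_desc = &defer->cpus[raw_smp_processor_id()];

		/* Hand the deferred user stacktrace to each event on this CPU. */
		list_for_each_entry(event, &cpu_desc->events, unwind_list)
			perf_event_callchain_deferred(event, trace, cookie);
	}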
When these events are freed, they are removed from this descriptor, and
when the last event is removed, the descriptor is freed.
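Teardown would then be the mirror image (again illustrative, with
locking elided):

	/* Sketch: drop an event; free the descriptor on the last removal. */
	static void perf_unwind_deferred_detach(struct perf_event *event,
						struct perf_unwind_deferred *defer)
	{
		list_del(&event->unwind_list);
		if (--defer->nr_events == 0) {
			unwind_deferred_cancel(&defer->unwind_work);
			list_del(&defer->list);
			kfree(defer->cpus);
			kfree(defer);
		}
	}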
I've tested this, and it appears to work fine. All the associated events
that the perf tool creates are grouped via this descriptor, so at least
it does not overflow the maximum number of unwind works that can be
attached to the unwind deferred infrastructure.
This is based on v5 of the unwind code mentioned above. Changes since
then include:
- Have unwind_deferred_request() return positive if already queued
- Check (current->flags & (PF_KTHREAD | PF_EXITING)) in
  unwind_deferred_request(), as the task_work will fail to be added in
  the exit code.
- Use SRCU to protect the list of callbacks when a task returns instead of
using a global mutex. (Mathieu Desnoyers)
- Does not include ftrace update
- Includes perf per CPU events using this infrastructure
Josh Poimboeuf (2):
unwind_user/deferred: Add deferred unwinding interface
unwind_user/deferred: Make unwind deferral requests NMI-safe
Steven Rostedt (3):
unwind deferred: Use bitmask to determine which callbacks to call
unwind deferred: Use SRCU in unwind_deferred_task_work()
perf: Support deferred user callchains for per CPU events
----
include/linux/perf_event.h | 5 +
include/linux/sched.h | 1 +
include/linux/unwind_deferred.h | 19 +++
include/linux/unwind_deferred_types.h | 4 +
kernel/events/core.c | 226 +++++++++++++++++++++++---
kernel/unwind/deferred.c | 290 +++++++++++++++++++++++++++++++++-
6 files changed, 519 insertions(+), 26 deletions(-)