Message-Id: <20240412001732.475-1-beaub@linux.microsoft.com>
Date: Fri, 12 Apr 2024 00:17:28 +0000
From: Beau Belgrave <beaub@...ux.microsoft.com>
To: peterz@...radead.org,
mingo@...hat.com,
acme@...nel.org,
namhyung@...nel.org,
rostedt@...dmis.org,
mhiramat@...nel.org,
mathieu.desnoyers@...icios.com
Cc: linux-kernel@...r.kernel.org,
linux-trace-kernel@...r.kernel.org,
linux-perf-users@...r.kernel.org,
mark.rutland@....com,
alexander.shishkin@...ux.intel.com,
jolsa@...nel.org,
irogers@...gle.com,
adrian.hunter@...el.com,
primiano@...gle.com,
aahringo@...hat.com,
dcook@...ux.microsoft.com
Subject: [RFC PATCH 0/4] perf: Correlating user process data to samples

In the Open Telemetry profiling SIG [1], we are trying to find a way to
grab a tracing association quickly on a per-sample basis. The team at
Elastic has a bespoke way to do this [2]; however, I'd like to see a
more general way to achieve this. The folks I've been talking with seem
open to the idea of just having a TLS value for this that we could
capture upon each sample. We could then just state that Open Telemetry
SDKs should have a TLS value for span correlation. However, we need a
way to sample the TLS or other value(s) when a sampling event is
generated. This is supported today on Windows via
EventActivityIdControl() [3]. Since Open Telemetry works on both Windows
and Linux, ideally we can do something as efficient for Linux-based
workloads.

This series explores how best to collect supporting data from a user
process when a profile sample is collected. Having a value stored in TLS
makes a lot of sense for this; however, there are other options to
explore. Whatever is chosen, kernel samples taken in process context
should be able to get this supporting data. In these patches, on x86-64,
fsbase and gsbase are used for this.

An option to explore, suggested by Mathieu Desnoyers, is to utilize rseq
so that processes can register a value location that can be included
when profiling, if desired. This would allow a tighter contract between
user processes and a profiler. It would also allow better labeling and
categorizing of the correlation values.

An ideal flow would look like this:

User Task               Profile
do_work();              sample() -> IP + No activity
...
set_activity(123);
...
do_work();              sample() -> IP + activity (123)
...
set_activity(124);
...
do_work();              sample() -> IP + activity (124)
Ideally, the set_activity() method would not be a syscall. It needs to
be very cheap as this should not bottleneck work. Ideally this is just
a memcpy of 16-20 bytes as it is on Windows via EventActivityIdControl()
using EVENT_ACTIVITY_CTRL_SET_ID.
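
As a rough illustration of the cost being aimed for, a user-space
set_activity() backed by TLS could be as small as the sketch below. This
is only a sketch under assumptions, not part of this series: the struct
activity_id type, its 16-byte size, the current_activity variable, and
the get_activity() helper are all hypothetical names picked for the
example.

```c
#include <assert.h>   /* static_assert */
#include <stdint.h>
#include <string.h>

/* Hypothetical 16-byte correlation ID (for example, a trace ID). */
struct activity_id {
	uint8_t bytes[16];
};
static_assert(sizeof(struct activity_id) == 16,
	      "activity ID is assumed to be 16 bytes");

/*
 * One slot per thread, zero-initialized (meaning "no activity").
 * Assumption: a profiler would locate this relative to the thread's
 * TLS base when a sample is taken.
 */
static _Thread_local struct activity_id current_activity;

/* No syscall: just a small copy into thread-local storage. */
static inline void set_activity(const struct activity_id *id)
{
	memcpy(&current_activity, id, sizeof(current_activity));
}

/* Helper for the example only: read back the calling thread's ID. */
static inline void get_activity(struct activity_id *out)
{
	memcpy(out, &current_activity, sizeof(*out));
}
```

Each thread only ever writes its own slot, so no locking is needed on
the hot path. How a sampler would learn the offset of that slot is left
open here.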

For those not aware, Open Telemetry allows collecting data from multiple
machines and showing where time was spent. The tracing context is
already available for logs, but not for profiling samples. The idea is
to show where slowdowns occur and have profile samples to explain why
they occurred. This must be possible without having to track context
switches to do this correlation, because profiling rates are typically
20 Hz to 1 kHz, while context switching rates are much higher. We do not
want to have to consume high context switch rates just to know a
correlation for a 20 Hz signal. In some environments these 20 Hz signals
are always enabled.

Regardless of whether TLS, rseq, or another source is used, I believe we
will need a way for perf_events to include it within a sample. The
changes in this series show how it could be done with TLS. There is some
factoring work under perf to make it easier to add more dump types using
the existing ABI. This is mostly to make the patches clearer; the
refactoring parts could certainly be dropped, leaving
duplicated/specialized paths instead.

1. https://opentelemetry.io/blog/2024/profiling/
2. https://www.elastic.co/blog/continuous-profiling-distributed-tracing-correlation
3. https://learn.microsoft.com/en-us/windows/win32/api/evntprov/nf-evntprov-eventactivityidcontrol

Beau Belgrave (4):
  perf/core: Introduce perf_prepare_dump_data()
  perf: Introduce PERF_SAMPLE_TLS_USER sample type
  perf/core: Factor perf_output_sample_udump()
  perf/x86/core: Add tls dump support

 arch/Kconfig                      |   7 ++
 arch/x86/Kconfig                  |   1 +
 arch/x86/events/core.c            |  14 +++
 arch/x86/include/asm/perf_event.h |   5 +
 include/linux/perf_event.h        |   7 ++
 include/uapi/linux/perf_event.h   |   5 +-
 kernel/events/core.c              | 166 +++++++++++++++++++++++-------
 kernel/events/internal.h          |  16 +++
 8 files changed, 180 insertions(+), 41 deletions(-)

base-commit: fec50db7033ea478773b159e0e2efb135270e3b7
--
2.34.1