Message-ID: <20250218145859.27762-1-tglozar@redhat.com>
Date: Tue, 18 Feb 2025 15:58:51 +0100
From: Tomas Glozar <tglozar@...hat.com>
To: Steven Rostedt <rostedt@...dmis.org>
Cc: linux-trace-kernel@...r.kernel.org,
linux-kernel@...r.kernel.org,
John Kacur <jkacur@...hat.com>,
Luis Goncalves <lgoncalv@...hat.com>,
Gabriele Monaco <gmonaco@...hat.com>,
Clark Williams <williams@...hat.com>,
Tomas Glozar <tglozar@...hat.com>
Subject: [PATCH 0/8] rtla: Collect data using BPF program
The current implementation of rtla uses libtracefs and libtraceevent to
pull sample events generated by the timerlat tracer from the trace
buffer. rtla then processes each sample by updating the histogram and
the summary (current, maximum, minimum, and sum values), and checks
whether tracing has been stopped due to a threshold overflow.
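To make the per-sample work concrete, here is a minimal C sketch of the
bookkeeping described above. The struct, HIST_ENTRIES, and BUCKET_SIZE
are hypothetical names chosen for illustration, not rtla's actual
definitions; the point is that every sample costs a histogram increment
plus min/max/sum updates, which adds up when samples arrive from many
CPUs.

```c
#include <stdint.h>

/* Hypothetical histogram geometry; rtla's real defaults may differ. */
#define HIST_ENTRIES 10
#define BUCKET_SIZE  1 /* microseconds per bucket */

/* Hypothetical per-CPU summary holding the values named above. */
struct cpu_summary {
	uint64_t cur;
	uint64_t min;
	uint64_t max;
	uint64_t sum;
	uint64_t count;
	uint64_t hist[HIST_ENTRIES + 1]; /* last slot counts overflows */
};

static void summary_init(struct cpu_summary *s)
{
	/* Zero everything, but start min high so the first sample wins. */
	*s = (struct cpu_summary){ .min = UINT64_MAX };
}

/* Update the histogram and the summary for one latency sample (in us). */
static void update_sample(struct cpu_summary *s, uint64_t latency)
{
	uint64_t bucket = latency / BUCKET_SIZE;

	/* Samples beyond the last regular bucket go to the overflow slot. */
	if (bucket >= HIST_ENTRIES)
		bucket = HIST_ENTRIES;
	s->hist[bucket]++;

	s->cur = latency;
	if (latency < s->min)
		s->min = latency;
	if (latency > s->max)
		s->max = latency;
	s->sum += latency;
	s->count++;
}
```

The average shown to the user can then be derived as sum / count when
the data is displayed, rather than on every sample.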
In use cases where a large number of samples is generated, that is,
with measurements running on many CPUs with a short interval, this
sample processing design causes a significant CPU load on the rtla
side. Furthermore, with more than 100 CPUs and a 100us interval, rtla
was reported to be unable to keep up with the samples, dropping most
of them and becoming unusable.
A timerlat tracer change was proposed [1] to implement an alternative
way of processing timerlat samples by way of a trace event. This
patchset makes use of that by attaching a BPF program to the trace event
using the BPF skeleton feature of bpftool. One BPF program is shared
for both top and hist, operating in three different modes: top, hist,
and auto-analysis only. Data is collected using per-CPU BPF maps to
achieve maximum performance and avoid lock contention. The maps are then
processed in userspace when the data is to be displayed (at the end of
the run for hist and quiet top, once per second for regular top).
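Merging per-CPU data in userspace can be sketched as below. This is an
illustration only: struct percpu_stats and merge_percpu() are
hypothetical, and the input array stands in for what a userspace lookup
on a per-CPU BPF map returns (one value per possible CPU). Because each
CPU only ever writes its own copy, the hot path needs no locks, and the
cost of combining the copies is paid once, at display time.

```c
#include <stdint.h>

/*
 * Hypothetical per-CPU value. In a per-CPU map each CPU updates only
 * its own copy, so no locking or atomics are needed while sampling.
 */
struct percpu_stats {
	uint64_t count;
	uint64_t max;
	uint64_t sum;
};

/*
 * Combine the per-CPU copies into one total, as userspace would do
 * when the data is about to be displayed.
 */
static struct percpu_stats merge_percpu(const struct percpu_stats *vals,
					int ncpus)
{
	struct percpu_stats total = { 0 };

	for (int i = 0; i < ncpus; i++) {
		total.count += vals[i].count;
		total.sum += vals[i].sum;
		if (vals[i].max > total.max)
			total.max = vals[i].max;
	}
	return total;
}
```

Combining the copies this way yields an all-CPU total; per-CPU columns
can be printed directly from the individual entries.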
During the measurement, the new implementation is idle, waiting for
either a signal or a threshold overflow. Unlike the current
implementation, the BPF implementation does not check whether tracing is
stopped (in BPF mode, tracing is always off to improve performance), but
instead waits for a write to a BPF ring buffer. This allows rtla to exit
immediately when a threshold is violated, without waiting for the next
iteration of the while loop.
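The difference between polling a status and waiting for an event can be
illustrated with poll(2). In this hypothetical sketch, a plain file
descriptor (a pipe in the test) stands in for the BPF ring buffer;
libbpf does provide ring_buffer__poll() and ring_buffer__epoll_fd() for
the real case, but this is not necessarily how rtla uses them. The
timeout parameter mirrors the once-per-second refresh of regular top,
while a negative timeout blocks until an event or a signal arrives.

```c
#define _POSIX_C_SOURCE 200809L
#include <poll.h>
#include <unistd.h>

/*
 * Sleep until fd becomes readable (or hangs up), or until timeout_ms
 * elapses, without spinning. Returns 1 if an event arrived, 0 on
 * timeout, and -1 on error; errno == EINTR means a signal interrupted
 * the wait, which the caller can treat as a request to exit.
 */
static int wait_for_event(int fd, int timeout_ms)
{
	struct pollfd pfd = { .fd = fd, .events = POLLIN };
	int ret = poll(&pfd, 1, timeout_ms);

	if (ret > 0)
		return 1;
	return ret;
}
```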
If the requirements for the BPF implementation are not met, either at
build time or at run time, the current implementation is used as a
fallback. The implementation in use can be seen when running rtla
timerlat with the "-D" option. For debugging purposes, rtla can be
forced to run in non-BPF mode by setting the RTLA_NO_BPF environment
variable to 1.
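The selection logic might look like the following sketch. It assumes
RTLA_NO_BPF is read from the environment and compared against "1";
choose_collector(), enum collector, and the two boolean inputs are
hypothetical stand-ins for rtla's real build-time and run-time checks.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdlib.h>
#include <string.h>

/* Hypothetical names for the two sample collection backends. */
enum collector { COLLECT_TRACEFS, COLLECT_BPF };

/*
 * Pick the backend. bpf_built says whether BPF support was compiled in;
 * bpf_usable says whether the run-time requirements are met.
 */
static enum collector choose_collector(int bpf_built, int bpf_usable)
{
	const char *no_bpf = getenv("RTLA_NO_BPF");

	/* Explicit override, e.g. for debugging the non-BPF path. */
	if (no_bpf && strcmp(no_bpf, "1") == 0)
		return COLLECT_TRACEFS;
	/* Requirements not met at build or run time: fall back. */
	if (!bpf_built || !bpf_usable)
		return COLLECT_TRACEFS;
	return COLLECT_BPF;
}
```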
The BPF implementation has the following build requirements:
- libbpf 1.0.0 or later
- bpftool with skeleton support
- clang with BPF CO-RE support
Unlike perf, rtla does not build its own static libbpf, and it relies
on the system bpftool instead of an in-tree one. This might change in
the future if modern BPF features that are not commonly available in
Linux distributions are introduced.
The runtime requirements are as follows:
- BPF support enabled in the kernel
- libbpf library
- osnoise:timerlat_sample trace event present
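Of the runtime requirements, the presence of the trace event is the
easiest to probe. The sketch below checks for the event directory under
a tracefs mount point passed in by the caller (commonly
/sys/kernel/tracing). The helper name is hypothetical, and rtla may well
perform this check through libtracefs instead; checking kernel BPF
support and libbpf availability is not covered here.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <unistd.h>

/*
 * Return 1 if the osnoise:timerlat_sample event directory exists under
 * the given tracefs root, 0 otherwise (including on path overflow).
 */
static int timerlat_sample_event_present(const char *tracefs_root)
{
	char path[4096];
	int n = snprintf(path, sizeof(path),
			 "%s/events/osnoise/timerlat_sample", tracefs_root);

	if (n < 0 || (size_t)n >= sizeof(path))
		return 0;
	return access(path, F_OK) == 0;
}
```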
No performance penalty was seen during testing on the timerlat tracer
side, as the performance of the BPF program is comparable to writing
the sample entry to the tracefs buffer. As rtla is idle during
measurements, except for printing the summary for timerlat-top in
non-quiet mode, the overall CPU usage is reduced significantly, and
the -H option to pin rtla to housekeeping CPUs becomes unnecessary for
most use cases.
Note: The timerlat_*_params structs were unified so that the BPF
implementation can be fully shared between top and hist, except for the
processing of the data. The plan is to avoid duplicate code and instead
progressively merge the implementations of top and hist. top was
developed first and is currently essentially hist without the
histogram, plus some older code. Thus, I expect it to be possible to
fully merge it into the hist implementation in the future.
[1] https://lore.kernel.org/linux-trace-kernel/20250203090418.1458923-1-tglozar@redhat.com
Tomas Glozar (8):
rtla/timerlat: Unify params struct
tools/build: Add bpftool-skeletons feature test
rtla: Add optional dependency on BPF tooling
rtla/timerlat: Add BPF skeleton to collect samples
rtla/timerlat_hist: Use BPF to collect samples
rtla/timerlat_top: Move divisor to update
rtla/timerlat_top: Use BPF to collect samples
rtla/timerlat: Test BPF mode
tools/build/Makefile.feature | 3 +-
tools/build/feature/Makefile | 3 +
tools/scripts/Makefile.include | 3 +
tools/tracing/rtla/.gitignore | 1 +
tools/tracing/rtla/Makefile | 20 +-
tools/tracing/rtla/Makefile.config | 42 +++
tools/tracing/rtla/src/Build | 1 +
tools/tracing/rtla/src/osnoise.h | 2 +
tools/tracing/rtla/src/timerlat.bpf.c | 149 ++++++++++
tools/tracing/rtla/src/timerlat.h | 54 ++++
tools/tracing/rtla/src/timerlat_aa.c | 2 -
tools/tracing/rtla/src/timerlat_bpf.c | 166 +++++++++++
tools/tracing/rtla/src/timerlat_bpf.h | 59 ++++
tools/tracing/rtla/src/timerlat_hist.c | 229 +++++++++++-----
tools/tracing/rtla/src/timerlat_top.c | 366 +++++++++++++++++--------
tools/tracing/rtla/tests/timerlat.t | 14 +
16 files changed, 923 insertions(+), 191 deletions(-)
create mode 100644 tools/tracing/rtla/src/timerlat.bpf.c
create mode 100644 tools/tracing/rtla/src/timerlat_bpf.c
create mode 100644 tools/tracing/rtla/src/timerlat_bpf.h
--
2.48.1