Open Source and information security mailing list archives
 
Message-ID: <20250218145859.27762-1-tglozar@redhat.com>
Date: Tue, 18 Feb 2025 15:58:51 +0100
From: Tomas Glozar <tglozar@...hat.com>
To: Steven Rostedt <rostedt@...dmis.org>
Cc: linux-trace-kernel@...r.kernel.org,
	linux-kernel@...r.kernel.org,
	John Kacur <jkacur@...hat.com>,
	Luis Goncalves <lgoncalv@...hat.com>,
	Gabriele Monaco <gmonaco@...hat.com>,
	Clark Williams <williams@...hat.com>,
	Tomas Glozar <tglozar@...hat.com>
Subject: [PATCH 0/8] rtla: Collect data using BPF program

The current implementation of rtla uses libtracefs and libtraceevent to
pull sample events generated by the timerlat tracer from the trace
buffer. rtla then processes each sample by updating the histogram and
the summary (current, maximum, minimum, and sum values), and checks
whether tracing has been stopped due to a threshold overflow.
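
For illustration, the per-sample bookkeeping described above amounts to
roughly the following. This is a minimal sketch, not rtla's code: the
struct, the function names, the fixed bucket count, and the use of the
last bucket for overflows are all assumptions.

```c
#include <stdint.h>
#include <string.h>

#define SKETCH_BUCKETS 16 /* hypothetical number of histogram buckets */

/* Hypothetical per-CPU statistics; not rtla's actual layout. */
struct sample_stats {
	uint64_t count;
	uint64_t cur, min, max, sum;
	uint64_t hist[SKETCH_BUCKETS]; /* last bucket collects overflows */
};

static void stats_init(struct sample_stats *s)
{
	memset(s, 0, sizeof(*s));
	s->min = UINT64_MAX; /* so the first sample always becomes the minimum */
}

/*
 * Account for one latency sample. bucket_size is the width of one
 * histogram bucket and must be non-zero; samples beyond the last regular
 * bucket land in the overflow bucket.
 */
static void stats_update(struct sample_stats *s, uint64_t latency,
			 uint64_t bucket_size)
{
	uint64_t bucket = latency / bucket_size;

	if (bucket >= SKETCH_BUCKETS - 1)
		bucket = SKETCH_BUCKETS - 1;
	s->hist[bucket]++;

	s->count++;
	s->cur = latency;
	s->sum += latency;
	if (latency < s->min)
		s->min = latency;
	if (latency > s->max)
		s->max = latency;
}
```

Each update is cheap on its own, but when done in userspace it has to
run once for every sample pulled from the trace buffer, on behalf of
every measured CPU.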

In use cases where a large number of samples is generated, that is,
with measurements running on many CPUs with a short interval, this
sample-processing design causes significant CPU load on the rtla side.
Furthermore, with >100 CPUs and a 100us interval, rtla was reported to
be unable to keep up with the samples, dropping most of them and thus
becoming unusable.

A timerlat tracer change was proposed [1] to provide an alternative
way of processing timerlat samples by means of a trace event. This
patchset makes use of it by attaching a BPF program to the trace event,
using the BPF skeleton feature of bpftool. One BPF program is shared by
top and hist and operates in one of three modes: top, hist, and
auto-analysis only. Data is collected in per-CPU BPF maps to maximize
performance and avoid lock contention. The maps are then processed in
userspace when the data is to be displayed (at the end of the run for
hist and quiet top, once per second for regular top).

During measurement, the new implementation is idle, waiting for either
a signal or a threshold overflow. Unlike the current implementation,
the BPF implementation does not check whether tracing has been stopped
(in BPF mode, tracing is always off to improve performance); instead,
it waits for a write to a BPF ring buffer. This allows rtla to exit
immediately when a threshold is exceeded, without waiting for the next
iteration of its main loop.
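
The difference between the two waiting strategies can be sketched with
plain POSIX calls. Below, a pipe stands in for the BPF ring buffer. The
waiting side simply blocks in poll(2) until something is written (a
threshold overflow), a signal arrives, or the timeout expires, instead
of periodically checking whether tracing is still on. With libbpf, the
analogous blocking call is ring_buffer__poll(). Whether rtla uses
exactly that call is not stated here, and the function names below are
hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <poll.h>
#include <unistd.h>

/* Stand-in for the BPF program: report a threshold overflow by writing one byte. */
static int signal_threshold(int wfd)
{
	char c = 1;

	return write(wfd, &c, 1) == 1 ? 0 : -1;
}

/*
 * Block until the read end becomes readable (threshold overflow), the
 * timeout expires, or a signal interrupts the wait. A negative
 * timeout_ms blocks indefinitely.
 * Returns 1 on threshold overflow, 0 on timeout, -1 on signal or error.
 */
static int wait_for_stop(int rfd, int timeout_ms)
{
	struct pollfd pfd = { .fd = rfd, .events = POLLIN };
	int ret = poll(&pfd, 1, timeout_ms);

	if (ret < 0)
		return -1; /* e.g. EINTR after a signal handler ran */
	if (ret == 0)
		return 0;  /* nothing written before the timeout */
	return (pfd.revents & POLLIN) ? 1 : -1;
}
```

A timeout of -1 would fit hist and quiet top, which only need the data
at the end of the run, while a 1000 ms timeout would fit regular top's
once-per-second refresh.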

If the requirements for the BPF implementation are not met, either at
build time or at run time, the current implementation is used as a
fallback. Which implementation is in use is shown when running rtla
timerlat with the "-D" (debug) option. For debugging purposes, rtla can
be forced into non-BPF mode by setting the RTLA_NO_BPF environment
variable to 1.
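
The selection logic described above might look roughly like the
following. Only the RTLA_NO_BPF variable and its value of 1 come from
the text. The function name and its two boolean inputs, standing for
the build-time and run-time checks, are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/*
 * Decide whether to collect samples via BPF.
 * built_with_bpf: hypothetical flag, true if the BPF skeleton was compiled in.
 * runtime_ok:     hypothetical flag, true if all run-time requirements are met.
 */
static bool should_use_bpf(bool built_with_bpf, bool runtime_ok)
{
	const char *no_bpf = getenv("RTLA_NO_BPF");

	/* Debugging override: RTLA_NO_BPF=1 forces the tracefs-based path. */
	if (no_bpf && strcmp(no_bpf, "1") == 0)
		return false;

	/* Otherwise fall back whenever any requirement is missing. */
	return built_with_bpf && runtime_ok;
}
```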

The BPF implementation has the following build requirements:
- libbpf 1.0.0 or later
- bpftool with skeleton support
- clang with BPF CO-RE support

Unlike perf, rtla does not build its own static libbpf, and it relies
on the system bpftool rather than an in-tree one. This might change in
the future if rtla starts using newer BPF features that are not yet
commonly available in Linux distributions.

The runtime requirements are as follows:
- BPF support enabled in the kernel
- libbpf library
- osnoise:timerlat_sample trace event present
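
The last run-time requirement can be checked from userspace by looking
for the event's directory in tracefs. This is a minimal sketch. It
assumes tracefs is mounted at /sys/kernel/tracing (some systems expose
it under /sys/kernel/debug/tracing instead). The function name is
hypothetical, and the kernel BPF and libbpf checks are left out.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdbool.h>
#include <stdio.h>
#include <sys/stat.h>

/*
 * Return true if the osnoise:timerlat_sample trace event exists under
 * the given tracefs mount point, e.g. "/sys/kernel/tracing".
 */
static bool timerlat_sample_event_present(const char *tracefs_root)
{
	char path[4096];
	struct stat st;
	int n = snprintf(path, sizeof(path),
			 "%s/events/osnoise/timerlat_sample", tracefs_root);

	if (n < 0 || (size_t)n >= sizeof(path))
		return false; /* path did not fit into the buffer */
	return stat(path, &st) == 0 && S_ISDIR(st.st_mode);
}
```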

No performance penalty was observed on the timerlat tracer side during
testing, as the cost of running the BPF program is comparable to that
of writing the sample entry to the tracefs buffer. Since rtla is idle
during measurement (except for printing the summary in non-quiet
timerlat top), overall CPU usage is reduced significantly, and the -H
option for pinning rtla to housekeeping CPUs becomes unnecessary for
most use cases.

Note: The timerlat_*_params structs were unified so that the BPF
implementation can be fully shared between top and hist, except for the
processing of the collected data. The plan is to avoid duplicate code
and to keep merging the top and hist implementations over time. top was
developed first and is currently essentially hist without the
histogram, plus some older code. I therefore expect it to be possible
to fold it fully into the hist implementation in the future.
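
To make the idea concrete, a unified parameter struct lets a single BPF
setup path serve both tools, with only the mode differing. The sketch
below is hypothetical. The field names, the enum, and the mode-selection
rule are illustrative assumptions based on the three modes (top, hist,
and auto-analysis only) mentioned above, not the layout used in the
series.

```c
#include <stdbool.h>

/* Hypothetical collection modes of the shared BPF program. */
enum sketch_bpf_mode {
	SKETCH_MODE_TOP,     /* summary statistics only */
	SKETCH_MODE_HIST,    /* summary statistics and histogram */
	SKETCH_MODE_AA_ONLY, /* no statistics; only wait for a threshold overflow */
};

/* Hypothetical unified params: fields common to top and hist live together. */
struct sketch_timerlat_params {
	unsigned long long stop_us; /* stop threshold in us, 0 = none */
	int nr_cpus;
	bool hist;                  /* invoked as "timerlat hist" */
	bool aa_only;               /* only auto-analysis requested */
	int bucket_size;            /* used only in hist mode */
	int entries;                /* used only in hist mode */
};

/* One code path picks the mode for either tool from the same struct. */
static enum sketch_bpf_mode
sketch_pick_mode(const struct sketch_timerlat_params *p)
{
	if (p->aa_only)
		return SKETCH_MODE_AA_ONLY;
	return p->hist ? SKETCH_MODE_HIST : SKETCH_MODE_TOP;
}
```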

[1] https://lore.kernel.org/linux-trace-kernel/20250203090418.1458923-1-tglozar@redhat.com

Tomas Glozar (8):
  rtla/timerlat: Unify params struct
  tools/build: Add bpftool-skeletons feature test
  rtla: Add optional dependency on BPF tooling
  rtla/timerlat: Add BPF skeleton to collect samples
  rtla/timerlat_hist: Use BPF to collect samples
  rtla/timerlat_top: Move divisor to update
  rtla/timerlat_top: Use BPF to collect samples
  rtla/timerlat: Test BPF mode

 tools/build/Makefile.feature           |   3 +-
 tools/build/feature/Makefile           |   3 +
 tools/scripts/Makefile.include         |   3 +
 tools/tracing/rtla/.gitignore          |   1 +
 tools/tracing/rtla/Makefile            |  20 +-
 tools/tracing/rtla/Makefile.config     |  42 +++
 tools/tracing/rtla/src/Build           |   1 +
 tools/tracing/rtla/src/osnoise.h       |   2 +
 tools/tracing/rtla/src/timerlat.bpf.c  | 149 ++++++++++
 tools/tracing/rtla/src/timerlat.h      |  54 ++++
 tools/tracing/rtla/src/timerlat_aa.c   |   2 -
 tools/tracing/rtla/src/timerlat_bpf.c  | 166 +++++++++++
 tools/tracing/rtla/src/timerlat_bpf.h  |  59 ++++
 tools/tracing/rtla/src/timerlat_hist.c | 229 +++++++++++-----
 tools/tracing/rtla/src/timerlat_top.c  | 366 +++++++++++++++++--------
 tools/tracing/rtla/tests/timerlat.t    |  14 +
 16 files changed, 923 insertions(+), 191 deletions(-)
 create mode 100644 tools/tracing/rtla/src/timerlat.bpf.c
 create mode 100644 tools/tracing/rtla/src/timerlat_bpf.c
 create mode 100644 tools/tracing/rtla/src/timerlat_bpf.h

-- 
2.48.1

