linux-kernel - [RFC PATCH 0/2] Tracing bursts of latencies

Message-ID: <20210119164344.37500-1-Viktor.Rosendahl@bmw.de>
Date:   Tue, 19 Jan 2021 17:43:42 +0100
From:   Viktor Rosendahl <Viktor.Rosendahl@....de>
To:     Steven Rostedt <rostedt@...dmis.org>,
        Joel Fernandes <joel@...lfernandes.org>,
        <linux-kernel@...r.kernel.org>
CC:     Ingo Molnar <mingo@...hat.com>,
        Viktor Rosendahl <Viktor.Rosendahl@....de>
Subject: [RFC PATCH 0/2] Tracing bursts of latencies

Hello all,

This series contains two things:

1. A fix for a bug in the ftrace latency tracers that was introduced in Linux 5.7.

2. The latency-collector, a tool that is designed to work around the
   limitations in the ftrace latency tracers. It needs the bug fix in order to
   work properly.

I have sent a patch series with the latency-collector before.

I never got any comments on it, and I stopped pushing it because I thought that
BPF tracing would be the wave of the future and would solve the problem in a
cleaner and more elegant way.

Recently, I tried out the criticalstat script from the bcc tools, but it did
not fulfill all of my hopes and dreams.

On the bright side, it was able to capture all latencies in a burst. The main
problems that I encountered were:

1. The system became unstable and froze now and then. The man page of
   criticalstat has a mention of it being unstable, so I assume that this is a
   known problem.

2. Sometimes the stack traces were incorrect, but not in an obvious way. Once
   this had happened, all subsequent stack traces were bad.

3. If two instances were run simultaneously (to capture both preemptoff and
   irqoff sections), there seemed to be a fairly large performance hit, although
   I did not measure it precisely.

4. The filesystem footprint seemed large; libbcc in particular seemed quite big
   for a small embedded system.

For these reasons, I am taking the liberty of resending the latency-collector.

I hope to get some comments on it, or a suggestion for an alternative approach
to the problem of capturing latencies that systematically occur close to each
other.

Admittedly, from a developer's perspective this may be somewhat of a niche
problem, since removing one latency will reveal the next. However, when one is
doing validation with a fleet of devices in a long and expensive test campaign,
it is highly desirable not to lose any latencies.
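To illustrate why bursts are a problem, here is a toy model in Python. It is
not code from the latency-collector or the kernel. It assumes the default case
where tracing_thresh is 0, so the tracer keeps a single saved trace and replaces
it only when a new latency exceeds tracing_max_latency. It also assumes that a
user-space collector needs read_delay microseconds after being notified before
it has read the trace and reset the maximum to zero. The function simulate()
and its parameters are hypothetical.

```python
def simulate(events, read_delay):
    """Toy model (hypothetical) of a max-latency tracer plus a collector.

    events: list of (timestamp_us, latency_us), sorted by timestamp.
    read_delay: assumed time the collector needs after a notification
    before it has read the saved trace and reset the maximum to zero.
    Returns the latencies whose traces the collector actually saw.
    """
    max_latency = 0   # models tracing_max_latency (tracing_thresh assumed 0)
    snapshot = None   # the single saved trace, represented by its latency
    read_at = None    # time at which the pending read completes
    captured = []

    for ts, lat in events:
        # A read that completes before this event saves the trace and
        # resets the maximum, so the next latency can be recorded.
        if read_at is not None and read_at <= ts:
            captured.append(snapshot)
            max_latency, snapshot, read_at = 0, None, None

        # Only a new maximum replaces the saved trace; anything smaller
        # that arrives before the reset is simply lost.
        if lat > max_latency:
            max_latency, snapshot = lat, lat
            if read_at is None:
                read_at = ts + read_delay  # notification wakes the collector

    # Let a read that is still pending at the end complete.
    if read_at is not None:
        captured.append(snapshot)
    return captured


burst = [(0, 300), (10, 250), (20, 280)]
print(simulate(burst, read_delay=100))  # slow reader: [300]
print(simulate(burst, read_delay=5))    # fast reader: [300, 250, 280]
```

In this model, a slow reader sees only the largest latency of the burst, and
even a burst of increasing latencies loses all but the last one, because each
new maximum overwrites the saved trace before it has been read.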

best regards,

Viktor

Viktor Rosendahl (2):
  Use pause-on-trace with the latency tracers
  Add the latency-collector to tools

 kernel/trace/trace_irqsoff.c      |    4 +
 tools/Makefile                    |   14 +-
 tools/tracing/Makefile            |   20 +
 tools/tracing/latency-collector.c | 1212 +++++++++++++++++++++++++++++
 4 files changed, 1244 insertions(+), 6 deletions(-)
 create mode 100644 tools/tracing/Makefile
 create mode 100644 tools/tracing/latency-collector.c

-- 
2.25.1