lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [thread-next>] [day] [month] [year] [list]
Date:   Tue, 19 Jan 2021 17:43:42 +0100
From:   Viktor Rosendahl <Viktor.Rosendahl@....de>
To:     Steven Rostedt <rostedt@...dmis.org>,
        Joel Fernandes <joel@...lfernandes.org>,
        <linux-kernel@...r.kernel.org>
CC:     Ingo Molnar <mingo@...hat.com>,
        Viktor Rosendahl <Viktor.Rosendahl@....de>
Subject: [RFC PATCH 0/2] Tracing bursts of latencies

Hello all,

This series contains two things:

1. A fix for a bug in the Ftrace latency tracers that appeared with Linux 5.7.

2. The latency-collector, a tool that is designed to work around the
   limitations in the ftrace latency tracers. It needs the bug fix in order to
   work properly.

I have sent a patch series with the latency-collector before.

I never got any comments on it and I stopped pushing it because I thought that
BPF tracing would be the wave of the future and that it would solve the problem
in a cleaner and more elegant way.

Recently, I tried out the criticalstat script from bcc tools but it did not
fulfill all of my hopes and dreams.

On the bright side, it was able to capture all latencies in a burst. The main
problems that I encountered were:

1. The system became unstable and froze now and then. The man page of
   criticalstat has a mention of it being unstable, so I assume that this is a
   known problem.

2. Sometimes the stack traces were incorrect but not in an obvious way. After it
   happened once, all subsequent ones were bad.
   
3. If two instances were run simultaneously (to capture both preemptoff and irq
   off), there seemed to be a quite large performance hit but I did not measure
   this exactly.
   
4. The filesystem footprint seemed quite large. The size of libbcc seemed to be
   quite large for a small embedded system.

For these reasons, I take the liberty of resending the latency-collector again.

I would hope to get some comments regarding it, or some suggestion of an
alternative approach of how to solve the problem of being able to capture
latencies that systematically occur close to each other.

Admittedly, it may from a developer's perspective be somewhat of a niche
problem, since removing one latency will reveal the next but when one is doing
validation with a fleet of devices being tested in a long and expensive test
campaign, then it is quite desirable to not lose any latencies.

best regards,

Viktor

Viktor Rosendahl (2):
  Use pause-on-trace with the latency tracers
  Add the latency-collector to tools

 kernel/trace/trace_irqsoff.c      |    4 +
 tools/Makefile                    |   14 +-
 tools/tracing/Makefile            |   20 +
 tools/tracing/latency-collector.c | 1212 +++++++++++++++++++++++++++++
 4 files changed, 1244 insertions(+), 6 deletions(-)
 create mode 100644 tools/tracing/Makefile
 create mode 100644 tools/tracing/latency-collector.c

-- 
2.25.1

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ