Message-ID: <20250116144931.649593-1-tglozar@redhat.com>
Date: Thu, 16 Jan 2025 15:49:26 +0100
From: Tomas Glozar <tglozar@...hat.com>
To: Steven Rostedt <rostedt@...dmis.org>
Cc: linux-trace-kernel@...r.kernel.org,
	linux-kernel@...r.kernel.org,
	John Kacur <jkacur@...hat.com>,
	Luis Goncalves <lgoncalv@...hat.com>,
	Gabriele Monaco <gmonaco@...hat.com>,
	Tomas Glozar <tglozar@...hat.com>
Subject: [PATCH 0/5] rtla/timerlat: Stop on signal properly when overloaded

We have been seeing an issue where if rtla is run on machines with a high
number of CPUs (100+), timerlat can generate more samples than rtla is able
to process via tracefs_iterate_raw_events. This is especially common when
the interval is set to 100us (rteval and cyclictest default) as opposed
to the rtla default of 1000us, but also happens with the rtla default.

Currently, this leads to rtla hanging and having to be terminated with
SIGTERM. Setting stop_tracing on SIGINT is not enough, since events keep
arriving and tracefs_iterate_raw_events never returns.
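
To make the hang concrete, here is a small model of the situation. It is
not rtla or libtracefs code: struct sim, drain and the rates are
hypothetical names chosen for illustration, and the only assumption carried
over from the description above is that one iterate call returns only once
the buffer is empty.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the reader/tracer race; not rtla code. */
struct sim {
	long pending;      /* events waiting in the buffer */
	long produce_rate; /* events the tracer adds per tick */
	long consume_rate; /* events the reader handles per tick */
	bool tracer_on;    /* whether the tracer still generates events */
};

/*
 * Model of a single iterate call that returns only when the buffer is
 * empty. max_ticks bounds the simulation so that it always terminates.
 * Returns the number of ticks used, or -1 if the buffer was still
 * non-empty after max_ticks (the "hang").
 */
long drain(struct sim *s, long max_ticks)
{
	for (long tick = 0; tick < max_ticks; tick++) {
		if (s->tracer_on)
			s->pending += s->produce_rate;

		/* The reader can take at most what is in the buffer. */
		long take = s->consume_rate < s->pending ?
			    s->consume_rate : s->pending;
		s->pending -= take;
		assert(s->pending >= 0);

		if (s->pending == 0)
			return tick + 1;
	}
	return -1;
}
```

In this model, a backlog of 1000 events with 12 events produced and 10
consumed per tick never drains while tracer_on is set, no matter what a
flag checked outside drain says. With tracer_on cleared, the same backlog
drains in 100 ticks.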

This patchset contains two changes:
- Stop the timerlat tracer on SIGINT/SIGALRM to ensure no more events are
generated when rtla is supposed to exit. This fixes rtla hanging and should
go to stable.
- On receiving SIGINT/SIGALRM twice, abort iteration immediately with
tracefs_iterate_stop, making rtla exit right away instead of waiting for
all events to be processed. This is more of a usability feature: if the user
is in a hurry, they can Ctrl-C twice (or once after the duration has expired)
and exit immediately, discarding any events pending processing.
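
The two-stage behaviour described in the list above can be sketched as a
signal handler. This is a minimal sketch, not the code from the patches:
the flags tracer_stopped and iterate_aborted are hypothetical stand-ins for
the points where the real handler would call trace_instance_stop and
tracefs_iterate_stop, and install_stop_handler is an illustrative name.

```c
#include <signal.h>

/* Set on the first signal; name taken from the description above. */
static volatile sig_atomic_t stop_tracing;
/* Hypothetical stand-in for stopping the timerlat tracer. */
static volatile sig_atomic_t tracer_stopped;
/* Hypothetical stand-in for aborting the event iteration. */
static volatile sig_atomic_t iterate_aborted;

static void stop_handler(int sig)
{
	(void)sig;

	if (stop_tracing) {
		/*
		 * Stop was already requested once: abort event processing
		 * so the tool can exit right away, discarding any events
		 * still pending.
		 */
		iterate_aborted = 1;
		return;
	}

	/*
	 * First signal: request a stop and turn the tracer off so that
	 * no new events arrive while the remaining ones are processed.
	 */
	stop_tracing = 1;
	tracer_stopped = 1;
}

/* Use the same handler for Ctrl-C and for the duration timer. */
void install_stop_handler(void)
{
	signal(SIGINT, stop_handler);
	signal(SIGALRM, stop_handler);
}
```

Sharing one handler between SIGINT and SIGALRM is what makes "once after
the duration has expired" work: the SIGALRM counts as the first stop
request, so a single Ctrl-C afterwards takes the abort path.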

Note: I am sending these together only because the second one depends on
the first. The same issue should also be fixed in osnoise.

In the future, two more patchsets will be sent: one to display how many
events/samples were dropped (either left in the tracefs buffer or lost to
buffer overflow), and one to improve sample processing performance to be on
par with cyclictest (ideally), so that samples are not dropped in the cases
mentioned at the beginning of this email.

Tomas Glozar (5):
  rtla: Add trace_instance_stop
  rtla/timerlat_hist: Stop timerlat tracer on signal
  rtla/timerlat_top: Stop timerlat tracer on signal
  rtla/timerlat_hist: Abort event processing on second signal
  rtla/timerlat_top: Abort event processing on second signal

 tools/tracing/rtla/src/timerlat_hist.c | 19 ++++++++++++++++++-
 tools/tracing/rtla/src/timerlat_top.c  | 20 +++++++++++++++++++-
 tools/tracing/rtla/src/trace.c         |  8 ++++++++
 tools/tracing/rtla/src/trace.h         |  1 +
 4 files changed, 46 insertions(+), 2 deletions(-)

-- 
2.47.1

