Message-ID: <20250116144931.649593-1-tglozar@redhat.com>
Date: Thu, 16 Jan 2025 15:49:26 +0100
From: Tomas Glozar <tglozar@...hat.com>
To: Steven Rostedt <rostedt@...dmis.org>
Cc: linux-trace-kernel@...r.kernel.org,
linux-kernel@...r.kernel.org,
John Kacur <jkacur@...hat.com>,
Luis Goncalves <lgoncalv@...hat.com>,
Gabriele Monaco <gmonaco@...hat.com>,
Tomas Glozar <tglozar@...hat.com>
Subject: [PATCH 0/5] rtla/timerlat: Stop on signal properly when overloaded
We have been seeing an issue where if rtla is run on machines with a high
number of CPUs (100+), timerlat can generate more samples than rtla is able
to process via tracefs_iterate_raw_events. This is especially common when
the interval is set to 100us (rteval and cyclictest default) as opposed
to the rtla default of 1000us, but also happens with the rtla default.
Currently, this leads to rtla hanging and having to be terminated with
SIGTERM. SIGINT setting stop_tracing is not enough, since more and more
events are coming and tracefs_iterate_raw_events never exits.
This patchset contains two changes:
- Stop the timerlat tracer on SIGINT/SIGALRM to ensure no more events are
generated when rtla is supposed to exit. This fixes rtla hanging and should
go to stable.
- On receiving SIGINT/SIGALRM twice, abort iteration immediately with
tracefs_iterate_stop, making rtla exit right away instead of waiting for
all events to be processed. This is more of a usability feature: if the user
is in a hurry, they can Ctrl-C twice (or once after the duration has expired)
and exit immediately, discarding any events pending processing.
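To illustrate the two-stage behavior described above, here is a minimal, self-contained sketch. It is not the code from the patches: stub_trace_instance_stop() and stub_iterate_stop() are hypothetical stand-ins for the real trace_instance_stop and tracefs_iterate_stop, whose arguments are omitted here, and the two *_called flags exist only so that the effect can be observed.

```c
#include <signal.h>

/* Flag named in the cover letter; set by the first SIGINT/SIGALRM. */
static volatile sig_atomic_t stop_tracing;

/* Hypothetical observation flags, used only by this sketch. */
static volatile sig_atomic_t tracer_stop_called;
static volatile sig_atomic_t iterate_stop_called;

/* Stand-in for trace_instance_stop: stops the tracer so that no
 * new events are generated. */
static void stub_trace_instance_stop(void)
{
	tracer_stop_called = 1;
}

/* Stand-in for tracefs_iterate_stop: aborts event iteration,
 * discarding any events still pending. */
static void stub_iterate_stop(void)
{
	iterate_stop_called = 1;
}

/* Handler meant to be shared by SIGINT and SIGALRM. */
void stop_handler(int sig)
{
	(void)sig;

	if (stop_tracing) {
		/* Second signal: stop processing immediately. */
		stub_iterate_stop();
		return;
	}

	/* First signal: request exit and stop event generation;
	 * already-recorded events can still be processed. */
	stop_tracing = 1;
	stub_trace_instance_stop();
}
```

With this handler registered for both signals (for example via sigaction()), SIGALRM at the end of the duration counts as the first signal, so a single Ctrl-C afterwards takes the abort path.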
Note: I am sending these together only because the second one depends on
the first. The same issue should be fixed in osnoise as well.
In the future, two more patchsets will be sent: one to display how many
events/samples were dropped (either left in the tracefs buffer or lost to
buffer overflow), and one to improve sample processing performance to be on
par with cyclictest (ideally) so that samples are not dropped in the cases
mentioned at the beginning of this email.
Tomas Glozar (5):
  rtla: Add trace_instance_stop
  rtla/timerlat_hist: Stop timerlat tracer on signal
  rtla/timerlat_top: Stop timerlat tracer on signal
  rtla/timerlat_hist: Abort event processing on second signal
  rtla/timerlat_top: Abort event processing on second signal

 tools/tracing/rtla/src/timerlat_hist.c | 19 ++++++++++++++++++-
 tools/tracing/rtla/src/timerlat_top.c  | 20 +++++++++++++++++++-
 tools/tracing/rtla/src/trace.c         |  8 ++++++++
 tools/tracing/rtla/src/trace.h         |  1 +
 4 files changed, 46 insertions(+), 2 deletions(-)
--
2.47.1