linux-kernel - Re: [PATCH 0/5] rtla/timerlat: Stop on signal properly when overloaded


Message-ID: <CAP4=nvQpS20ubNspE0PPhiyWb3-ARV=gmQzFCA7WwAT8+rxMjg@mail.gmail.com>
Date: Fri, 17 Jan 2025 13:04:07 +0100
From: Tomas Glozar <tglozar@...hat.com>
To: Steven Rostedt <rostedt@...dmis.org>
Cc: linux-trace-kernel@...r.kernel.org, linux-kernel@...r.kernel.org, 
	John Kacur <jkacur@...hat.com>, Luis Goncalves <lgoncalv@...hat.com>, 
	Gabriele Monaco <gmonaco@...hat.com>
Subject: Re: [PATCH 0/5] rtla/timerlat: Stop on signal properly when overloaded

On Fri, Jan 17, 2025 at 1:46, Steven Rostedt <rostedt@...dmis.org> wrote:
> Hmm, I wonder if timerlat can handle per cpu data, then you could kick off
> a thread per CPU (or a set of CPUs) where the thread is responsible for
> handling the data.
>
>
>                 CPU_ZERO_S(cpu_size, cpusetp);
>                 CPU_SET_S(cpu, cpu_size, cpusetp);
>                 retval = tracefs_iterate_raw_events(trace->tep,
>                                 trace->inst,
>                                 cpusetp,
>                                 cpu_size,
>                                 collect_registered_events,
>                                                     trace);
>
> And then that iteration will only read over a subset of CPUs. Each thread
> can do a different subset and then it should be able to keep up.
>

That's a good idea; I hadn't thought of that. However, it doesn't help
much when rtla is pinned to a few housekeeping CPUs with -H, which is
used for testing setups based on isolated CPUs.
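
For illustration, the per-thread CPU subsets from the quoted suggestion
could be built roughly as follows. This is only a sketch: worker_cpuset()
is a hypothetical helper (it is not part of rtla or libtracefs), and the
contiguous-chunk split is an assumption. Each worker thread would then
pass its own set and size to tracefs_iterate_raw_events(), as in the
quoted snippet.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sched.h>
#include <stddef.h>

/*
 * Hypothetical helper: fill `set` with the CPUs that worker `idx`
 * (0-based) out of `nr_workers` would iterate over, splitting CPUs
 * 0..nr_cpus-1 into contiguous chunks. `set` must come from CPU_ALLOC()
 * and `set_size` from CPU_ALLOC_SIZE() for at least nr_cpus CPUs.
 * Returns the number of CPUs placed in the set.
 */
static int worker_cpuset(int idx, int nr_workers, int nr_cpus,
			 cpu_set_t *set, size_t set_size)
{
	assert(set != NULL);
	assert(nr_workers > 0 && idx >= 0 && idx < nr_workers);

	/* Round up so that every CPU lands in exactly one chunk. */
	int chunk = (nr_cpus + nr_workers - 1) / nr_workers;
	int first = idx * chunk;
	int last = first + chunk;
	int count = 0;

	if (last > nr_cpus)
		last = nr_cpus;

	CPU_ZERO_S(set_size, set);
	for (int cpu = first; cpu < last; cpu++) {
		CPU_SET_S(cpu, set_size, set);
		count++;
	}

	return count;
}
```

With 8 CPUs and 3 workers, this gives CPUs 0-2, 3-5 and 6-7. The worker
threads themselves would still run on the housekeeping CPUs when -H is
used.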

I was thinking of turning timerlat_hist_handler/timerlat_top_handler
into a BPF program and having it executed right after the sample is
created, e.g. by using the BPF perf interface to hook it to a
tracepoint event. The histogram/counters would be stored in BPF maps,
which the main loop would merely copy out. This is essentially how
cyclictest does it, except in userspace. I expect this solution to
perform well, but the obvious downside is that it requires BPF. That is
not a problem for us, but it might be for other rtla users, and we'd
likely have to keep both implementations of sample processing in the
code.
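
To make the data flow concrete, here is a minimal userspace sketch of
the two halves: a per-sample update and a snapshot copy for the main
loop. It is plain C, not BPF. The struct, the function names, the bucket
count and the bucket width are all hypothetical and are not taken from
rtla. In the BPF variant, the body of hist_record() would run in the
program attached to the tracepoint and the counters would live in BPF
maps.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define NR_BUCKETS 8	/* hypothetical number of histogram entries */
#define BUCKET_NS 1000	/* hypothetical bucket width: 1 us */

/* Hypothetical aggregate; stands in for what would be stored in maps. */
struct lat_hist {
	uint64_t bucket[NR_BUCKETS];
	uint64_t overflow;	/* samples beyond the last bucket */
	uint64_t count;
	uint64_t max_ns;
};

/* Per-sample step: runs once for every latency sample. */
static void hist_record(struct lat_hist *h, uint64_t latency_ns)
{
	assert(h != NULL);

	uint64_t idx = latency_ns / BUCKET_NS;

	if (idx < NR_BUCKETS)
		h->bucket[idx]++;
	else
		h->overflow++;

	h->count++;
	if (latency_ns > h->max_ns)
		h->max_ns = latency_ns;
}

/* Main-loop step: only copies the already aggregated counters. */
static void hist_snapshot(const struct lat_hist *src, struct lat_hist *dst)
{
	assert(src != NULL && dst != NULL);
	memcpy(dst, src, sizeof(*dst));
}
```

The point of the split is that the main loop's cost no longer depends on
the number of samples, only on the size of the aggregate.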

Also, before even starting on that, it would likely be necessary to
remove the duplicate code throughout timerlat/osnoise and test it
properly, so that we don't have to make the same code changes twice or
four times.

Tomas