linux-kernel - Re: [RFC PATCH v1] perf trace: Mitigate failures in parallel perf trace instances


Message-ID: <CAH0uvogdtqrzxamPd2zW9uz2zPMz8r33Aojp2zYTJXn_E1EbfQ@mail.gmail.com>
Date: Thu, 29 May 2025 17:23:25 -0700
From: Howard Chu <howardchu95@...il.com>
To: acme@...nel.org
Cc: mingo@...hat.com, namhyung@...nel.org, mark.rutland@....com, 
	alexander.shishkin@...ux.intel.com, jolsa@...nel.org, irogers@...gle.com, 
	adrian.hunter@...el.com, peterz@...radead.org, kan.liang@...ux.intel.com, 
	linux-perf-users@...r.kernel.org, linux-kernel@...r.kernel.org, 
	Song Liu <song@...nel.org>
Subject: Re: [RFC PATCH v1] perf trace: Mitigate failures in parallel perf
 trace instances

On Wed, May 28, 2025 at 11:55 PM Howard Chu <howardchu95@...il.com> wrote:
>
> perf trace uses tracepoints, and its only filter is a filter on syscall
> type. For example, if perf trace traces only openat, it filters out all
> the other syscalls, such as readlinkat, readv, etc.
>
> This filtering is flawed. Consider this case: two perf trace instances
> are running at the same time, instance A tracing readlinkat and
> instance B tracing openat. When an openat syscall enters, it triggers
> the sys_enter BPF programs in both perf trace instances, and these
> kernel functions are executed:
>
> perf_syscall_enter
>   perf_call_bpf_enter
>     trace_call_bpf
>       bpf_prog_run_array
>
> In bpf_prog_run_array:
> ~~~
> while ((prog = READ_ONCE(item->prog))) {
>         run_ctx.bpf_cookie = item->bpf_cookie;
>         ret &= run_prog(prog, ctx);
>         item++;
> }
> ~~~
>
> I'm not a BPF expert, but by tinkering I found that if one of the BPF
> programs returns 0, there will be no tracepoint sample. That is,
>
> (Is there a sample?) = ProgRetA | ProgRetB | ProgRetC

Sorry, I meant ProgRetA & ProgRetB & ProgRetC.

>
> Where ProgRetA is the return value of one of the BPF programs in the BPF
> program array.
>
> Going back to the case where two perf trace instances are tracing two
> different syscalls: again, A is tracing readlinkat and B is tracing
> openat. When an openat syscall enters, it triggers the sys_enter program
> in instance A, call it ProgA, and the sys_enter program in instance B,
> ProgB. ProgA returns 0 because it cares about readlinkat only, even
> though ProgB returns 1; (Is there a sample?) = ProgRetA (0) |
> ProgRetB (1) = 0. So there won't be a tracepoint sample in B's output,

Same, ProgRetA (0) & ProgRetB (1) = 0.

> when there really should be one.
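
To make the corrected AND semantics concrete, here is a minimal user-space
sketch of the loop quoted above. It is a model only, not kernel code:
prog_a, prog_b, sample_emitted, the two arrays and the ID_* values are
hypothetical names made up for illustration, and the initial value of
ret (1) is an assumption of this model.

```c
#include <stddef.h>

/* Hypothetical ids for illustration; these are NOT real syscall numbers. */
enum { ID_OPENAT = 1, ID_READLINKAT = 2 };

/*
 * User-space stand-in for a sys_enter BPF program:
 * returns 1 to keep the tracepoint sample, 0 to drop it.
 */
typedef unsigned int (*prog_fn)(int syscall_id);

/* Instance A only cares about readlinkat. */
unsigned int prog_a(int id) { return id == ID_READLINKAT; }

/* Instance B only cares about openat. */
unsigned int prog_b(int id) { return id == ID_OPENAT; }

/* NULL-terminated program arrays, mimicking the quoted while loop. */
const prog_fn both_instances[] = { prog_a, prog_b, NULL };
const prog_fn only_b[] = { prog_b, NULL };

/* Models the quoted loop: every program's return value is ANDed in. */
unsigned int sample_emitted(const prog_fn *progs, int syscall_id)
{
	unsigned int ret = 1; /* assumed initial value in this model */
	prog_fn prog;

	while ((prog = *progs)) {
		ret &= prog(syscall_id);
		progs++;
	}
	return ret;
}
```

In this model, sample_emitted(only_b, ID_OPENAT) is 1, but
sample_emitted(both_instances, ID_OPENAT) is 0 because prog_a vetoes the
sample. By the same logic sample_emitted(both_instances, ID_READLINKAT)
is also 0, so with both instances attached neither one gets its sample.
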
>
> I also want to point out that openat and readlinkat have augmented
> output, so my example might not be accurate, but it does explain the
> current perf-trace-in-parallel dilemma.
>
> Augmented output is different. When the BPF program calls
> bpf_perf_event_output, there is a sample. There is no ProgRetA |
> ProgRetB... thing involved. So I will send another RFC patch to enable

Ditto.

Thanks,
Howard