Message-ID: <CAH0uvogdtqrzxamPd2zW9uz2zPMz8r33Aojp2zYTJXn_E1EbfQ@mail.gmail.com>
Date: Thu, 29 May 2025 17:23:25 -0700
From: Howard Chu <howardchu95@...il.com>
To: acme@...nel.org
Cc: mingo@...hat.com, namhyung@...nel.org, mark.rutland@....com,
alexander.shishkin@...ux.intel.com, jolsa@...nel.org, irogers@...gle.com,
adrian.hunter@...el.com, peterz@...radead.org, kan.liang@...ux.intel.com,
linux-perf-users@...r.kernel.org, linux-kernel@...r.kernel.org,
Song Liu <song@...nel.org>
Subject: Re: [RFC PATCH v1] perf trace: Mitigate failures in parallel perf
trace instances
On Wed, May 28, 2025 at 11:55 PM Howard Chu <howardchu95@...il.com> wrote:
>
> perf trace utilizes the tracepoint utility, the only filter in perf
> trace is a filter on syscall type. For example, if perf traces only
> openat, then it filters all the other syscalls, such as readlinkat,
> readv, etc.
>
> This filtering is flawed. Consider this case: two perf trace
> instances are running at the same time, trace instance A tracing
> readlinkat, trace instance B tracing openat. When an openat syscall
> enters, it triggers both BPF programs (sys_enter) in both perf trace
> instances, these kernel functions will be executed:
>
> perf_syscall_enter
> perf_call_bpf_enter
> trace_call_bpf
> bpf_prog_run_array
>
> In bpf_prog_run_array:
> ~~~
> while ((prog = READ_ONCE(item->prog))) {
> run_ctx.bpf_cookie = item->bpf_cookie;
> ret &= run_prog(prog, ctx);
> item++;
> }
> ~~~
>
> I'm not a BPF expert, but by tinkering I found that if one of the BPF
> programs returns 0, there will be no tracepoint sample. That is,
>
> (Is there a sample?) = ProgRetA | ProgRetB | ProgRetC
Sorry, I meant ProgRetA & ProgRetB & ProgRetC.
>
> Where ProgRetA is the return value of one of the BPF programs in the BPF
> program array.
>
> Go back to the case, when two perf trace instances are tracing two
> different syscalls, again, A is tracing readlinkat, B is tracing openat,
> when an openat syscall enters, it triggers the sys_enter program in
> instance A, call it ProgA, and the sys_enter program in instance B,
> ProgB, now ProgA will return 0 because ProgA cares about readlinkat only,
> even though ProgB returns 1; (Is there a sample?) = ProgRetA (0) |
> ProgRetB (1) = 0. So there won't be a tracepoint sample in B's output,
Same, ProgRetA (0) & ProgRetB (1) = 0.
> when there really should be one.
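
To make the AND behavior concrete, here is a minimal userspace sketch of the loop quoted above. It is not kernel code: prog_a, prog_b and run_array are hypothetical stand-ins, the syscall numbers are the x86_64 ones and are used only for illustration, and it assumes the accumulator starts at 1 as in bpf_prog_run_array.

```c
#include <stddef.h>
#include <stdint.h>

/* Illustrative x86_64 syscall numbers; only used to tell the two apart. */
enum { SYS_OPENAT = 257, SYS_READLINKAT = 267 };

/* Hypothetical stand-in for a sys_enter BPF program:
 * returns 1 to keep the event, 0 to filter it out. */
typedef uint32_t (*prog_fn)(long syscall_id);

/* Instance A only cares about readlinkat. */
uint32_t prog_a(long id) { return id == SYS_READLINKAT; }

/* Instance B only cares about openat. */
uint32_t prog_b(long id) { return id == SYS_OPENAT; }

/* Mimics the quoted loop: walk a NULL-terminated array and AND
 * every program's return value into a single result. */
uint32_t run_array(const prog_fn *progs, long id)
{
	const prog_fn *item = progs;
	uint32_t ret = 1; /* assumption: accumulator starts at 1 */

	while (*item) {
		ret &= (*item)(id);
		item++;
	}
	return ret; /* 0 means no tracepoint sample for anyone */
}
```

With only prog_b in the array, an openat yields 1. With both prog_a and prog_b attached, the same openat yields 0, because prog_a's 0 clears the result for every program in the array.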
>
> I also want to point out that openat and readlinkat have augmented
> output, so my example might not be accurate, but it does explain the
> current perf-trace-in-parallel dilemma.
>
> Now for augmented output, it is different. When it calls
> bpf_perf_event_output, there is a sample. There won't be no ProgRetA |
> ProgRetB... thing. So I will send another RFC patch to enable
Ditto, this should read ProgRetA & ProgRetB.
Thanks,
Howard