Message-ID: <aEAfHYLEyc7xGy7E@krava>
Date: Wed, 4 Jun 2025 12:25:33 +0200
From: Jiri Olsa <olsajiri@...il.com>
To: Howard Chu <howardchu95@...il.com>, Namhyung Kim <namhyung@...nel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@...icios.com>, acme@...nel.org,
	mingo@...hat.com, mark.rutland@....com,
	alexander.shishkin@...ux.intel.com, irogers@...gle.com,
	adrian.hunter@...el.com, peterz@...radead.org,
	kan.liang@...ux.intel.com, linux-perf-users@...r.kernel.org,
	linux-kernel@...r.kernel.org, Song Liu <song@...nel.org>,
	bpf@...r.kernel.org, Alexei Starovoitov <ast@...nel.org>,
	Daniel Borkmann <daniel@...earbox.net>,
	Andrii Nakryiko <andrii@...nel.org>,
	Steven Rostedt <rostedt@...dmis.org>
Subject: Re: [RFC PATCH v1] perf trace: Mitigate failures in parallel perf
 trace instances

On Mon, Jun 02, 2025 at 06:17:43PM -0400, Steven Rostedt wrote:
> On Fri, 30 May 2025 17:00:38 -0700
> Howard Chu <howardchu95@...il.com> wrote:
> 
> > Hello Namhyung,
> > 
> > On Fri, May 30, 2025 at 4:37 PM Namhyung Kim <namhyung@...nel.org> wrote:
> > >
> > > Hello,
> > >
> > > (Adding tracing folks)  
> > 
> > (That's so convenient wow)
> 
> Shouldn't the BPF folks be more relevant? I don't see any of the tracing
> code involved here.
> 
> > 
> > >
> > > On Wed, May 28, 2025 at 11:55:36PM -0700, Howard Chu wrote:  
> > > > perf trace utilizes the tracepoint utility; the only filter in perf
> > > > trace is a filter on syscall type. For example, if perf traces only
> > > > openat, then it filters out all the other syscalls, such as readlinkat,
> > > > readv, etc.
> > > >
> > > > This filtering is flawed. Consider this case: two perf trace
> > > > instances are running at the same time, trace instance A tracing
> > > > readlinkat, trace instance B tracing openat. When an openat syscall
> > > > enters, it triggers both BPF programs (sys_enter) in both perf trace
> > > > instances, these kernel functions will be executed:
> > > >
> > > > perf_syscall_enter
> > > >   perf_call_bpf_enter
> > > >     trace_call_bpf
> 
> This is in bpf_trace.c (BPF related, not tracing related).
> 
> -- Steve
> 
> 
> > > >       bpf_prog_run_array
> > > >
> > > > In bpf_prog_run_array:
> > > > ~~~
> > > > while ((prog = READ_ONCE(item->prog))) {
> > > >       run_ctx.bpf_cookie = item->bpf_cookie;
> > > >       ret &= run_prog(prog, ctx);
> > > >       item++;
> > > > }
> > > > ~~~
> > > >
> > > > I'm not a BPF expert, but by tinkering I found that if one of the BPF
> > > > programs returns 0, there will be no tracepoint sample. That is,
> > > >
> > > > (Is there a sample?) = ProgRetA & ProgRetB & ProgRetC
> > > >
> > > > Where ProgRetA is the return value of one of the BPF programs in the BPF
> > > > program array.
> > > >
> > > > Going back to the case: when two perf trace instances are tracing two
> > > > different syscalls, again, A is tracing readlinkat, B is tracing openat,
> > > > when an openat syscall enters, it triggers the sys_enter program in
> > > > instance A, call it ProgA, and the sys_enter program in instance B,
> > > > ProgB. Now ProgA will return 0 because ProgA cares about readlinkat only,
> > > > even though ProgB returns 1; (Is there a sample?) = ProgRetA (0) &
> > > > ProgRetB (1) = 0. So there won't be a tracepoint sample in B's output,
> > > > when there really should be one.
> > >
> > > Sounds like a bug.  I think it should run bpf programs attached to the
> > > current perf_event only.  Isn't it the case for tracepoint + perf + bpf?  
> > 
> > I really can't answer that question.

bpf programs for a tracepoint are executed before the perf-event-specific
check/trigger in perf_trace_run_bpf_submit

the bpf program array is part of struct trace_event_call, so it's global per
tracepoint, not per perf event

IIRC perf trace needs the perf event sample and the bpf program is there
to do the filtering and some other related stuff?

if that's the case I wonder if we could switch the bpf_prog_run_array logic
to be permissive like below, and perhaps make that a tracepoint-specific
change, because bpf_prog_run_array is used in other places

or make it somehow configurable.. otherwise I fear this might break existing
use cases.. FWIW the bpf ci tests [1] survived the change below

jirka


[1] https://github.com/kernel-patches/bpf/pull/9044

---
diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index 5b25d278409b..4ca8afe0217c 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -2218,12 +2218,12 @@ bpf_prog_run_array(const struct bpf_prog_array *array,
 	const struct bpf_prog *prog;
 	struct bpf_run_ctx *old_run_ctx;
 	struct bpf_trace_run_ctx run_ctx;
-	u32 ret = 1;
+	u32 ret = 0;
 
 	RCU_LOCKDEP_WARN(!rcu_read_lock_held(), "no rcu lock held");
 
 	if (unlikely(!array))
-		return ret;
+		return 1;
 
 	run_ctx.is_uprobe = false;
 
@@ -2232,7 +2232,7 @@ bpf_prog_run_array(const struct bpf_prog_array *array,
 	item = &array->items[0];
 	while ((prog = READ_ONCE(item->prog))) {
 		run_ctx.bpf_cookie = item->bpf_cookie;
-		ret &= run_prog(prog, ctx);
+		ret |= run_prog(prog, ctx);
 		item++;
 	}
 	bpf_reset_run_ctx(old_run_ctx);
