[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20240715150410.GJ14400@noisy.programming.kicks-ass.net>
Date: Mon, 15 Jul 2024 17:04:10 +0200
From: Peter Zijlstra <peterz@...radead.org>
To: Kyle Huey <me@...ehuey.com>
Cc: Jiri Olsa <olsajiri@...il.com>, khuey@...ehuey.com,
Ingo Molnar <mingo@...hat.com>, Namhyung Kim <namhyung@...nel.org>,
Linus Torvalds <torvalds@...ux-foundation.org>,
robert@...llahan.org, Joe Damato <jdamato@...tly.com>,
Arnaldo Carvalho de Melo <acme@...nel.org>,
Mark Rutland <mark.rutland@....com>,
Alexander Shishkin <alexander.shishkin@...ux.intel.com>,
Ian Rogers <irogers@...gle.com>,
Adrian Hunter <adrian.hunter@...el.com>,
"Liang, Kan" <kan.liang@...ux.intel.com>,
Andrii Nakryiko <andrii@...nel.org>, Song Liu <song@...nel.org>,
linux-perf-users@...r.kernel.org, linux-kernel@...r.kernel.org,
bpf@...r.kernel.org
Subject: Re: [PATCH] perf/bpf: Don't call bpf_overflow_handler() for tracing
events
On Mon, Jul 15, 2024 at 07:33:57AM -0700, Kyle Huey wrote:
> On Mon, Jul 15, 2024 at 4:12 AM Peter Zijlstra <peterz@...radead.org> wrote:
> > Urgh, so wth does event_is_tracing do with event->prog? And can't we
> > clean this up?
>
> Tracing events keep track of the bpf program in event->prog solely for
> cleanup. The bpf programs are stored in and invoked from
> event->tp_event->prog_array, but when the event is destroyed it needs
> to know which bpf program to remove from that array.
Yeah, figured it out eventually.. Does look like it needs event->prog
and we can't easily remedy this dual use :/
> > That whole perf_event_is_tracing() is a pretty gross function.
> >
> > Also, I think the default return value of bpf_overflow_handler() is
> > wrong -- note how if !event->prog we won't call bpf_overflow_handler(),
> > but if we do call it, but then have !event->prog on the re-read, we
> > still return 0.
>
> The synchronization model here isn't quite clear to me but I don't
> think this matters in practice. Once event->prog is set the only
> allowed change is for it to be cleared when the perf event is freed.
> Anything else is refused by perf_event_set_bpf_handler() with EEXIST.
> Can that free race with an overflow handler? I'm not sure, but even if
> it can, dropping an overflow for an event that's being freed seems
> fine to me. If it can't race then we could remove the condition on the
> re-read entirely.
Right, also rcu_read_lock() is cheap enough to unconditionally do I'm
thinking.
So since we have two distinct users of event->prog, I figured we could
distinguish them from one of the LSB in the pointer value, which then
got me the below.
But now that I see the end result I'm not at all sure this is sane.
But I figure it ought to work...
---
diff --git a/kernel/events/core.c b/kernel/events/core.c
index ab6c4c942f79..5ec78346c2a1 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -9594,6 +9594,13 @@ static inline bool sample_is_allowed(struct perf_event *event, struct pt_regs *r
}
#ifdef CONFIG_BPF_SYSCALL
+
+static inline struct bpf_prog *event_prog(struct perf_event *event)
+{
+ unsigned long _prog = (unsigned long)READ_ONCE(event->prog);
+ return (void *)(_prog & ~1);
+}
+
static int bpf_overflow_handler(struct perf_event *event,
struct perf_sample_data *data,
struct pt_regs *regs)
@@ -9603,19 +9610,21 @@ static int bpf_overflow_handler(struct perf_event *event,
.event = event,
};
struct bpf_prog *prog;
- int ret = 0;
+ int ret = 1;
+
+ guard(rcu)();
- ctx.regs = perf_arch_bpf_user_pt_regs(regs);
- if (unlikely(__this_cpu_inc_return(bpf_prog_active) != 1))
- goto out;
- rcu_read_lock();
prog = READ_ONCE(event->prog);
- if (prog) {
+ if (!((unsigned long)prog & 1))
+ return ret;
+
+ prog = (void *)((unsigned long)prog & ~1);
+
+ if (unlikely(__this_cpu_inc_return(bpf_prog_active) == 1)) {
perf_prepare_sample(data, event, regs);
+ ctx.regs = perf_arch_bpf_user_pt_regs(regs);
ret = bpf_prog_run(prog, &ctx);
}
- rcu_read_unlock();
-out:
__this_cpu_dec(bpf_prog_active);
return ret;
@@ -9652,14 +9661,14 @@ static inline int perf_event_set_bpf_handler(struct perf_event *event,
return -EPROTO;
}
- event->prog = prog;
+ event->prog = (void *)((unsigned long)prog | 1);
event->bpf_cookie = bpf_cookie;
return 0;
}
static inline void perf_event_free_bpf_handler(struct perf_event *event)
{
- struct bpf_prog *prog = event->prog;
+ struct bpf_prog *prog = event_prog(event);
if (!prog)
return;
@@ -9707,7 +9716,7 @@ static int __perf_event_overflow(struct perf_event *event,
ret = __perf_event_account_interrupt(event, throttle);
- if (event->prog && !bpf_overflow_handler(event, data, regs))
+ if (!bpf_overflow_handler(event, data, regs))
return ret;
/*
@@ -12026,10 +12035,10 @@ perf_event_alloc(struct perf_event_attr *attr, int cpu,
context = parent_event->overflow_handler_context;
#if defined(CONFIG_BPF_SYSCALL) && defined(CONFIG_EVENT_TRACING)
if (parent_event->prog) {
- struct bpf_prog *prog = parent_event->prog;
+ struct bpf_prog *prog = event_prog(parent_event);
bpf_prog_inc(prog);
- event->prog = prog;
+ event->prog = parent_event->prog;
}
#endif
}
Powered by blists - more mailing lists