Message-ID: <CAEf4BzbbxpDmvJrV6s3dupP4Ysx+vVaHYBk+KNZtthag5OvosQ@mail.gmail.com>
Date: Fri, 23 Jan 2026 10:40:15 -0800
From: Andrii Nakryiko <andrii.nakryiko@...il.com>
To: Tao Chen <chen.dylane@...ux.dev>
Cc: peterz@...radead.org, mingo@...hat.com, acme@...nel.org,
namhyung@...nel.org, mark.rutland@....com, alexander.shishkin@...ux.intel.com,
jolsa@...nel.org, irogers@...gle.com, adrian.hunter@...el.com,
kan.liang@...ux.intel.com, song@...nel.org, ast@...nel.org,
daniel@...earbox.net, andrii@...nel.org, martin.lau@...ux.dev,
eddyz87@...il.com, yonghong.song@...ux.dev, john.fastabend@...il.com,
kpsingh@...nel.org, sdf@...ichev.me, haoluo@...gle.com,
linux-perf-users@...r.kernel.org, linux-kernel@...r.kernel.org,
bpf@...r.kernel.org
Subject: Re: [PATCH bpf-next v7 2/2] bpf: Hold the perf callchain entry until
used completely
On Thu, Jan 22, 2026 at 9:42 PM Tao Chen <chen.dylane@...ux.dev> wrote:
>
> > On 2026/1/23 08:38, Andrii Nakryiko wrote:
> > On Wed, Dec 17, 2025 at 1:34 AM Tao Chen <chen.dylane@...ux.dev> wrote:
> >>
> >> As Alexei noted, the get_perf_callchain() return value may be reused
> >> if a task is preempted after the BPF program enters the migrate-disable
> >> section. The perf callchain code keeps only a small stack of entries
> >> per CPU, and we can reuse one as follows:
> >>
> >> 1. get the perf callchain entry
> >> 2. BPF use...
> >> 3. put the perf callchain entry
> >>
> >> And Peter suggested that get_recursion_context() be used with
> >> preemption disabled, so we should disable preemption on the BPF side.
> >>
> >> Acked-by: Yonghong Song <yonghong.song@...ux.dev>
> >> Signed-off-by: Tao Chen <chen.dylane@...ux.dev>
> >> ---
> >> kernel/bpf/stackmap.c | 68 +++++++++++++++++++++++++++++++++++--------
> >> 1 file changed, 56 insertions(+), 12 deletions(-)
> >>
> >
> > I took a bit closer look at these changes and I'm not a fan of the
> > particular implementation, tbh. It's a bit of a maze how all these
> > different call chain cases are handled, so I might be missing
> > something, but I'd address this a bit differently.
> >
> > First, instead of making this obscure rctx part of the interface,
> > I'd record rctx inside the perf_callchain_entry itself, and make sure
> > that get_callchain_entry() doesn't have any output arguments.
> > put_callchain_entry() would then accept a perf_callchain_entry reference
> > and just fetch rctx from inside it.
> >
>
> Hi Andrii,
>
> I tried to sketch this briefly in code; is my understanding correct?
>
> struct perf_callchain_entry *get_callchain_entry(void)
> {
>         int cpu;
>         int rctx;
>         struct perf_callchain_entry *entry;
>         struct callchain_cpus_entries *entries;
>
>         rctx = get_recursion_context(this_cpu_ptr(callchain_recursion));
>         if (rctx == -1)
>                 return NULL;
>
>         entries = rcu_dereference(callchain_cpus_entries);
>         if (!entries) {
>                 put_recursion_context(this_cpu_ptr(callchain_recursion), rctx);
>                 return NULL;
>         }
>
>         cpu = smp_processor_id();
>
>         entry = ((void *)entries->cpu_entries[cpu]) +
>                 (rctx * perf_callchain_entry__sizeof());
>         entry->rctx = rctx;
>
>         return entry;
> }
>
> void
> put_callchain_entry(struct perf_callchain_entry *entry)
> {
>         put_recursion_context(this_cpu_ptr(callchain_recursion),
>                               entry->rctx);
> }
>
> And then there is no need for rctx in bpf_get_perf_callchain.
yes, exactly
>
> bpf_get_perf_callchain(struct pt_regs *regs, bool kernel, bool user,
> int max_stack, bool crosstask)
>
> Functionally, this seems fine. The only concern is whether the perf
> maintainers will approve it; after all, this change involves modifying
> their core interfaces.
>
> Peter, Yonghong, can we do this?
>
> > Then instead of open-coding get_perf_callchain by exposing
> > __init_perf_callchain_ctx, __get_perf_callchain_kernel, and
> > __get_perf_callchain_user, can't we have __get_perf_callchain() which
> > will accept perf_callchain_entry as an input and won't do get/put
> > internally. And then existing get_perf_callchain() will just do get +
> > __get_perf_callchain + put, while BPF-side code will do its own get
> > (with preemption temporarily disabled), will fetch callstack in one of
> > a few possible ways, and then put it (unless callchain_entry is coming
> > from outside, that trace_in thing).
> >
>
> Exposing only __get_perf_callchain will make it much easier for BPF
> callers to understand and use. And following Peter's suggestion, we can
> mark __init_perf_callchain_ctx, __get_perf_callchain_kernel, and
> __get_perf_callchain_user as static, then have __get_perf_callchain
> encapsulate the logic of __init_perf_callchain_ctx,
> __get_perf_callchain_kernel, and __get_perf_callchain_user. What do you
> think?
That's exactly what I proposed. Except I don't think we even need to
have __get_perf_callchain_user and __get_perf_callchain_kernel; keep
__get_perf_callchain almost identical to the current get_perf_callchain(),
except for getting/putting the callchain entry.
>
> > It's close to what you are doing, but I don't think anyone likes those
> > exposed __init_perf_callchain_ctx + __get_perf_callchain_kernel +
> > __get_perf_callchain_user. Can't we avoid that? (and also not sure we
> > need add_mark inside the ctx itself, do we?)
> >
> > pw-bot: cr
> >
> >
> >
[please trim irrelevant parts]