Message-ID: <CAEf4BzbbxpDmvJrV6s3dupP4Ysx+vVaHYBk+KNZtthag5OvosQ@mail.gmail.com>
Date: Fri, 23 Jan 2026 10:40:15 -0800
From: Andrii Nakryiko <andrii.nakryiko@...il.com>
To: Tao Chen <chen.dylane@...ux.dev>
Cc: peterz@...radead.org, mingo@...hat.com, acme@...nel.org,
namhyung@...nel.org, mark.rutland@....com, alexander.shishkin@...ux.intel.com,
jolsa@...nel.org, irogers@...gle.com, adrian.hunter@...el.com,
kan.liang@...ux.intel.com, song@...nel.org, ast@...nel.org,
daniel@...earbox.net, andrii@...nel.org, martin.lau@...ux.dev,
eddyz87@...il.com, yonghong.song@...ux.dev, john.fastabend@...il.com,
kpsingh@...nel.org, sdf@...ichev.me, haoluo@...gle.com,
linux-perf-users@...r.kernel.org, linux-kernel@...r.kernel.org,
bpf@...r.kernel.org
Subject: Re: [PATCH bpf-next v7 2/2] bpf: Hold the perf callchain entry until
used completely
On Thu, Jan 22, 2026 at 9:42 PM Tao Chen <chen.dylane@...ux.dev> wrote:
>
> > On 2026/1/23 08:38, Andrii Nakryiko wrote:
> > On Wed, Dec 17, 2025 at 1:34 AM Tao Chen <chen.dylane@...ux.dev> wrote:
> >>
> >> As Alexei noted, the get_perf_callchain() return value may be reused
> >> if a task is preempted after the BPF program enters the migrate-disable
> >> section. The perf callchain code keeps only a small stack of entries
> >> per CPU, and we can reuse one as follows:
> >>
> >> 1. get the perf callchain entry
> >> 2. BPF use...
> >> 3. put the perf callchain entry
> >>
> >> And Peter suggested that get_recursion_context() be used with
> >> preemption disabled, so we should disable preemption on the BPF side.
> >>
> >> Acked-by: Yonghong Song <yonghong.song@...ux.dev>
> >> Signed-off-by: Tao Chen <chen.dylane@...ux.dev>
> >> ---
> >> kernel/bpf/stackmap.c | 68 +++++++++++++++++++++++++++++++++++--------
> >> 1 file changed, 56 insertions(+), 12 deletions(-)
> >>
> >
> > I took a bit closer look at these changes and I'm not a fan of the
> > particular implementation, tbh. It's a bit of a maze how all these
> > different call chain cases are handled, so I might be missing
> > something, but I'd address this a bit differently.
> >
> > First, instead of making this obscure rctx part of the interface,
> > I'd record rctx inside the perf_callchain_entry itself, and make sure
> > that get_callchain_entry() doesn't have any output arguments.
> > put_callchain_entry() would then accept a perf_callchain_entry reference
> > and just fetch rctx from inside it.
> >
>
> Hi Andrii,
>
> I tried to sketch this briefly in code; is my understanding correct?
>
> struct perf_callchain_entry *get_callchain_entry(void)
> {
>         int cpu;
>         int rctx;
>         struct perf_callchain_entry *entry;
>         struct callchain_cpus_entries *entries;
>
>         rctx = get_recursion_context(this_cpu_ptr(callchain_recursion));
>         if (rctx == -1)
>                 return NULL;
>
>         entries = rcu_dereference(callchain_cpus_entries);
>         if (!entries) {
>                 put_recursion_context(this_cpu_ptr(callchain_recursion), rctx);
>                 return NULL;
>         }
>
>         cpu = smp_processor_id();
>
>         entry = ((void *)entries->cpu_entries[cpu]) +
>                 (rctx * perf_callchain_entry__sizeof());
>         entry->rctx = rctx;
>
>         return entry;
> }
>
> void
> put_callchain_entry(struct perf_callchain_entry *entry)
> {
>         put_recursion_context(this_cpu_ptr(callchain_recursion),
>                               entry->rctx);
> }
>
> And then there is no need for rctx in bpf_get_perf_callchain.
yes, exactly
>
> bpf_get_perf_callchain(struct pt_regs *regs, bool kernel, bool user,
> int max_stack, bool crosstask)
>
> Functionally, this seems fine. The only concern is whether the perf
> maintainers will approve it; after all, this change involves modifying
> their core interfaces.
>
> Peter, Yonghong, can we do this?
>
> > Then instead of open-coding get_perf_callchain by exposing
> > __init_perf_callchain_ctx, __get_perf_callchain_kernel, and
> > __get_perf_callchain_user, can't we have __get_perf_callchain() which
> > will accept perf_callchain_entry as an input and won't do get/put
> > internally. And then existing get_perf_callchain() will just do get +
> > __get_perf_callchain + put, while BPF-side code will do its own get
> > (with preemption temporarily disabled), will fetch callstack in one of
> > a few possible ways, and then put it (unless callchain_entry is coming
> > from outside, that trace_in thing).
> >
>
> Exposing only __get_perf_callchain will make it much easier for BPF
> callers to understand and use. And following Peter's suggestion, we can
> mark __init_perf_callchain_ctx, __get_perf_callchain_kernel, and
> __get_perf_callchain_user as static, then have __get_perf_callchain
> encapsulate the logic of __init_perf_callchain_ctx,
> __get_perf_callchain_kernel, and __get_perf_callchain_user. What do you
> think?
That's exactly what I proposed. Except I don't think we even need to
have __get_perf_callchain_user and __get_perf_callchain_kernel; keep
__get_perf_callchain almost identical to the current get_perf_callchain(),
except for getting/putting the callchain entry.
>
> > It's close to what you are doing, but I don't think anyone likes those
> > exposed __init_perf_callchain_ctx + __get_perf_callchain_kernel +
> > __get_perf_callchain_user. Can't we avoid that? (and also not sure we
> > need add_mark inside the ctx itself, do we?)
> >
> > pw-bot: cr
> >
> >
> >
[please trim irrelevant parts]