Message-ID: <CAEf4BzbSCbqg6y1KGg_j5xkK1=xsmOyK5ob9uTJiVcWgQ4jAJw@mail.gmail.com>
Date: Fri, 6 Feb 2026 09:12:13 -0800
From: Andrii Nakryiko <andrii.nakryiko@...il.com>
To: Tao Chen <chen.dylane@...ux.dev>
Cc: song@...nel.org, jolsa@...nel.org, ast@...nel.org, daniel@...earbox.net,
andrii@...nel.org, martin.lau@...ux.dev, eddyz87@...il.com,
yonghong.song@...ux.dev, john.fastabend@...il.com, kpsingh@...nel.org,
sdf@...ichev.me, haoluo@...gle.com, bpf@...r.kernel.org,
linux-kernel@...r.kernel.org
Subject: Re: [PATCH bpf-next v2 1/2] bpf: Add preempt disable for bpf_get_stack
On Fri, Feb 6, 2026 at 1:07 AM Tao Chen <chen.dylane@...ux.dev> wrote:
>
> The values returned by get_perf_callchain() live in a per-CPU buffer
> that may be reused if the task is preempted after the BPF program has
> entered migrate-disable mode, so we should disable preemption as well.
> And as Andrii suggested, BPF can guarantee that the perf callchain
> buffers themselves won't be released while in use: for bpf_get_stackid,
> the BPF stack map keeps them alive by delaying put_callchain_buffers()
> until the map is freed, and for bpf_get_stack/bpf_get_task_stack, the
> BPF program itself likewise holds the buffers alive until its freeing
> time, which is delayed until after an RCU Tasks Trace + RCU grace
> period.
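
To spell out the window for anyone following along (a rough sketch,
not the exact kernel code; get_perf_callchain() arguments elided):

	preempt_disable();
	/* returns this CPU's temporary callchain entry */
	trace = get_perf_callchain(regs, ...);
	if (trace)
		/* no other task on this CPU can reuse the entry here */
		memcpy(buf, trace->ip, copy_len);
	preempt_enable();

migrate_disable() only pins the task to the CPU; another task scheduled
on that same CPU can still grab and overwrite the same per-CPU entry
between get_perf_callchain() and the memcpy() unless preemption is
disabled across both.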
>
> Suggested-by: Andrii Nakryiko <andrii@...nel.org>
> Signed-off-by: Tao Chen <chen.dylane@...ux.dev>
> ---
>
> Change list:
> - v1 -> v2
> - add preempt_disable for bpf_get_stack in patch1
> - add patch2
> - v1: https://lore.kernel.org/bpf/20260128165710.928294-1-chen.dylane@linux.dev
>
> kernel/bpf/stackmap.c | 13 ++++++-------
> 1 file changed, 6 insertions(+), 7 deletions(-)
>
Hm... looking at bpf_get_stack_pe(), I'm not sure what the exact
guarantees are around that ctx->data->callchain that we pass as
trace_in... It looks like it's the same temporary per-cpu callchain as
in other places, just attached (temporarily) to ctx. So we probably
want preemption disabled/enabled for that one as well, no? And to
achieve that, I think we'll need to split the build_id logic out of
__bpf_get_stack() and do it after preemption is re-enabled in the
callers. Luckily it's not that much code and logic, so it should be
easy. But please analyze this carefully yourself.
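
Roughly what I have in mind (just a sketch, parameter lists and error
handling hand-waved, names taken from this patch):

	static long __bpf_get_stack(struct pt_regs *regs, ...)
	{
		...
		preempt_disable();	/* now unconditional, trace_in or not */
		trace = trace_in ?: get_perf_callchain(regs, ...);
		/* ... copy ips into buf ... */
		preempt_enable();
		/* no stack_map_get_build_id_offset() call here anymore */
		return ...;
	}

and then each caller that needs it, bpf_get_stack_pe() included, does
the build_id pass only after __bpf_get_stack() returns:

	err = __bpf_get_stack(regs, NULL, ctx->data->callchain, buf, ...);
	if (err > 0 && user_build_id)
		stack_map_get_build_id_offset(buf, trace_nr, user, may_fault);

That keeps the (potentially faulting) build_id resolution out of the
preemption-disabled region.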
pw-bot: cr
> diff --git a/kernel/bpf/stackmap.c b/kernel/bpf/stackmap.c
> index da3d328f5c1..1b100a03ef2 100644
> --- a/kernel/bpf/stackmap.c
> +++ b/kernel/bpf/stackmap.c
> @@ -460,8 +460,8 @@ static long __bpf_get_stack(struct pt_regs *regs, struct task_struct *task,
>
> max_depth = stack_map_calculate_max_depth(size, elem_size, flags);
>
> - if (may_fault)
> - rcu_read_lock(); /* need RCU for perf's callchain below */
> + if (!trace_in)
> + preempt_disable();
>
> if (trace_in) {
> trace = trace_in;
> @@ -474,8 +474,8 @@ static long __bpf_get_stack(struct pt_regs *regs, struct task_struct *task,
> }
>
> if (unlikely(!trace) || trace->nr < skip) {
> - if (may_fault)
> - rcu_read_unlock();
> + if (!trace_in)
> + preempt_enable();
> goto err_fault;
> }
>
> @@ -493,9 +493,8 @@ static long __bpf_get_stack(struct pt_regs *regs, struct task_struct *task,
> memcpy(buf, ips, copy_len);
> }
>
> - /* trace/ips should not be dereferenced after this point */
> - if (may_fault)
> - rcu_read_unlock();
> + if (!trace_in)
> + preempt_enable();
>
> if (user_build_id)
> stack_map_get_build_id_offset(buf, trace_nr, user, may_fault);
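
(Side note for readers: this ordering, preempt_enable() before the
build_id pass, is load-bearing. With may_fault set,
stack_map_get_build_id_offset() may sleep while resolving build IDs
from user VMAs, IIRC via mmap_read_lock(), so it must not run with
preemption disabled. It's also the part that would get hoisted into
the callers per my comment above.)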
> --
> 2.48.1
>