Message-ID: <CAEf4BzbSCbqg6y1KGg_j5xkK1=xsmOyK5ob9uTJiVcWgQ4jAJw@mail.gmail.com>
Date: Fri, 6 Feb 2026 09:12:13 -0800
From: Andrii Nakryiko <andrii.nakryiko@...il.com>
To: Tao Chen <chen.dylane@...ux.dev>
Cc: song@...nel.org, jolsa@...nel.org, ast@...nel.org, daniel@...earbox.net,
andrii@...nel.org, martin.lau@...ux.dev, eddyz87@...il.com,
yonghong.song@...ux.dev, john.fastabend@...il.com, kpsingh@...nel.org,
sdf@...ichev.me, haoluo@...gle.com, bpf@...r.kernel.org,
linux-kernel@...r.kernel.org
Subject: Re: [PATCH bpf-next v2 1/2] bpf: Add preempt disable for bpf_get_stack
On Fri, Feb 6, 2026 at 1:07 AM Tao Chen <chen.dylane@...ux.dev> wrote:
>
> The values returned by get_perf_callchain() live in a per-CPU buffer
> that may be reused if the task is preempted after the BPF program has
> entered migrate-disable mode, so we should disable preemption as well.
> And as Andrii suggested, BPF can guarantee that the perf callchain
> buffers themselves won't be released while in use: for bpf_get_stackid,
> the BPF stack map keeps them alive by delaying put_callchain_buffers()
> until the map is freed, and for bpf_get_stack/bpf_get_task_stack, the
> BPF program itself likewise holds the buffers alive until its freeing
> time, which is delayed until after an RCU Tasks Trace + RCU grace
> period.
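
To spell out the window for anyone following along (a rough sketch,
not the exact kernel code; get_perf_callchain() arguments elided):

	preempt_disable();
	/* returns this CPU's temporary callchain entry */
	trace = get_perf_callchain(regs, ...);
	if (trace)
		/* no other task on this CPU can reuse the entry here */
		memcpy(buf, trace->ip, copy_len);
	preempt_enable();

migrate_disable() only pins the task to the CPU; another task scheduled
on that same CPU can still grab and overwrite the same per-CPU entry
between get_perf_callchain() and the memcpy() unless preemption is
disabled across both.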
>
> Suggested-by: Andrii Nakryiko <andrii@...nel.org>
> Signed-off-by: Tao Chen <chen.dylane@...ux.dev>
> ---
>
> Change list:
> - v1 -> v2
> - add preempt_disable for bpf_get_stack in patch1
> - add patch2
> - v1: https://lore.kernel.org/bpf/20260128165710.928294-1-chen.dylane@linux.dev
>
> kernel/bpf/stackmap.c | 13 ++++++-------
> 1 file changed, 6 insertions(+), 7 deletions(-)
>
Hm... looking at bpf_get_stack_pe(), I'm not sure what the exact
guarantees are around that ctx->data->callchain that we pass as
trace_in... It looks like it's the same temporary per-cpu callchain as
in other places, just attached (temporarily) to ctx. So we probably
want preemption disabled/enabled for that one as well, no? And to
achieve that, I think we'll need to split the build_id logic out of
__bpf_get_stack() and do it after preemption is re-enabled in the
callers. Luckily it's not that much code and logic, so it should be
easy. But please analyze this carefully yourself.
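
Roughly what I have in mind (just a sketch, parameter lists and error
handling hand-waved, names taken from this patch):

	static long __bpf_get_stack(struct pt_regs *regs, ...)
	{
		...
		preempt_disable();	/* now unconditional, trace_in or not */
		trace = trace_in ?: get_perf_callchain(regs, ...);
		/* ... copy ips into buf ... */
		preempt_enable();
		/* no stack_map_get_build_id_offset() call here anymore */
		return ...;
	}

and then each caller that needs it, bpf_get_stack_pe() included, does
the build_id pass only after __bpf_get_stack() returns:

	err = __bpf_get_stack(regs, NULL, ctx->data->callchain, buf, ...);
	if (err > 0 && user_build_id)
		stack_map_get_build_id_offset(buf, trace_nr, user, may_fault);

That keeps the (potentially faulting) build_id resolution out of the
preemption-disabled region.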
pw-bot: cr
> diff --git a/kernel/bpf/stackmap.c b/kernel/bpf/stackmap.c
> index da3d328f5c1..1b100a03ef2 100644
> --- a/kernel/bpf/stackmap.c
> +++ b/kernel/bpf/stackmap.c
> @@ -460,8 +460,8 @@ static long __bpf_get_stack(struct pt_regs *regs, struct task_struct *task,
>
> max_depth = stack_map_calculate_max_depth(size, elem_size, flags);
>
> - if (may_fault)
> - rcu_read_lock(); /* need RCU for perf's callchain below */
> + if (!trace_in)
> + preempt_disable();
>
> if (trace_in) {
> trace = trace_in;
> @@ -474,8 +474,8 @@ static long __bpf_get_stack(struct pt_regs *regs, struct task_struct *task,
> }
>
> if (unlikely(!trace) || trace->nr < skip) {
> - if (may_fault)
> - rcu_read_unlock();
> + if (!trace_in)
> + preempt_enable();
> goto err_fault;
> }
>
> @@ -493,9 +493,8 @@ static long __bpf_get_stack(struct pt_regs *regs, struct task_struct *task,
> memcpy(buf, ips, copy_len);
> }
>
> - /* trace/ips should not be dereferenced after this point */
> - if (may_fault)
> - rcu_read_unlock();
> + if (!trace_in)
> + preempt_enable();
>
> if (user_build_id)
> stack_map_get_build_id_offset(buf, trace_nr, user, may_fault);
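
(Side note for readers: this ordering, preempt_enable() before the
build_id pass, is load-bearing. With may_fault set,
stack_map_get_build_id_offset() may sleep while resolving build IDs
from user VMAs, IIRC via mmap_read_lock(), so it must not run with
preemption disabled. It's also the part that would get hoisted into
the callers per my comment above.)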
> --
> 2.48.1
>