Message-ID: <CAEf4BzYL8NSRRZBj0=7aih01LZHAM67cDCAX5FwMW7WcQ_-f0g@mail.gmail.com>
Date: Thu, 25 Sep 2025 16:03:24 -0700
From: Andrii Nakryiko <andrii.nakryiko@...il.com>
To: Tao Chen <chen.dylane@...ux.dev>
Cc: Alexei Starovoitov <alexei.starovoitov@...il.com>, Song Liu <song@...nel.org>, 
	Jiri Olsa <jolsa@...nel.org>, Alexei Starovoitov <ast@...nel.org>, 
	Daniel Borkmann <daniel@...earbox.net>, Andrii Nakryiko <andrii@...nel.org>, 
	Martin KaFai Lau <martin.lau@...ux.dev>, Eduard <eddyz87@...il.com>, 
	Yonghong Song <yonghong.song@...ux.dev>, John Fastabend <john.fastabend@...il.com>, 
	KP Singh <kpsingh@...nel.org>, Stanislav Fomichev <sdf@...ichev.me>, Hao Luo <haoluo@...gle.com>, 
	bpf <bpf@...r.kernel.org>, LKML <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH bpf-next] bpf: Add preempt_disable to protect get_perf_callchain

On Thu, Sep 25, 2025 at 10:45 AM Tao Chen <chen.dylane@...ux.dev> wrote:
>
> On 2025/9/23 10:53, Alexei Starovoitov wrote:
> > On Mon, Sep 22, 2025 at 12:54 AM Tao Chen <chen.dylane@...ux.dev> wrote:
> >>
> >> As Alexei pointed out, the buffer returned by get_perf_callchain() may
> >> be reused if another task preempts the BPF program and requests a stack
> >> trace, now that BPF programs run with only migration disabled.
> >>
> >> Reported-by: Alexei Starovoitov <ast@...nel.org>
> >> Signed-off-by: Tao Chen <chen.dylane@...ux.dev>
> >> ---
> >>   kernel/bpf/stackmap.c | 14 +++++---------
> >>   1 file changed, 5 insertions(+), 9 deletions(-)
> >>
> >> diff --git a/kernel/bpf/stackmap.c b/kernel/bpf/stackmap.c
> >> index 2e182a3ac4c..07892320906 100644
> >> --- a/kernel/bpf/stackmap.c
> >> +++ b/kernel/bpf/stackmap.c
> >> @@ -314,8 +314,10 @@ BPF_CALL_3(bpf_get_stackid, struct pt_regs *, regs, struct bpf_map *, map,
> >>          if (max_depth > sysctl_perf_event_max_stack)
> >>                  max_depth = sysctl_perf_event_max_stack;
> >>
> >> +       preempt_disable();
> >>          trace = get_perf_callchain(regs, 0, kernel, user, max_depth,
> >>                                     false, false);
> >> +       preempt_enable();
> >
> > This is obviously wrong.
> > As soon as preemption is enabled, trace can be overwritten.
> > guard(preempt)();
> > can fix it, but the length of the preempt disabled section
> > will be quite big.
> > The way get_perf_callchain() api is written I don't see
> > another option though. Unless we refactor it similar
> > to bpf_try_get_buffers().
> >
> > pw-bot: cr
>
> Hi Alexei,
>
> I tried to understand what you meant and looked at the implementation of
> get_perf_callchain.
>
> There is only one perf_callchain_entry per CPU right now.
>
> callchain_cpus_entries (RCU-protected global variable)
>      ↓
> struct callchain_cpus_entries {
>         struct perf_callchain_entry     *cpu_entries[];
>                         |
> }                       |-> perf_callchain_entry0     cpu0
>                             perf_callchain_entry1     cpu1
>                             …
>                             perf_callchain_entryn     cpun
>
>
> If we want to implement it like bpf_try_get_buffers(), we should
> allocate an array of perf_callchain_entry on every CPU, right?
>
> callchain_cpus_entries (RCU-protected global variable)
>      ↓
> struct callchain_cpus_entries {
>         struct perf_callchain_entry     *cpu_entries[];
>                         |
> }                       |-> perf_callchain_entry0[N]     cpu0
>                             perf_callchain_entry1[N]     cpu1
>                             …
>                             perf_callchain_entryn[N]     cpun

Either allow a few entries per CPU (bpf_try_get_buffers allows up to 3
buffers per CPU), or extend get_perf_callchain() to accept
perf_callchain_entry from outside, and then we can do that in a
BPF-specific way.
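
To make the two options concrete, here is a rough userspace sketch, for
illustration only. Every name in it (callchain_entry, cpu_entries,
try_get_entry, put_entry, fill_callchain, MAX_NEST, MODEL_CPUS) is a
hypothetical stand-in and not kernel API. The CPU index is passed
explicitly, and preemption and IRQ handling are not modelled at all.
MAX_NEST = 3 only mirrors the bpf_try_get_buffers limit mentioned above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_NEST   3    /* assumption: same nesting limit as bpf_try_get_buffers */
#define MAX_DEPTH  127  /* hypothetical cap on recorded frames */
#define MODEL_CPUS 4    /* hypothetical CPU count for this model */

/* Stand-in for struct perf_callchain_entry. */
struct callchain_entry {
	uint64_t nr;
	uint64_t ip[MAX_DEPTH];
};

/* Option 1: a few entries per CPU, handed out by nesting level. */
struct cpu_entries {
	int nest;                              /* entries currently in use */
	struct callchain_entry ent[MAX_NEST];
};

static struct cpu_entries pool[MODEL_CPUS];

/* Reserve the entry for the next nesting level, or NULL if too deep. */
static struct callchain_entry *try_get_entry(int cpu)
{
	struct cpu_entries *c = &pool[cpu];

	if (c->nest >= MAX_NEST)
		return NULL;
	return &c->ent[c->nest++];
}

/* Release the most recently reserved entry on this CPU. */
static void put_entry(int cpu)
{
	assert(pool[cpu].nest > 0);
	pool[cpu].nest--;
}

/*
 * Option 2: the walker fills an entry supplied by the caller instead of
 * picking its own per-CPU buffer. The ips/n arguments stand in for the
 * frames a real unwinder would produce.
 */
static void fill_callchain(struct callchain_entry *e,
			   const uint64_t *ips, size_t n)
{
	if (n > MAX_DEPTH)
		n = MAX_DEPTH;
	e->nr = n;
	for (size_t i = 0; i < n; i++)
		e->ip[i] = ips[i];
}
```

In this model a nested caller on the same CPU gets a different entry, so
the outer caller's trace is not overwritten while it is still in use. Once
MAX_NEST entries are taken, try_get_entry() returns NULL and the caller
would have to bail out.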

>
> --
> Best Regards
> Tao Chen