Message-ID: <CAEf4BzYL8NSRRZBj0=7aih01LZHAM67cDCAX5FwMW7WcQ_-f0g@mail.gmail.com>
Date: Thu, 25 Sep 2025 16:03:24 -0700
From: Andrii Nakryiko <andrii.nakryiko@...il.com>
To: Tao Chen <chen.dylane@...ux.dev>
Cc: Alexei Starovoitov <alexei.starovoitov@...il.com>, Song Liu <song@...nel.org>,
Jiri Olsa <jolsa@...nel.org>, Alexei Starovoitov <ast@...nel.org>,
Daniel Borkmann <daniel@...earbox.net>, Andrii Nakryiko <andrii@...nel.org>,
Martin KaFai Lau <martin.lau@...ux.dev>, Eduard <eddyz87@...il.com>,
Yonghong Song <yonghong.song@...ux.dev>, John Fastabend <john.fastabend@...il.com>,
KP Singh <kpsingh@...nel.org>, Stanislav Fomichev <sdf@...ichev.me>, Hao Luo <haoluo@...gle.com>,
bpf <bpf@...r.kernel.org>, LKML <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH bpf-next] bpf: Add preempt_disable to protect get_perf_callchain

On Thu, Sep 25, 2025 at 10:45 AM Tao Chen <chen.dylane@...ux.dev> wrote:
>
> On 2025/9/23 10:53, Alexei Starovoitov wrote:
> > On Mon, Sep 22, 2025 at 12:54 AM Tao Chen <chen.dylane@...ux.dev> wrote:
> >>
> >> As Alexei pointed out, the buffer returned by get_perf_callchain() may
> >> be reused if another task preempts the BPF program and requests a stack
> >> trace, now that BPF programs run with migration disabled rather than
> >> preemption disabled.
> >>
> >> Reported-by: Alexei Starovoitov <ast@...nel.org>
> >> Signed-off-by: Tao Chen <chen.dylane@...ux.dev>
> >> ---
> >> kernel/bpf/stackmap.c | 14 +++++---------
> >> 1 file changed, 5 insertions(+), 9 deletions(-)
> >>
> >> diff --git a/kernel/bpf/stackmap.c b/kernel/bpf/stackmap.c
> >> index 2e182a3ac4c..07892320906 100644
> >> --- a/kernel/bpf/stackmap.c
> >> +++ b/kernel/bpf/stackmap.c
> >> @@ -314,8 +314,10 @@ BPF_CALL_3(bpf_get_stackid, struct pt_regs *, regs, struct bpf_map *, map,
> >>         if (max_depth > sysctl_perf_event_max_stack)
> >>                 max_depth = sysctl_perf_event_max_stack;
> >>
> >> +       preempt_disable();
> >>         trace = get_perf_callchain(regs, 0, kernel, user, max_depth,
> >>                                    false, false);
> >> +       preempt_enable();
> >
> > This is obviously wrong.
> > As soon as preemption is enabled, trace can be overwritten.
> > guard(preempt)();
> > can fix it, but the length of the preempt disabled section
> > will be quite big.
> > The way get_perf_callchain() api is written I don't see
> > another option though. Unless we refactor it similar
> > to bpf_try_get_buffers().
> >
> > pw-bot: cr
>
> Hi Alexei,
>
> I tried to understand what you meant and looked at the implementation of
> get_perf_callchain.
>
> There is currently only one perf_callchain_entry per CPU:
>
> callchain_cpus_entries (RCU global variable)
>           ↓
> struct callchain_cpus_entries {
>         struct perf_callchain_entry *cpu_entries[];
>                                      |
> }                                    |-> perf_callchain_entry0    cpu0
>                                          perf_callchain_entry1    cpu1
>                                          …
>                                          perf_callchain_entryn    cpun
>
>
> If we want to implement it like bpf_try_get_buffers(), should we
> allocate an array of perf_callchain_entry for each CPU?
>
> callchain_cpus_entries (RCU global variable)
>           ↓
> struct callchain_cpus_entries {
>         struct perf_callchain_entry *cpu_entries[];
>                                      |
> }                                    |-> perf_callchain_entry0[N] cpu0
>                                          perf_callchain_entry1[N] cpu1
>                                          …
>                                          perf_callchain_entryn[N] cpun
Either allow a few entries per CPU (bpf_try_get_buffers allows up to 3
buffers per CPU), or extend get_perf_callchain() to accept
perf_callchain_entry from outside, and then we can do that in a
BPF-specific way.
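
For illustration only, the first option could look roughly like the
userspace sketch below. All sketch_* names and the SKETCH_MAX_STACK depth
are made up here, a plain struct stands in for real per-CPU data, and a
kernel version would also have to keep preemption disabled while the
nesting counter is updated. This is not the actual bpf_try_get_buffers()
or perf code, just the nesting-counter idea:

```c
#include <stddef.h>
#include <stdint.h>

#define SKETCH_MAX_NEST  3    /* "up to 3 buffers per CPU", as mentioned above */
#define SKETCH_MAX_STACK 127  /* arbitrary depth, chosen only for this sketch */

/* Stand-in for struct perf_callchain_entry; not the kernel definition. */
struct sketch_entry {
	uint64_t nr;
	uint64_t ip[SKETCH_MAX_STACK];
};

/* State of one CPU; in the kernel this would live in per-CPU storage. */
struct sketch_cpu {
	int nest_level;                               /* entries currently in use */
	struct sketch_entry entries[SKETCH_MAX_NEST]; /* one entry per nesting level */
};

/*
 * Claim the next free entry on this CPU.
 * Returns NULL when all SKETCH_MAX_NEST entries are already in use, so a
 * nested caller fails cleanly instead of overwriting an entry that an
 * interrupted caller is still reading.
 */
static struct sketch_entry *sketch_get_entry(struct sketch_cpu *cpu)
{
	if (cpu->nest_level >= SKETCH_MAX_NEST)
		return NULL;
	return &cpu->entries[cpu->nest_level++];
}

/* Release the most recently claimed entry; callers must release in LIFO order. */
static void sketch_put_entry(struct sketch_cpu *cpu)
{
	if (cpu->nest_level > 0)
		cpu->nest_level--;
}
```

A caller would pair sketch_get_entry() with sketch_put_entry() around the
whole time it uses the entry, which is the window the patch above left
unprotected.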
>
> --
> Best Regards
> Tao Chen